With o1, and now o3, it seems fairly plausible that there will be a split between “verifiable” capabilities and general capabilities. Sure, there will be some cross-pollination (transfer), but this might have limits.
What then? Can a superhuman math + coding AI also just reason through political strategy, or will it struggle, make errors, and fall back on somewhat generic ideas from its training data?
Can we get a “seed-AI style” consequentialist in some domains while it fails to perform above human level in others? I’d like to believe reasoning would transfer (as it should be universal), but I don’t think reasoning is sufficient for fuzzier domains: the model also needs good heuristics.
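To make the “verifiable” part concrete, here’s a minimal sketch (purely illustrative; the function names and setup are made up) of the kind of reward signal you can compute programmatically in math/code but not in fuzzier domains:

```python
# Minimal sketch: in "verifiable" domains the reward can be computed
# mechanically, with no human judgment in the loop. Illustrative only.

def math_reward(model_answer: str, reference_answer: str) -> float:
    """Exact-match check on a final answer."""
    return 1.0 if model_answer.strip() == reference_answer.strip() else 0.0

def code_reward(candidate_code: str, hidden_tests: list[tuple[tuple, object]]) -> float:
    """Fraction of hidden unit tests passed by the generated code.
    Assumes the candidate code defines a function called `solve`."""
    namespace: dict = {}
    exec(candidate_code, namespace)
    solve = namespace["solve"]
    passed = sum(1 for args, expected in hidden_tests if solve(*args) == expected)
    return passed / len(hidden_tests)

# There's no analogous `political_strategy_reward(plan) -> float` to write
# down: the feedback there is fuzzy, delayed, and contested, which is why
# RL on outcome signals is so much easier to set up for math and code.
```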
The AI Control agenda seems more promising now (for both this reason and some others).
I’d say control seems easier now, but it’s unclear if this makes the agenda more promising. You might have thought one issue with the agenda is that control is likely to be trivial and thus not worth working on (and that some other problem, e.g. doing alignment research with AI labor regardless of whether the AIs are scheming, is a bigger issue).
I was fairly on board with control before; I think my main remaining concern is the trusted models not being good enough. But with more elaborate control protocols (assuming policymakers/AI labs actually make an effort to implement them), catching an escape attempt seems more likely if the model’s performance is very skewed toward specific domains. A rough sketch of the kind of protocol I have in mind is below.
Though yeah, I agree that some of what you mentioned might not have changed and could still be an issue.
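To gesture at what “more elaborate control protocols” could mean, here’s a rough trusted-monitoring-style sketch (all names, interfaces, and thresholds are hypothetical, not any lab’s actual setup): a weaker trusted model rates how suspicious each proposed action from the stronger untrusted model looks, and suspicious or randomly sampled actions get routed to humans instead of executed.

```python
import random
from dataclasses import dataclass, field

# Hypothetical trusted-monitoring sketch. The untrusted (stronger) model
# proposes actions; the trusted (weaker) model scores suspicion; actions
# above a threshold are blocked and escalated, and a small random fraction
# of the rest is audited anyway.

SUSPICION_THRESHOLD = 0.8   # made-up number: above this, block and escalate
AUDIT_BUDGET = 0.02         # made-up number: fraction of "clean" actions to spot-check

@dataclass
class ControlLog:
    executed: list = field(default_factory=list)
    escalated: list = field(default_factory=list)
    audited: list = field(default_factory=list)

def run_with_control(task, untrusted_propose, trusted_rate, log: ControlLog):
    """untrusted_propose(task) -> action; trusted_rate(task, action) -> float in [0, 1]."""
    action = untrusted_propose(task)
    suspicion = trusted_rate(task, action)

    if suspicion > SUSPICION_THRESHOLD:
        log.escalated.append((task, action, suspicion))  # blocked, sent for human review
        return None
    if random.random() < AUDIT_BUDGET:
        log.audited.append((task, action, suspicion))    # executed, but also reviewed later
    log.executed.append((task, action))
    return action
```

The point about skewed capabilities maps onto `trusted_rate`: if the untrusted model’s edge is concentrated in verifiable domains, the trusted model is less outclassed at judging fuzzier things like whether an action looks like an escape attempt.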
We haven’t yet seen what happens when they turn o3-style training on verifiable rewards toward self-play on a variety of strategy games. I suspect that will unlock a lot of general reasoning and strategy.
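Concretely, the appeal is that a game’s outcome is computed from its rules, so it’s a verifiable reward with no human labeling. A hypothetical self-play loop (the `game`/`policy` interfaces here are invented for illustration, not an existing API) might look like:

```python
# Hypothetical self-play sketch: the win/loss outcome is the verifiable
# reward signal. All interfaces below are made up for illustration.

def self_play_episode(game, policy_a, policy_b):
    """Play one game between two policies; return the outcome from A's perspective."""
    state = game.initial_state()
    players = {0: policy_a, 1: policy_b}
    while not game.is_terminal(state):
        move = players[game.current_player(state)].choose_move(state)
        state = game.apply(state, move)
    return game.outcome(state)  # e.g. +1 / 0 / -1, computed from the rules

def self_play_training(game, policy, optimizer, episodes=10_000):
    """Repeatedly play against a frozen copy of yourself and reinforce on the outcome."""
    for _ in range(episodes):
        reward = self_play_episode(game, policy, policy.frozen_copy())
        optimizer.reinforce(policy, reward)  # e.g. a policy-gradient update on the trajectory
```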
Do you think there’s some initial evidence for that? E.g. Voyager, or others from DeepMind. Self-play gets thrown around a lot, but I’m not sure we’ve concretely seen much yet for LLMs using it.
But yes, agreed, good point regarding strategy games being a domain that could be verifiable.
I don’t think o3 represents much direct progress towards AGI, but I think it does represent indirect progress. A math-and-code specialist speeds up AI R&D, which then shortens the time to AGI.
Does it? ML progress is famously achieved by atheoretical empirical tinkering, i.e. by having very well-developed intuitive research taste: the exact opposite of the well-posed math problems on which o1-o3 shine. Something similar seems to be the case with programming: AIs seem bad at architecture/system design.
So it only speeds up the “drudge work”, not the actual load-bearing theoretical work. That’s a nonzero speedup, since it lets you test intuitive-theoretical ideas more quickly, but it’s more or less isomorphic to having a team of competent-ish intern underlings.
Important questions! Thanks for the thoughts. More discussion about this here: https://www.lesswrong.com/posts/oC4wv4nTrs2yrP5hz/what-are-the-strongest-arguments-for-very-short-timelines?commentId=nZsFCqbC943hTeiRC