1:1 odds in 2026 on human-expert MMLU performance, $1B models, >90% MATH, >80% APPS top-1, IMO Gold Medal, or human-like robot dexterity is a short timeline.
I disagree. I think each of these benchmarks will be surpassed well before we are at AGI-levels of capability. That said, I agree that the post was insufficient in justifying why we think this bet is a reasonable reply to the OP. I hope in the near-term future to write a longer, more personal post that expands on some of my reasoning.
The bet itself was merely a public statement to the effect of “if people are saying these radical things, why don’t they put their money where their mouths are?” I don’t think such statements need to have long arguments attached to them. But I can totally see why people were left confused.
I appreciate that you changed the title, and think this makes the post a lot more agreeable. It is totally reasonable to be making bets without having to justify them, just as long as the making of a bet is not mistaken to be more evidence than its associated sustained market movement.
I think each of these benchmarks will be surpassed well before we are at AGI-levels of capability.
Solving any of these tasks in a non-gamed manner just 14 years after AlexNet might not be at the point of AGI, or at least I can envision a future consistent with it coming prior, but it is significant evidence that AGI is not too many years out. I can still just about imagine today that neural networks might hit some wall that ultimately limits their understanding, but this point has to come prior to neural networks showing that they are almost fully general reasoners with the right backpropagation signal (it is, after all, backpropagation that is capable of learning almost arbitrary tasks with almost no task-specialization). An alarm needs to precede alignment catastrophe by long enough that you have time to do something about it; it isn’t much use if it is only there to tell you how you are going to die.
Bootstrapping is often painted as a model looking at its own code, thinking really hard, and writing better code that it knows to be better, but this is an extremely strong version of bootstrapping, and you don’t need to come anywhere close to these capabilities in order to start worrying about concrete dangers. I wrote a post that gave an example of a minimum viable FOOM, but that is not the only angle from which it is possible to get there, nor is it the earliest level of capability at which I think things will start breaking. It is worth remembering that evolution optimized its way to humanity from proto-humans that could not be given IMO Gold Medal questions and be expected to solve them. Evolution isn’t intelligent at all, so it certainly is not the case that you need human-level intelligence before you can optimize on intelligence.
You may PM me for a small optional point I don’t want to make in public.
-
E: A more aggressive way of phrasing this is as a challenge: why can’t this work? Concretely, what specific capability do you think is missing for machine intelligence to start doing transformative things like AI research? If a machine is grounded in language and other sensory media, and is also capable of taking a novel mathematical question and inventing not just the answer but the method by which it is solved, why can’t it apply that ability to other tasks that it is able to talk about? Many models have shown that reasoning abilities transfer, and that agents trained on general domains do similarly well across them.