Matthew, Tamay: Refreshing post, with actual hard data and benchmarks. Thanks for that.
My predictions:
A model/ensemble of models achieves >80% on all tasks in the MMLU benchmark
No in 2026, no in 2030. Mainly because we don’t have much structured data or strong incentives to solve some of the categories. Clearing those categories would require either a powerful unsupervised AI or more time.
A credible estimate reveals that an AI lab deployed EITHER >10^30 FLOPs OR hardware that would cost $1bn if purchased through competitive cloud computing vendors at the time on a training run to develop a single ML model (excluding autonomous driving efforts)
This may actually happen (the $1bn condition, not the 10^30 FLOPs one), partly due to inflation and USD created out of thin air and injected into the market. I would go for no in 2026 and yes in 2030.
A model/ensemble of models will achieve >90% on the MATH dataset using a no-calculator rule
No in 2026, no in 2030. Significant algorithmic improvements are needed. It might be done if prompt engineering is allowed.
A model/ensemble of models achieves >80% top-1 strict accuracy on competition-level problems on the APPS benchmark
No in 2026, no in 2030. Similar to the above, but there will be more progress, as a lot of data is available.
A gold medal for the IMO Grand Challenge (conditional on it being clear that the questions were not in the training set)
No in 2026, no in 2030.
A robot that can, from beginning to end, reliably wash dishes, take them out of an ordinary dishwasher and stack them into a cabinet, without breaking any dishes, and at a comparable speed to humans (<120% the average time)
I work with smart robots; this cannot happen so fast, partly because of hardware limitations. The speed requirement is particularly harsh. Without the speed limit, and with the system known in advance, I would say yes in 2030. As the bet stands, I go for no in 2026, no in 2030.
Tesla’s full-self-driving capability makes fewer than one major mistake per 100,000 miles
Not sure about this one, but I lean toward no in 2026, no in 2030.
The criteria adjust for inflation.
How much would your view shift if there were a model that could “engineer its own prompt”, even during training?
A close call, but I would still lean toward no. Engineering the prompt is where humans leverage all their common sense and their vast (relative to the AI) knowledge.
Nice specific breakdown! Sounds like you side with the authors overall. Want to also make the 3:1 bet with me?
Thanks. Yes, pretty much in line with the authors. Btw, I would be super happy to be wrong and to see advancement in those areas, especially the robotics one.
Thanks for the offer, but I’m not interested in betting money.