I often refer to the ideas in this post and think the fundamental point is quite important: structural advantages in quantity, cost, and speed might make AI systems quite useful and thus impactful prior to being broadly superhuman.
(The exact estimates in the post do pretty strongly assume the current rough architecture, scaling laws, and paradigm, so discount accordingly.)
There are now better estimates of many of the relevant quantities done by various people (maybe Epoch, Daniel Kokotajlo, Eli Lifland), but I’m not aware of another updated article which makes the full argument made here.
The things this post seems to most miss in retrospect:
It seems that spending more inference compute can (sometimes) be used to qualitatively and quantitatively improve capabilities (e.g., o1, recent SWE-bench results, ARC-AGI) rather than merely doing more work in parallel. Thus, it’s not clear that the relevant regime will look like “lots of mediocre thinking”.[1]
Inference speeds have actually gone up a bunch, not down, despite models getting better. (100 tok/s is common for frontier models at the time of writing.) This might be related to models getting smaller. It’s not clear this post made an exact prediction here, but it is an interesting way the picture differs.
Using specialized hardware (and probably paying much more per token), it is possible to get much faster inference speeds (e.g., 1k tok/s) on frontier models like Llama 405B. I expect this will continue to be possible, and a potentially important dynamic will be paying extra to run LLMs very fast on specialized inference hardware.
I continue to think better estimates of the quantities and questions raised in this post are important, and I hope additional work along these lines will come out soon.
[1] That said, in practice, methods now often just do best-of-n (BoN) sampling over whole trajectories, which is, in some sense, pretty similar to lots of mediocre thinking.
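To make the footnote concrete, here is a minimal sketch of what BoN over whole trajectories amounts to: n independent, equally mediocre attempts plus a selection step. The helpers `generate_trajectory` and `score_trajectory` are hypothetical placeholders (not any real API); in practice they would be a full LLM agent rollout and a verifier, reward model, or test suite.

```python
import random

def generate_trajectory(task: str) -> str:
    # Placeholder: stands in for one complete, independent attempt at the task.
    return f"candidate solution {random.randint(0, 10**6)} for {task!r}"

def score_trajectory(task: str, trajectory: str) -> float:
    # Placeholder: stands in for a verifier / reward model score.
    return random.random()

def best_of_n(task: str, n: int) -> str:
    """Sample n complete trajectories and return the highest-scoring one.

    This spends roughly n times the inference compute of a single attempt,
    but no individual attempt is any smarter -- which is why BoN over whole
    trajectories still resembles lots of mediocre thinking.
    """
    trajectories = [generate_trajectory(task) for _ in range(n)]
    scores = [score_trajectory(task, t) for t in trajectories]
    best_index = max(range(n), key=lambda i: scores[i])
    return trajectories[best_index]

if __name__ == "__main__":
    print(best_of_n("fix the failing unit test", n=16))
```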