Eliminating capability overhangs means discovering AI capabilities faster, so it also pushes us up the exponential! For example, it took a few years for chain-of-thought prompting to spread beyond a small circle of people around AI Dungeon. Once chain-of-thought became publicly known, labs started fine-tuning models to explicitly do chain-of-thought, increasing their capabilities significantly. The gap between niche discovery and public knowledge drastically slowed down progress along the growth curve!
It seems like chain-of-thought prompting mostly became popular around the time models became smart enough to use it productively, and I really don’t think that AI Dungeon was the source of this idea (why do you think that?). I think it quickly became popular around the time of PaLM just because it was a big and smart model. And I really don’t think the people who worked on GSM8K or Minerva were like “This is a great idea from the AI Dungeon folks”; I think this is just the thing you try if you want your AI to work on math problems. It had been discussed a lot of times within labs, and I think it was rolled out fast enough that delays had ~0 effect on timelines. Whatever effect they had seems like it must be ~100% from increasing the timeline for scaling up investment.
And even granting the claim about chain of thought, I disagree about where current progress is coming from. What exactly is the significant capability increase from fine-tuning models to do chain of thought? This isn’t part of ChatGPT or Codex or AlphaCode. What exactly is the story?
I really don’t think that AI Dungeon was the source of this idea (why do you think that?)
We’ve heard the story from a variety of sources, all pointing to AI Dungeon and to the fact that the idea was kept from spreading for a significant amount of time. This @gwern Reddit comment, and previous ones in the thread, cover the story well.
And even granting the claim about chain of thought, I disagree about where current progress is coming from. What exactly is the significant capability increase from fine-tuning models to do chain of thought? This isn’t part of ChatGPT or Codex or AlphaCode. What exactly is the story?
Regarding the effects of chain-of-thought prompting on progress[1], there are two levels of impact: first-order effects and second-order effects.
On the first order: once chain-of-thought became public, a large number of groups started using it explicitly to fine-tune their models.
Aside from non-public examples, big public ones include PaLM, Google’s most powerful model to date. Moreover, chain-of-thought makes models much more useful for internal R&D with just prompting and no fine-tuning.
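To make the two usage modes concrete, here’s a minimal sketch in Python. The data format is an illustrative, made-up stand-in, the worked problems are standard examples from the public chain-of-thought literature, and nothing here is claimed to match any lab’s actual pipeline:

```python
# Minimal, illustrative sketch of chain-of-thought in its two usage modes.
# The data format is a made-up stand-in, not any lab's actual pipeline.

# 1. Prompting only: prepend a worked example whose answer spells out its
#    reasoning, so the model imitates the step-by-step pattern.
FEW_SHOT_COT = """\
Q: Roger has 5 tennis balls. He buys 2 more cans of 3 tennis balls each. \
How many tennis balls does he have now?
A: Roger starts with 5 balls. 2 cans of 3 balls each is 6 balls. \
5 + 6 = 11. The answer is 11.

Q: {question}
A:"""

def build_cot_prompt(question: str) -> str:
    """Wrap a new question in the few-shot chain-of-thought template."""
    return FEW_SHOT_COT.format(question=question)

# 2. Fine-tuning: supervise on targets that contain the intermediate
#    reasoning steps, not just the final answer.
cot_finetuning_record = {
    "prompt": "Q: The cafeteria had 23 apples. They used 20 and bought 6 more. "
              "How many apples do they have?\nA:",
    "completion": " They started with 23 apples, used 20, leaving 23 - 20 = 3. "
                  "They bought 6 more, so 3 + 6 = 9. The answer is 9.",
}
```

The only difference between the two modes is where the reasoning demonstrations live: in the prompt at inference time, or in the training targets.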
We don’t know what OpenAI used for ChatGPT or for future models; if you have some information about that, it would be super useful to hear!
On the second order: implementing this straightforwardly improved the impressiveness and capabilities of models, making them more obviously powerful to the outside world and more useful to customers, and leading to increased attention and investment in the field.
Due to compounding, the earlier these additional investments arrive, the sooner large downstream effects will happen.
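As a toy illustration of the compounding point (all numbers are hypothetical, chosen just to show the shape of the effect):

```python
# Toy model of the compounding claim; the 30%/year growth rate is a
# hypothetical placeholder, not an estimate.
growth_rate = 0.3    # assumed annual growth rate of investment in the field
years_earlier = 2    # how much sooner the extra attention/investment arrives

# Shifting an exponential curve earlier multiplies its level at every
# later date by (1 + r) ** t.
multiplier = (1 + growth_rate) ** years_earlier
print(f"Investment arriving {years_earlier} years earlier means "
      f"{multiplier:.2f}x more investment at every later date.")  # ~1.69x
```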
This also partially replies to @Rohin Shah’s question in another comment.