>As it turns out, the only thing that matters was scale.
I mean, in some sense yes. But AlphaGo wasn’t trained by finding a transcript of every Go game that had ever been played; it was trained via self-play RL. And attempts to create general game-playing agents via similar methods haven’t worked out very well, in my understanding. I don’t assume that throwing 10x or 100x more data at them would change that...
>The architecture that can play 100 games and does extremely well at game 101 the first try gets way more points than one that doesn’t. The one that has never read a book on the topic of the LSAT but still does well on the exam is exactly what we are looking for.
Yes, but the latter exists and is trained via reinforcement learning from human feedback (RLHF), which can’t be translated to self-play. The former doesn’t exist as far as I can tell. I don’t see anyone proposing to improve GPT-4 by switching from RLHF to self-play RL.
Ultimately I think there’s a possibility that the improvements to LLMs from further scaling may not be very large, and instead we’ll need to find some sort of new architecture to create dangerous AGIs.
GPT-4 was trained with RL feedback that came from self-evaluation across all the inputs users fed into ChatGPT.
Self-play would be having it practice LeetCode problems, with the score as the RL feedback.
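To make "the score as the RL feedback" concrete, here is a minimal sketch of the scoring half of that loop. Everything in it is an assumption for illustration: the toy "square a number" problem, the names `score`, `rank_attempts`, `TESTS` and `CANDIDATES`, and the hand-written attempts standing in for model samples. It is not how OpenAI or anyone else actually trains a model, and it omits the policy update entirely.

```python
from typing import Callable, List, Tuple

# Hypothetical toy problem: return the square of x.
# Each test case is (input, expected output).
TestCase = Tuple[int, int]
TESTS: List[TestCase] = [(0, 0), (1, 1), (2, 4), (3, 9)]

# Hand-written stand-ins for attempts a model might sample.
CANDIDATES: List[Callable[[int], int]] = [
    lambda x: x,       # right only for 0 and 1
    lambda x: x * x,   # correct
    lambda x: 1 // x,  # crashes on 0, right only for 1
]


def score(candidate: Callable[[int], int], tests: List[TestCase]) -> float:
    """Fraction of test cases passed; this number plays the role of the reward."""
    passed = 0
    for x, expected in tests:
        try:
            if candidate(x) == expected:
                passed += 1
        except Exception:
            pass  # a crashing attempt earns no credit for that test
    return passed / len(tests)


def rank_attempts(
    candidates: List[Callable[[int], int]], tests: List[TestCase]
) -> Tuple[List[float], int]:
    """Score every attempt and return (rewards, index of the best attempt)."""
    rewards = [score(c, tests) for c in candidates]
    best = max(range(len(candidates)), key=lambda i: rewards[i])
    return rewards, best
```

Calling `rank_attempts(CANDIDATES, TESTS)` gives one reward per attempt. In an actual RL setup those rewards would feed a policy update rather than just picking a winner.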
The software support is there and the RL feedback worked, so why do you think it is even evidence to say “obvious thing that works well hasn’t been done yet, or maybe it has, OpenAI won’t say”?
There is also a tremendous amount of self-play possible now with the new plugin interface.