Gpt-4 did RL feedback that was self evaluation across all the inputs users fed by chatGPT.
Self play would be having it practice leetcode problems with the RL feedback the score.
The software support is there and the RL feedback worked, why do you think it is even evidence to say “obvious thing that works well hasn’t been done yet or maybe it has, openAI won’t say”
There is also a tremendous amount of self play possible now with the new plugin interface.
Gpt-4 did RL feedback that was self evaluation across all the inputs users fed by chatGPT.
Self play would be having it practice leetcode problems with the RL feedback the score.
The software support is there and the RL feedback worked, why do you think it is even evidence to say “obvious thing that works well hasn’t been done yet or maybe it has, openAI won’t say”
There is also a tremendous amount of self play possible now with the new plugin interface.