They did this pretty quickly and were able to greatly improve performance on a moderately diverse range of pretty checkable tasks. This implies OpenAI likely has an RL pipeline which can be scaled up to substantially better performance by putting in easily checkable tasks + compute + algorithmic improvements. And, given that this is RL, there isn’t any clear reason this won’t work (with some additional annoyances) for scaling through very superhuman performance (edit: in these checkable domains).[1]
Credit to @Tao Lin for talking to me about this take.
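To make the "checkable tasks + compute" framing concrete, here is a minimal toy sketch (my own illustration, not anything from OpenAI's actual pipeline; all names are hypothetical) of why RL on verifiable problems scales without human labels: the reward comes from a programmatic checker, so tasks can be generated and graded indefinitely.

```python
import random

def make_task():
    # A "checkable" task: a generated problem paired with a programmatic verifier.
    a, b = random.randint(0, 99), random.randint(0, 99)
    prompt = f"{a} + {b}"

    def verify(answer):
        return answer == str(a + b)

    return prompt, verify

def policy(prompt):
    # Stand-in for a model sample; a real pipeline would sample from an LLM.
    return str(sum(int(tok) for tok in prompt.split() if tok.isdigit()))

def rl_step(n_samples=4):
    # Sample several attempts at one task and grade them with the verifier.
    prompt, verify = make_task()
    samples = [policy(prompt) for _ in range(n_samples)]
    rewards = [1.0 if verify(s) else 0.0 for s in samples]
    # A real pipeline would now update the policy toward high-reward samples
    # (e.g. policy-gradient or rejection-sampling fine-tuning) and repeat.
    return rewards
```

The key property is that the loop needs no human grading, which is what makes "tasks + compute + algorithmic improvements" the only inputs.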
given that this is RL, there isn’t any clear reason this won’t work (with some additional annoyances) for scaling through very superhuman performance
Not in domains where they don’t have a way of generating verifiable problems. Improvement in domains where they merely have a fixed set of human-written problems is likely bounded by the number of such problems.
Yeah, sorry, this is an important caveat. But I think very superhuman performance in most/all checkable domains is pretty spooky, and this is even putting aside how it generalizes.
I think the best bull case is something like:
They did this pretty quickly and were able to greatly improve performance on a moderately diverse range of pretty checkable tasks. This implies OpenAI likely has an RL pipeline which can be scaled up to substantially better performance by putting in easily checkable tasks + compute + algorithmic improvements. And, given that this is RL, there isn’t any clear reason this won’t work (with some additional annoyances) for scaling through very superhuman performance (edit: in these checkable domains).[1]
I express something similar on twitter here.