To be more object-level than Tsvi:
o1/o3/R1/R1-Zero seem to me like evidence that “scaling reasoning models in a self-play-ish regime” can reach superhuman performance on some class of tasks with properties like {short horizons, cheap objective verifiability, at most shallow conceptual innovation needed}, or maybe some subset thereof. This is important! But, for reasons similar to this part of Tsvi’s post, it’s much less apparent to me that it can get to superintelligence across all science and engineering tasks.