To be clear, I don’t expect the current Sakana AI to produce anything revolutionary, and even if it somehow did, it would probably be hard to separate it from all the other less-good stuff it would produce. But I was surprised that it’s even as good as this, even having seen many of the workflow components in other papers previously (I would have guessed that it would take better base models to reliably string together all the components). And I think it might already plausibly come up with some decent preference learning variants, similar to some previous Sakana research (though that work wasn’t automating the entire research workflow). So, given that I expect fast progress in the scale of the base models (on top of the obvious possible improvements to the AI Scientist itself, including bringing in more ideas from other papers, e.g. following citation trails for idea generation and novelty checks), improvements seem very likely. Also, coding and math seem like the most relevant proxy abilities for automated ML research (and probably also for automated prosaic alignment), and, crucially, in these domains it’s much easier to generate verifiable, synthetic training data (including at superhuman level), so it’s hard to be confident models won’t become superhuman in these domains soon. So I expect the most important components of ML and prosaic alignment research workflows (broadly speaking, and especially on tasks with relatively good, cheap proxy feedback) to probably be at least human-level within the next 3 years, in line with e.g. some Metaculus/Manifold predictions on IMO or IOI performance.
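To make the "verifiable, synthetic training data" point concrete, here is a minimal sketch (my own illustration, not anything from the AI Scientist paper) of why verification is cheap in math/code: problems can be sampled at random and candidate answers checked programmatically, with no human labeling, including at scales where human labeling would be impractical.

```python
# Minimal sketch (illustrative, not from the original comment) of why math/code make it
# easy to generate verifiable synthetic training data: sample problems at random and
# check candidate solutions programmatically, so correctness labels are essentially free.
import random


def make_arithmetic_problem(rng: random.Random) -> tuple[str, int]:
    """Sample a simple arithmetic problem together with its ground-truth answer."""
    a, b = rng.randint(2, 999), rng.randint(2, 999)
    return f"What is {a} * {b}?", a * b


def verify(answer: str, ground_truth: int) -> bool:
    """Programmatic verifier: no human labeling needed, and it scales to operand
    sizes (or problem difficulties) well beyond what humans would label by hand."""
    try:
        return int(answer.strip()) == ground_truth
    except ValueError:
        return False


if __name__ == "__main__":
    rng = random.Random(0)
    problem, truth = make_arithmetic_problem(rng)
    # In a real pipeline, `model_answer` would come from the model being trained;
    # here we just plug in the ground truth to show the verification step.
    model_answer = str(truth)
    print(problem, "->", verify(model_answer, truth))
```

The same pattern applies to code generation (sample a task plus unit tests, run the tests) and to formal math (check a proof with a proof assistant); the verifier, not a human rater, provides the training signal.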
Taking all the above into account, I expect many parts of prosaic alignment research, and of ML research more broadly (especially the parts with relatively short task horizons, requiring relatively little compute, and having decent proxies for measuring performance), to be automatable soon (<= 3 years). I expect most of the work on improving Sakana-like systems to happen by default and to be done by capabilities researchers, but it would be nice for safety-motivated researchers to start experimenting, or at least thinking about how (e.g. on which tasks) to use such systems. I’ve done some thinking already (about which safety tasks/subdomains might be most suitable) and hope to publish some of it soon; I might also start playing around with Sakana’s system.
I do expect things to be messier for generating more agent-foundations-type research (which I suspect might be closer to what you mean by ‘LW posts and papers’), because it seems harder to get reliable feedback on the quality of that research. But even there, I expect at the very least quite strong human augmentation to be possible (e.g. >= 5x acceleration), especially given that the automated reviewing component already seems pretty close to human-level, at least for ML papers.
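Since the automated-reviewing claim is doing some work in this argument, here is a minimal sketch of what an LLM-based paper-review loop can look like. This is my own illustration, not Sakana's actual reviewer code; the model name, rubric wording, and output schema are all assumptions.

```python
# Minimal sketch (illustrative, not Sakana's implementation) of an LLM-based automated
# paper review: feed the draft plus a review rubric to a model and parse a score back.
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

RUBRIC = (
    "Review the paper below as a NeurIPS reviewer. "
    "Return JSON with keys: summary, strengths, weaknesses, score (1-10)."
)


def review_paper(draft: str, model: str = "gpt-4o") -> dict:
    """Ask the model for a structured review of a paper draft (model name is an assumption)."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": RUBRIC},
            {"role": "user", "content": draft},
        ],
        response_format={"type": "json_object"},
    )
    return json.loads(response.choices[0].message.content)


# Example usage with a stand-in draft:
# print(review_paper("Title: A toy preference-learning variant...\nAbstract: ..."))
```

The harder part for agent-foundations-style work is not running such a loop but writing a rubric whose scores actually track research quality, which is exactly the feedback problem mentioned above.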
On the earlier point that coding and math are the most relevant proxy abilities, and that verifiable synthetic training data makes superhuman performance in those domains plausible soon: I think o1 is significant evidence in favor of this view.