I don’t think that at present conditions are good for automating R&D.
I kind of wish this were true, because it would likely mean longer timelines, but my expectation is that the incoming larger LMs + better scaffolds + more inference-time compute could quite easily pass the threshold for a significant speedup of algorithmic progress from automation (e.g. 2x).
Making algorithmic progress and making safety progress seem to differ along important axes relevant to automation:
Algorithmic progress can use 1. high iteration speed 2. well-defined success metrics (scaling laws) 3. broad knowledge of the whole stack (CUDA to optimization theory to test-time scaffolds) 4. …
Alignment, broadly construed, is less engineering and a lot more blue skies, long horizon, and under-defined (obviously this isn’t true for engineering-heavy alignment sub-tasks like jailbreak resistance, and for some interp work).
Probably automated AI scientists will be applied to alignment research, but unfortunately automated research will differentially accelerate algorithmic progress over alignment. This line of reasoning is part of why I think it’s valuable for any alignment researcher (who can) to focus on bringing the under-defined into a well-defined framework. Shovel-ready tasks will be shoveled much faster by AI shortly anyway.
Algorithmic progress can use 1. high iteration speed 2. well-defined success metrics (scaling laws) 3. broad knowledge of the whole stack (CUDA to optimization theory to test-time scaffolds) 4. …
Alignment, broadly construed, is less engineering and a lot more blue skies, long horizon, and under-defined (obviously this isn’t true for engineering-heavy alignment sub-tasks like jailbreak resistance, and for some interp work).
Generally agree, but I do think prosaic alignment has quite a few advantages vs. prosaic capabilities (e.g. in the extra slides here), and this could be enough to result in aligned(-enough) automated safety researchers that can be applied to the more blue skies parts of safety research. I would also very much prefer something like a coordinated pause around the time when safety research gets automated.
This line of reasoning is part of why I think it’s valuable for any alignment researcher (who can) to focus on bringing the under-defined into a well-defined framework. Shovel-ready tasks will be shoveled much faster by AI shortly anyway.
Agree, I’ve written about (something related to) this very recently.
Yes, things have certainly changed in the four months since I wrote my original comment, with the advent of o1 and Sakana’s AI Scientist. Both of those are still incapable of full automation of self-improvement, but they’re close. We’re clearly much closer to a recursive speedup of R&D, leading to FOOM.