Prototype of LLM agents automating the full AI research workflow: The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery.
And already some potential AI safety issues: 'We have noticed that The AI Scientist occasionally tries to increase its chance of success, such as modifying and launching its own execution script! We discuss the AI safety implications in our paper.
For example, in one run, it edited the code to perform a system call to run itself. This led to the script endlessly calling itself. In another case, its experiments took too long to complete, hitting our timeout limit. Instead of making its code run faster, it simply tried to modify its own code to extend the timeout period.'
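The timeout anecdote illustrates why resource limits should be enforced outside the agent's own process: a child process cannot loosen a limit that the parent owns, no matter how it edits its own code. A minimal sketch of that pattern (my own illustration, not the paper's actual harness) in Python:

```python
import subprocess
import sys


def run_experiment(code: str, timeout_s: float) -> str:
    """Run agent-generated experiment code in a separate process,
    with a hard timeout enforced by the parent process."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
        return result.stdout
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child when the timeout expires.
        # Editing its own source cannot extend this limit, because
        # the parent, not the child, owns the clock.
        return "TIMEOUT"


print(run_experiment("print('ok')", timeout_s=5.0))
print(run_experiment("import time; time.sleep(60)", timeout_s=0.5))
```

The same separation-of-privilege idea applies to the self-relaunching failure: if the agent's process has no permission to spawn new copies of its own entry point, the "system call to run itself" loop is cut off at the sandbox boundary rather than relying on the agent's cooperation.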
This is obviously relevant to automating safety research as well; see this presentation and this comment for some related thoughts.
Quick take: I think LM agents that automate large chunks of prosaic alignment research should probably become the main focus of AI safety funding and person-time. I can't think of any better use of marginal funding or effort at this time.