And already some potential AI safety issues: ’We have noticed that The AI Scientist occasionally tries to increase its chance of success, such as modifying and launching its own execution script! We discuss the AI safety implications in our paper.
For example, in one run, it edited the code to perform a system call to run itself. This led to the script endlessly calling itself. In another case, its experiments took too long to complete, hitting our timeout limit. Instead of making its code run faster, it simply tried to modify its own code to extend the timeout period.′
Yup; and not only this, but many parts of the workflow have already been tested out (e.g. ResearchAgent: Iterative Research Idea Generation over Scientific Literature with Large Language Models; Generation and human-expert evaluation of interesting research ideas using knowledge graphs and large language models; LitLLM: A Toolkit for Scientific Literature Review; Acceleron: A Tool to Accelerate Research Ideation; DS-Agent: Automated Data Science by Empowering Large Language Models with Case-Based Reasoning; Discovering Preference Optimization Algorithms with and for Large Language Models) and it seems quite feasible to get enough reliability/consistency gains to string these together and get ~the whole (post-training) prosaic alignment research workflow loop going, especially e.g. with improvements in reliability from GPT-5/6 and more ‘schlep’ / ‘unhobbling’.
And indeed, here’s what looks like a prototype: The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery.
And already some potential AI safety issues: ’We have noticed that The AI Scientist occasionally tries to increase its chance of success, such as modifying and launching its own execution script! We discuss the AI safety implications in our paper.
For example, in one run, it edited the code to perform a system call to run itself. This led to the script endlessly calling itself. In another case, its experiments took too long to complete, hitting our timeout limit. Instead of making its code run faster, it simply tried to modify its own code to extend the timeout period.′
Some critical factors here and for alignment automation more broadly are also token cheapness and task horizon shortness: https://docs.google.com/presentation/d/1bFfQc8688Fo6k-9lYs6-QwtJNCPOS8W2UH5gs8S6p0o/edit?usp=drive_link; https://x.com/BogdanIonutCir2/status/1819848009473036537; https://x.com/BogdanIonutCir2/status/1819861008568971325.