I’m happy that this was done before release. However, I’m still left wondering “how many prompts did they try?” In practice, the first self-replicating AI escape is unlikely to be a model working alone on a server; it is more likely to be a model carefully and iteratively prompted, with the overall strategy supplied by a malicious human programmer. One also wonders what will happen once the base architecture is in the training set. And there is a lot of profit to be made (and more cheaply than a full self-replicating escape) by having the AI identify and exploit zero-days to generate and spread malware, say while shorting the stock of a target company. Perhaps GPT-4 is not yet capable enough to find or exploit zero-days. I suppose we will find out soon enough.
Note that this creates a strong argument for never open-sourcing a model once a certain level of capability is reached: a GPT-N given enough hints about its own structure would be able to capably re-implement itself.
This is a good critique of the details of AI 2027, but whether the prediction should have been for autonomous AI research by 2026 or by 2033, nothing substantive changes about the policy concerns that AI 2027 raises.
I think Nikola’s threshold for superhuman AI is conservative enough. If we reach a point where an AI agent (or super-agent) can perform tasks equivalent to 10 human-years of programmer time with 80% accuracy, then AI research can likely be divided among several agents and completely automated. In my opinion, humanity will have lost control of AI by that point: much as the PI of a research lab never knows all the technical details of how their experiments are actually performed, even the leading human researchers are likely to understand the research they are overseeing only as a surface-level abstraction. And from well before the point at which humans can no longer follow AI self-improvement, all of AI 2027’s warnings about social and organizational dynamics apply: the incentives push companies to ignore the initial warning signs of autonomous and misaligned (evil) behavior, opening the door to potential catastrophe.
Your graph (“six stories”) shows that METR’s plain-old-exponential prediction would put us at this point before 2032, and that the “new normal” METR curve based on the most recent model releases would put us there before 2028. So super-exponential growth is not even needed for the current paradigm to enter dystopia before 2032, and the uncertainties are such that entering dystopia before 2028 is still a possibility.
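For concreteness, here is a minimal sketch (in Python) of the kind of extrapolation behind such dates. The current horizon, doubling times, and reference date below are illustrative placeholders of my own, not METR’s fitted parameters or the thresholds used in the graph, so the outputs will not reproduce those dates exactly; the point is only the shape of the calculation: count how many doublings separate today’s 80%-reliability task horizon from a 10-programmer-year horizon, and multiply by the assumed doubling time.

```python
from math import log2

def crossing_year(horizon_hours_now: float, doubling_time_months: float,
                  reference_year: float, target_hours: float) -> float:
    """Year at which an exponentially growing task horizon reaches target_hours."""
    doublings = log2(target_hours / horizon_hours_now)
    return reference_year + doublings * doubling_time_months / 12.0

# Target from the threshold above: 10 human-years of programmer time,
# at roughly 2,000 working hours per year.
TARGET_HOURS = 10 * 2000

# Placeholder inputs (assumptions for illustration, not METR's estimates):
# a ~1-hour 80%-reliability horizon in mid-2025, with a slower (~7-month)
# and a faster "new normal" (~4-month) doubling time.
for label, doubling_months in [("slow doubling", 7), ("fast doubling", 4)]:
    year = crossing_year(1.0, doubling_months, 2025.5, TARGET_HOURS)
    print(f"{label}: horizon crosses 10 programmer-years around {year:.1f}")
```

Where the crossing lands depends almost entirely on the current horizon and doubling time you feed in, which is exactly why the “new normal” curve pulls the date in so sharply.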
Getting the details right is important, but this critique reinforces my impression that AI 2027 matters. I only hope that AI 2027 skeptics don’t start pointing at the headline (“bad”) to argue against meaningful policy and regulatory changes.