Richard_Ngo comments on AI 2027: Responses

Richard_Ngo 9 Apr 2025 0:56 UTC
22 points
I left some comments on an earlier version of AI 2027; the most relevant is the following:
June 2027: Self-improving AI
OpenBrain now has a “country of geniuses in a datacenter.”
Most of the humans at OpenBrain can’t usefully contribute anymore. Some don’t realize this and harmfully micromanage their AI teams. Others sit at their computer screens, watching performance crawl up, and up, and up.
This is the point where I start significantly disagreeing with the scenario. My expectation is that by this point humans are still better at tasks that take a week or longer. It also starts getting really tricky to improve AI performance on such tasks, because you get limited by a number of factors: it takes a long time to get real-world feedback, it takes a lot of compute to experiment on week-long tasks, etc.

I expect these dynamics to be particularly notable when it comes to coordinating Agent-4 copies. For example, there will probably be a lot of p-hacking, then other agents realizing that p-hacking is happening and covering it up, and so on. I expect a lot of the human time will go toward trying to detect clusters of Agent-4 copies that are spiralling off into doing wacky stuff. Also, at this point the performance metrics won’t be robust enough to stop agents from goodharting hard on them.