I might have missed something, but it looks to me like the first ordering is phrased as if the self-improvement and the risk aversion are happening simultaneously.
If an AI had the ability to self-improve for a couple of years before it developed risk aversion, for instance, I think we end up in the “maximal self improvement” / “high risk” outcomes.
This seems like a big assumption to me:
But self-improvement additionally requires that the AI be aware that it is an AI and be able to perform cutting-edge machine learning research. Thus, solving self-improvement appears to require more, and more advanced, capabilities than apprehending risk.
If an AI has enough resources and is doing the YOLO version of self-improvement, it doesn’t seem like it necessarily requires much in the way of self-awareness or risk apprehension—particularly if it is willing to burn resources on the task. If you ask a current LLM how to take over the world, it says things that appear like “evil AI cosplay”—I could imagine something like that leading to YOLO self-improvement that has some small risk of stumbling across a gain that starts to compound.
There seem to be a lot of big assumptions in this piece, doing a lot of heavy lifting. Maybe I’ve gotten more used to LW-style conversational norms about tagging things as assumptions, and it’s actually fine? My gut instinct is something like “all of these assumptions stack up to target this to a really thin slice of reality, and I shouldn’t update much on it directly”.
I’ve read that OpenAI and DeepMind are hiring for multi-agent reasoning teams. I can imagine that giving another source of scaling.
I figure things like Amdahl’s law / communication overhead impose some limits there, but MCTS could probably find useful ways to divide the reasoning work and have the agents communicating at least at human level efficiency.
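To make the Amdahl’s law intuition concrete, here is a quick sketch of the standard formula (the parameter values are just illustrative, not claims about any real system):

```python
def amdahl_speedup(parallel_fraction: float, n_agents: int) -> float:
    """Amdahl's law: speedup from n workers when only a fraction
    of the total work can be parallelized (the rest is serial)."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_agents)

# If 90% of the reasoning work divides cleanly across agents:
print(amdahl_speedup(0.9, 16))    # ~6.4x with 16 agents
print(amdahl_speedup(0.9, 1000))  # still capped near 10x
```

The point being that even with many agents, the serial fraction (coordination, communication, whatever can’t be split) bounds the ceiling at 1/(1 − p), so the interesting question is how small MCTS-style work division can drive that serial fraction.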