I might have missed something, but it looks to me like the first ordering is phrased as though the self-improvement and the risk aversion are happening simultaneously.
If an AI had the ability to self-improve for a couple of years before it developed risk aversion, for instance, I think we end up in the “maximal self improvement” / “high risk” outcomes.
This seems like a big assumption to me:
But self-improvement additionally requires that the AI be aware that it is an AI and be able to perform cutting-edge machine learning research. Thus, solving self-improvement appears to require more, and more advanced, capabilities than apprehending risk.
If an AI has enough resources and is doing the YOLO version of self-improvement, it doesn’t seem like it necessarily requires much in the way of self-awareness or risk apprehension—particularly if it is willing to burn resources on the task. If you ask a current LLM how to take over the world, it says things that appear like “evil AI cosplay”—I could imagine something like that leading to YOLO self-improvement that has some small risk of stumbling across a gain that starts to compound.
There seem to be a lot of big assumptions in this piece, doing a lot of heavy lifting. Maybe I’ve gotten more used to LW-style conversational norms about tagging things as assumptions, and it’s actually fine? My gut instinct is something like “all of these assumptions stack up to target this to a really thin slice of reality, and I shouldn’t update much on it directly”.