An AGI whose values include curiosity, gaining access to more tools, or stockpiling resources might systematize them into “gaining power over the world”.
At first blush this sounds to me more implausible than the standard story, in which gaining influence and power are adopted as instrumental goals. E.g. human EAs, BLM, etc. all developed a goal of gaining power over the world, but it was purely instrumental (for the sake of helping people, achieving systemic change, etc.). It sounds like you are saying that this isn’t what will happen with AI, and that instead they’ll learn to intrinsically want this stuff?
In the standard story, what are the terminal goals? You could say “random” or “a mess”, but I think it’s pretty compelling to say “well, they’re probably related to the things that the agent was rewarded for during training”. And those things will likely include “curiosity, gaining access to more tools or stockpiling resources”.
I call these “convergent final goals” and talk more about them in this post.
I also think that an AGI might systematize other goals that aren’t convergent final goals, but these seem harder to reason about, and my central story is that the goals it systematizes will be convergent final goals. (Note that this is somewhat true for humans as well: e.g. curiosity and empowerment/success are final-ish goals for many people.)