I think he just wants an example of an agent being rewarded for something simple (like being rewarded for resource collection) exhibiting power-seeking behavior to the degree that it takes over the game environment. To a lot of people, that is intuitively different from an agent that is specifically maximizing an objective. I actually can’t name an example after looking for an hour, but I would bet money that something like that already exists.
My guess is that if you plop two StarCraft AIs on a map and reward them every time they gather resources, with enough training, they would start fighting each other for control of the map. I would also guess that someone has already done this exact scenario. Is there an AI search engine for Reddit anyone would recommend?
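A toy sketch of the incentive behind that guess, under loudly stated assumptions: this is not StarCraft, nothing is trained, and the function name `episode_return`, the parameters `pool`, `attack_cost`, and `horizon`, and all the numbers are hypothetical and made up for illustration. It only compares the total reward of two fixed policies in a shared-resource game where the reward is "one point per unit gathered", so it shows where the pressure to fight could come from, not that a learned agent would actually find it.

```python
def episode_return(policy, pool=100, attack_cost=5, horizon=200):
    """Total reward agent A collects in a toy two-agent resource game.

    Assumptions (all hypothetical): both agents draw from one shared pool,
    at most one unit per agent per step; A is rewarded only for units it
    gathers; opponent B always gathers. Under "contest", A spends its first
    `attack_cost` steps gathering nothing in order to remove B, then
    gathers alone. Under "gather", A simply gathers every step.
    """
    reward_a = 0
    b_active = True
    for t in range(horizon):
        if pool <= 0:
            break  # nothing left to gather
        attacking = policy == "contest" and t < attack_cost
        if not attacking:
            pool -= 1
            reward_a += 1  # the only reward signal: resources gathered
        if b_active and pool > 0:
            pool -= 1  # B takes its share of the same pool
        if policy == "contest" and t == attack_cost - 1:
            b_active = False  # B is removed once the attack completes
    return reward_a


if __name__ == "__main__":
    for horizon in (8, 200):
        for policy in ("gather", "contest"):
            print(horizon, policy, episode_return(policy, horizon=horizon))
```

With these made-up numbers, contesting the map loses on a short horizon (3 versus 8 over 8 steps) and wins on a long one (95 versus 50 over 200 steps), even though the reward never mentions fighting. Whether a real training run would discover this is exactly the open empirical question.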
I think he just wants an example of an agent being rewarded for something simple (like being rewarded for resource collection) exhibiting power-seeking behavior to the degree that it takes over the game environment.
That’s definitely not what “instrumental convergence” means, in general. So:
Is there a reason to be interested in that phenomenon, rather than instrumental convergence more generally?
If so, perhaps we need a different name for it?
What is the difference between that and instrumental convergence?
From the LW wiki page:
Instrumental convergence or convergent instrumental values is the theorized tendency for most sufficiently intelligent agents to pursue potentially unbounded instrumental goals such as self-preservation and resource acquisition [1].
So, the standard central examples of instrumental convergence are self-preservation and resource acquisition. If the OP is asking for examples of “instrumental convergence”, and resource acquisition is not the kind of thing they’re asking for, then the thing they’re asking for is not instrumental convergence (or is at least a much narrower category than instrumental convergence).
If the OP is looking for a pattern like “AI trained on <some goal> ends up ‘taking over the world’”, then that would be an example of instrumental convergence, but it’s a much narrower category than instrumental convergence in general. Asking for “examples of instrumental convergence”, if you actually want examples of AI trained on some random goal “taking over the world” (whatever that means), is confusing in the same way as asking for examples of cars when in fact you want an example of a 2005 red Toyota Camry.
And if people frequently want to talk about a 2005 red Toyota Camry specifically, and the word they’re using is “car” (which is already mostly used to mean something else), then that strongly suggests we need a new word.
I see your point. Maybe something like “resource domination” or just “instrumental resource acquisition” would be a better term for what he is looking for.