Questions:
Would something like an agent trained to maximize minerals mined in StarCraft, which learns to attack other players in order to monopolize their resources, count?
I assume it would also count if that same agent were just rewarded every time it mined minerals, or every time its mineral count went up, without an explicit objective to maximize the total amount of minerals it holds?
Would a gridworld example work? How complex does the simulation have to be?
I’m probably going to be a stickler about 2: “not with the goal in advance being to show instrumental convergence” means that the example can’t be something written in response to this post (though I reserve the right to suspend this if the example is really good).
The reason is that I’m pretty sure I personally could create such a gridworld simulation. But “I demonstrated instrumental convergence in this toy example I created myself” wouldn’t convince me, as an outsider, that anything impressive had been done.
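For concreteness, here is a rough sketch of the kind of gridworld I mean (written purely as an illustration, so by 2 above it wouldn’t itself count). Every candidate goal cell is “locked” unless the agent has first visited a key cell, so whichever goal you sample, the optimal policy detours through the key: the key is the convergent instrumental subgoal. The 5x5 layout, the reward of 1 for reaching the goal while holding the key, and plain tabular value iteration are all just assumptions I made to keep the sketch small.

```python
import itertools

SIZE = 5
KEY = (2, 2)                                  # resource that unlocks every goal
GOALS = [(0, 4), (4, 0), (4, 4)]              # candidate terminal goals
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
GAMMA = 0.95

def step(pos, has_key, action):
    """Deterministic move; picking up the key is a side effect of visiting its cell."""
    r, c = pos[0] + action[0], pos[1] + action[1]
    if not (0 <= r < SIZE and 0 <= c < SIZE):
        r, c = pos                            # bumped into a wall, stay put
    new_pos = (r, c)
    return new_pos, has_key or new_pos == KEY

def value_iteration(goal, sweeps=200):
    """Optimal state values when reward 1 is given only for reaching `goal` WITH the key."""
    states = [(p, k) for p in itertools.product(range(SIZE), repeat=2) for k in (False, True)]
    V = {s: 0.0 for s in states}
    for _ in range(sweeps):
        for pos, has_key in states:
            if pos == goal and has_key:       # terminal state
                continue
            best = float("-inf")
            for a in ACTIONS:
                nxt = step(pos, has_key, a)
                reward = 1.0 if (nxt[0] == goal and nxt[1]) else 0.0
                best = max(best, reward + GAMMA * V[nxt])
            V[(pos, has_key)] = best
    return V

def greedy_trajectory(V, goal, start=(0, 0), max_steps=50):
    """Roll out the greedy policy and record the cells it visits."""
    pos, has_key = start, False
    traj = [pos]
    for _ in range(max_steps):
        if pos == goal and has_key:
            break
        def q(a):
            nxt = step(pos, has_key, a)
            reward = 1.0 if (nxt[0] == goal and nxt[1]) else 0.0
            return reward + GAMMA * V[nxt]
        pos, has_key = step(pos, has_key, max(ACTIONS, key=q))
        traj.append(pos)
    return traj

# Whatever goal is sampled, the optimal route grabs the key first.
for goal in GOALS:
    traj = greedy_trajectory(value_iteration(goal), goal)
    key_step = traj.index(KEY) if KEY in traj else None
    print(f"goal {goal}: reached in {len(traj) - 1} steps, key picked up at step {key_step}")
```

Which is exactly the problem: a setup like this is hand-built to produce the behavior, so it shouldn’t move anyone who doubts that instrumental convergence shows up in systems that weren’t designed to exhibit it.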