The core feature here seems to be that the agent has some ability to refer to itself, and that this localization differs between instantiations. Alice optimizes for dollars in her wallet, Bob optimizes for dollars in his wallet, and so they end up fighting over dollars despite being clones, because the cloning procedure doesn’t result in arrows pointing at the same wallet.
It seems sensible to me to refer to this as the ‘exact same source code,’ but it’s not obvious to me how you would create these sorts of conflicts without pointers resolving differently in this way, and so it’s not clear how far this argument can be extended.
You don’t necessarily need “explicit self-reference”. A difference in utility functions can also arise from a difference in where the agent is located in the universe. Two identical worms placed in different locations will have different utility functions because their atoms are not in exactly the same place, despite having no explicit self-reference. Similarly, in a computer simulation, agents with the same source code will be called by the universe-program in different contexts. (If they weren’t, I don’t see how it would make sense to speak of them as “different instances of the same source code”; there would just be one instance of the source code.)
So in fact, I think that this is probably a property of almost all possible agents. It seems to me that you would need a very complex and specific ontological model in the agent to prevent these effects and give the two agents the same utility function.
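As a toy sketch of the simulation case (all names here, `World`, `agent_utility` and `universe`, are made up for illustration and are not taken from the post): the same function is called with two different location arguments, and the resulting utility functions over world-states disagree about which world is better.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class World:
    # Hypothetical world-state: dollars held in the wallet at each location.
    wallets: tuple


def agent_utility(world: World, location: int) -> int:
    """Shared 'source code': value the dollars in the wallet at my own location."""
    return world.wallets[location]


def universe():
    """Hypothetical universe-program: calls the same agent code in two contexts."""
    def alice(world: World) -> int:
        return agent_utility(world, 0)  # instance placed at location 0

    def bob(world: World) -> int:
        return agent_utility(world, 1)  # instance placed at location 1

    return alice, bob


alice_u, bob_u = universe()
w1 = World(wallets=(10, 0))  # all the dollars sit at location 0
w2 = World(wallets=(0, 10))  # all the dollars sit at location 1

# Same code, different calling context: the two instances rank the worlds oppositely.
alice_prefers_w1 = alice_u(w1) > alice_u(w2)
bob_prefers_w2 = bob_u(w2) > bob_u(w1)
```

Nothing in `agent_utility` refers to "Alice" or "Bob"; the only difference is the `location` argument supplied by the caller, which is the sense of "different contexts" I have in mind.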
Using the definitions from the post, those agents would be optimising the same utility function, just by taking different actions.