Perhaps of import is that if paperclip maximizer A is considering whether to help paperclip maximizer B, B will only want A to take paperclip-maximizing actions. Cognitively sophisticated paperclip maximizers want everybody to want to maximize paperclips over all else. There is no obvious way in which any action could count as helpful-to-B unless that action also maximizes paperclips; any other axis matters to B only inasmuch as it bears on paperclips. A real paperclip maximizer will, with no internal conflict whatever, sacrifice its own existence if that act will maximize paperclips. The two paperclip maximizers have identical goals that are completely external to their own experiences (although they will react to their experiences of paperclips, what they want are real paperclips, not paperclip experiences). Most real agents aren’t quite like that.
Perhaps an intuition pump is appropriate at this point, explicating what I mean by the verb “assist”.
Alfonse, the paperclip maximizer, decides that the best way to maximize paperclips is to conquer the world. In pursuit of the subgoal of conquering the world, Alfonse transforms itself into an army. After a fierce battle with some non-paperclip-ists, one instance of Alfonse considers whether to bind the wound of another, badly injured instance of Alfonse.
Binding the wound will cost some time and calories, resources which might be used in other ways. However, if the wound is bound, the other Alfonse may have an increased chance of fighting in another battle.
By “assistance” I mean “spending an instance-local resource so that another instance obtains (or doesn’t lose) some resource local to that other instance”.
Because both instances value exactly the same thing, the two instances should agree on the solution, whatever it is.
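To make the wound-binding trade-off concrete, here is a minimal sketch in Python. The numbers and the expected_paperclips function are invented purely for illustration; the only point is that both instances score every outcome by the same external quantity, expected total paperclips, so they run the same computation and cannot reach different verdicts.

```python
# A minimal sketch of why the two instances cannot disagree, using made-up numbers.
# Both instances score every outcome by the same quantity: expected total paperclips.

def expected_paperclips(bind_wound: bool) -> float:
    # Hypothetical figures, purely for illustration.
    cost = 2.0 if bind_wound else 0.0            # time and calories diverted from clip-making
    survival_prob = 0.9 if bind_wound else 0.5   # chance the injured instance fights again
    value_of_fighter = 10.0                      # paperclips a surviving fighter is expected to secure
    return survival_prob * value_of_fighter - cost

# Each instance runs the identical computation over the identical goal,
# so each reaches the same verdict about the helper's instance-local resources.
helper_says_bind = expected_paperclips(True) > expected_paperclips(False)
injured_says_bind = expected_paperclips(True) > expected_paperclips(False)
assert helper_says_bind == injured_says_bind
print("bind the wound" if helper_says_bind else "spend the calories elsewhere")
```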