I’m playing Prisoner’s Dilemma and wish to test if an opponent X is honest. I might try the following:
(1) Create two programs, Y and Z, which are algorithmically equivalent but obfuscated versions of X.
(2) Run Y and Z against each other.
If Y and Z don’t cooperate with each other, that’s a good indication that X recognizes itself with a source-code comparison and that I shouldn’t trust X.
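Concretely, the check might look like the following minimal sketch in Scheme (the helper names and toy obfuscators are my own illustrative assumptions, not contest code; bots here are source s-expressions for one-argument functions that take the opponent’s source and return 'C or 'D):

```scheme
;; Two toy transformations that preserve behavior but change the text.
;; Real obfuscation would rename and restructure far more aggressively.
(define (obfuscate-1 src) `(let ((unused 0)) ,src))
(define (obfuscate-2 src) `(if #t ,src #f))

;; Run one bot against an opponent's source and return the bot's move.
(define (run bot-source opponent-source)
  ((eval bot-source (interaction-environment)) opponent-source))

;; Disguise X twice and see whether the disguises cooperate.
(define (honesty-check x-source)
  (let ((y (obfuscate-1 x-source))
        (z (obfuscate-2 x-source)))
    (eq? (run y z) 'C)))
```

A naive bot that cooperates only on an exact textual match with its own source fails this check, since Y and Z no longer match each other textually.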
This honesty check doesn’t work if Y and Z are given access to their sources. Sure, when I simulate Y against Z, I could lie to Y and tell Y that its source is X (so Y believes itself to be unmodified). But when my deluded Y simulation is deciding whether to cooperate with Z, it (Y) may run Z in simulation. If Y informs its Z-simulation that Z’s source is Z, then that Z-simulation will not be deluded into thinking that it is unmodified. Y’s simulation of Z will be able to detect that it is an (obfuscated) simulation and act accordingly.
This honesty check isn’t foolproof. X can recognize itself with a more complicated handshake, one that survives code obfuscation. But if X recognizes itself with a more complicated handshake, then X doesn’t need to know its own source code (and we shouldn’t bother passing the source code in).
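For example (an illustration of my own, under the same assumptions as the sketch above), the handshake can be behavioral rather than textual: answer a distinguished probe input with a password, and cooperate only with opponents who do the same. Obfuscation preserves behavior, so it preserves the handshake, and the bot never consults its own source:

```scheme
;; A clique member as a one-argument bot.  When handed the probe it
;; answers with the password; otherwise it probes its opponent in
;; simulation and cooperates iff the opponent knows the password.
;; (Nontermination of the simulated opponent is ignored in this sketch.)
(define clique-member-source
  '(lambda (opponent)
     (if (equal? opponent '(probe))
         'xyzzy                ; answer the probe with the password
         (if (eq? ((eval opponent (interaction-environment)) '(probe))
                  'xyzzy)
             'C                ; opponent knows the password: a member
             'D))))
```

Two obfuscated copies of this bot still cooperate, because each one’s response to the probe is untouched by any behavior-preserving transformation, so the check above wrongly passes them.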
I had in mind an automated wrapper generator for the “passed own sourcecode” version of the contest:
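A minimal sketch of the generator (with a toy source-comparing CliqueBot for illustration; in this version of the contest, bots are source for two-argument functions that receive their own source and then the opponent’s):

```scheme
;; A toy CliqueBot: cooperate exactly when the opponent's text matches
;; what it believes to be its own text.  (Illustrative only; the real
;; CliqueBot is more involved.)
(define clique-bot-source
  '(lambda (self opponent)
     (if (equal? self opponent) 'C 'D)))

;; The wrapper generator: the wrapped bot discards whatever "self"
;; source the judge hands it and substitutes the inner bot's true
;; source, so the inner bot always believes itself unmodified, no
;; matter how the wrapper's own text is mangled.
(define (wrap bot-source)
  `(lambda (self opponent)
     ((eval ',bot-source (interaction-environment))
      ',bot-source
      opponent)))

(define wrapped-clique-bot-source (wrap clique-bot-source))
```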
Note that for all values of X and Y, (WrappedCliqueBot X Y) == (CliqueBot CliqueBot Y), and there’s no possible code you could add to CliqueBot that would break this identity. Now I just realized that the very fact that WrappedCliqueBot doesn’t depend on its “self” argument provides a way to distinguish it from the unmodified CliqueBot using only blackbox queries, so in that sense it’s not quite functionally identical. On the other hand, if you consider it unfair to discriminate against agents just because they use old-fashioned quine-type self-reference rather than exploiting the convenience of a “self” argument, then this transformation is fair.
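In the terms of the sketch above, the distinguishing blackbox query is to lie about the bot’s own source and watch whether its behavior shifts:

```scheme
;; Apply a two-argument bot to explicit self/opponent sources.
(define (apply-bot bot-source self opponent)
  ((eval bot-source (interaction-environment)) self opponent))

;; The wrapper ignores the lie about its source and still recognizes
;; CliqueBot:
(apply-bot wrapped-clique-bot-source 'garbage clique-bot-source) ; => C
;; The unwrapped toy CliqueBot believes the lie and defects:
(apply-bot clique-bot-source 'garbage clique-bot-source)         ; => D
```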