Rice’s Theorem says that AIs can’t determine much from studying AI source code

While reading up on Roko’s Basilisk, I encountered many arguments of the form:
If agent A has agent B's source code, then A could analyze that source code to determine whether B has property X.
The viability of such an analysis makes intuitive sense to most people. But from a computer scientist’s point of view: for any non-trivial semantic property X, Rice’s Theorem says that “B has property X” is undecidable for arbitrary B. That means, quite literally, that there does not exist an A that can perform such an analysis reliably against arbitrary B.
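A toy illustration of the gap Rice’s Theorem points at: two programs can be behaviorally identical while a particular syntactic check on their source disagrees about them. The agent sources and the “cooperation” property below are hypothetical examples invented for this sketch, not anything from the Basilisk literature:

```python
# Source code of two behaviorally identical agents, given as strings.
# The semantic property X here is "cooperates on positive offers".
SRC_A = '''
def agent(offer):
    return "cooperate" if offer > 0 else "defect"
'''

SRC_B = '''
def agent(offer):
    # Same behavior as SRC_A, written differently: index into a list.
    return ["defect", "cooperate"][int(offer > 0)]
'''

def load(src):
    # Execute the source and hand back the defined agent function.
    ns = {}
    exec(src, ns)
    return ns["agent"]

def naive_analyzer(src):
    # A purely syntactic check for the property: does the text contain
    # the literal 'return "cooperate"'? Rice's Theorem is precisely the
    # statement that no check of this kind can decide a non-trivial
    # behavioral property in general.
    return 'return "cooperate"' in src

a, b = load(SRC_A), load(SRC_B)
assert all(a(x) == b(x) for x in (-2, 0, 3))           # identical behavior
assert naive_analyzer(SRC_A) != naive_analyzer(SRC_B)  # yet the analyzer disagrees
```

The point of the sketch is only that source text and behavior can come apart; any fixed analyzer keys on the text, while the property lives in the behavior.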
In the case of Roko’s Basilisk in particular: doesn’t this prevent one agent from blackmailing another, since an agent can’t actually determine which decision theory the other agent uses? Even if the agents are clones, the problem persists: by Rice’s Theorem, an agent can’t even analyze its own source code and reliably determine which decision theory it uses.
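The reason even a perfect copy of the source doesn’t obviously help can be seen from the standard proof sketch behind Rice’s Theorem: a reliable decider for “B uses decision theory X” could be turned into a halting-problem solver. A minimal sketch of the reduction, where `run` and `agent_with_property_X` are hypothetical placeholders (a program simulator and an agent known to have X):

```python
# Sketch of the halting-problem reduction behind Rice's Theorem.
# Suppose a decider has_property(src) existed for some non-trivial
# behavioral property X (say, "uses decision theory X"). Then for any
# program P and input w we could build the wrapper below.
def build_wrapper(program_src: str, program_input: str) -> str:
    # The wrapper first simulates P on w, then behaves like an agent
    # that definitely has property X. Its *behavior* therefore has X
    # if and only if P halts on w.
    return f'''
def agent(offer):
    run({program_src!r}, {program_input!r})  # hypothetical simulator: halts iff P halts on w
    return agent_with_property_X(offer)      # hypothetical agent known to have X
'''

# So has_property(build_wrapper(P, w)) would answer "does P halt on w?",
# which is impossible; hence no such reliable decider exists.
wrapper = build_wrapper("print('hello')", "w")
compile(wrapper, "<wrapper>", "exec")  # the constructed source is at least syntactically valid
```

Nothing here is ever executed against a real decider; the construction alone shows that deciding the property would decide halting.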
This seems like such a simple logical issue that I feel I must be missing something, because arguments of this form (“Agent A analyzes B’s source code and determines that B has X”) appear to be widespread in discussions of decision theory hypotheticals and AI. What am I missing?