The problem is that the two human players’ minds aren’t logically related. Each human player in this game wants his AI to play defect, because their decisions are logically independent of each other’s.
Your statement above is implicitly self-contradictory. How can you generalize over all the players in one fell swoop, applying the same logic to each of them, and yet say that the decisions are “logically independent”? The decisions are physically independent. Logically, they are extremely dependent. We are arguing over what is, in general, the “smart thing to do”. You assume that if “the smart thing to do” is to defect, then all the players will defect. That doesn’t smell like logical independence to me.
More importantly, the whole calculation of independence versus dependence is better carried out by an AI than by a human programmer; that calculation is what TDT is for. TDT is not a rule that says “cooperate”. It is a procedure for determining the conditional probability of the other agent cooperating, given that a TDT agent in your epistemic state plays “cooperate”. If you know that the other agent knows (up to common knowledge) that you are a TDT agent, and the other agent knows that you know (up to common knowledge) that it is a TDT agent, then the obvious strategy is to cooperate with a TDT agent if and only if it cooperates with you under that epistemic condition.
The TDT strategy is not “Cooperate with other agents known to be TDTs”. The TDT strategy for the one-shot PD, in full generality, is “Cooperate if and only if (‘choosing’ that the output of this algorithm under these epistemic conditions be ‘cooperate’) makes it sufficiently more likely that (the output of the probability distribution of opposing algorithms under its probable epistemic conditions) is ‘cooperate’, relative to the relative payoffs.”
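To make the “sufficiently more likely, relative to the relative payoffs” clause concrete, here is a minimal sketch of that decision rule, assuming the standard one-shot PD payoff ordering T > R > P > S. The function name, the specific payoff numbers, and the two probability estimates are illustrative placeholders, not part of any canonical TDT implementation:

```python
# Hypothetical sketch of the decision rule described above.
#   p_coop   = P(opponent's algorithm outputs "cooperate" | my output is "cooperate")
#   p_defect = P(opponent's algorithm outputs "cooperate" | my output is "defect")
# T, R, P, S are the usual PD payoffs: temptation, reward, punishment, sucker.

def tdt_one_shot_pd(p_coop: float, p_defect: float,
                    T: float = 5.0, R: float = 3.0,
                    P: float = 1.0, S: float = 0.0) -> str:
    """Cooperate iff 'choosing' cooperate raises the opponent's probability of
    cooperating enough to beat defection in conditional expected payoff."""
    eu_cooperate = p_coop * R + (1.0 - p_coop) * S
    eu_defect = p_defect * T + (1.0 - p_defect) * P
    return "cooperate" if eu_cooperate > eu_defect else "defect"

# Two TDTs with common knowledge of each other: the logical dependence is
# near-total, so p_coop is high and p_defect is low -> cooperate.
print(tdt_one_shot_pd(p_coop=0.95, p_defect=0.05))  # cooperate

# Opponent whose output is logically independent of ours: the two conditional
# probabilities are equal, and defection dominates.
print(tdt_one_shot_pd(p_coop=0.5, p_defect=0.5))    # defect
```

The whole argument about logical dependence lives in the gap between p_coop and p_defect; when they are equal, the rule reduces to ordinary defection, which is the next case.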
Under conditions where a TDT plays a one-shot true PD against something that is not a TDT and not logically dependent on the TDT’s output, the TDT will of course defect. A TDT playing against a TDT which falsely believes the former case to hold will also, of course, defect. Where you appear to depart from my visualization, Wei Dai, is in thinking that logical dependence can only arise from detailed examination of the other agent’s source code, because otherwise the agent has a motive to defect. You need to recognize that your own belief that what players do is in general likely to correlate is itself a case of “logical dependence”. Similarly, the original decision to change your own source code to include a special exception for defection under particular circumstances is something a TDT agent would model: if it is probable that the agent’s causal source thought it could get away with that special exception and programmed it in, the TDT will defect.
I think you’ve got logical dependencies in your mind that you are not explicitly recognizing as “logical dependencies” of the kind a TDT agent can explicitly process.