Telepathic is even worse.
Timeless because it allows that causality can go backwards in the presence of good simulation. I think.
What’s worse about it? (I have a defense of the name I’ll hold off on sharing until I hear your opinion.)
Telepathic implies some kind of supernatural communication. TDT sort of behaves like that except that there is no actual communication going on. Timeless isn’t a very good name, but telepathic has too much weird baggage, I think. Curious to see your arguments.
If you bend what you mean by ‘causality,’ sure, but I don’t think there’s value in bending the word that way.
Well it lets you vastly simplify the causality structures in your predictions, for one.
TDT sort of behaves like that except that there is no actual communication going on.
Huh? How could X possibly run TDT without having Y’s source code communicated to it?
Curious to see your arguments.
My model of TDT is that, rather than looking at action-action-outcome triplets, it looks at strategy-strategy-outcome triplets. The rest of game theory remains the same, and so once you have a strategy-strategy-outcome table, you find the Nash equilibrium and you’re done. (If constructing that table fails, you revert to the Nash equilibrium from the action-action-outcome table.)
The material difference between this and regular game theory is that we now have access to strategies; i.e., we can read our opponent’s mind, i.e. telepathy. (Maybe mind-reading decision theory is a better term, but it doesn’t abbreviate to TDT like telepathic decision theory does.)
You can still run TDT in situations with time (like an iterated prisoner’s dilemma), but you can’t run TDT in situations where you don’t have your opponent’s source code. So calling it “timeless” when it can involve time seems odd, as is not referring to the necessity of source code.
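(A minimal sketch of that model in Python; the payoff numbers and the deduce_outcome oracle are assumptions for illustration, not anything from the post.)

```python
from itertools import product

# Assumed Prisoner's Dilemma payoffs, written as (X's utility, Y's utility).
PD = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
      ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def pure_nash_equilibria(payoffs, rows, cols):
    """All cells where neither player gains by unilaterally switching."""
    eq = []
    for r, c in product(rows, cols):
        ux, uy = payoffs[(r, c)]
        if (all(payoffs[(r2, c)][0] <= ux for r2 in rows) and
                all(payoffs[(r, c2)][1] <= uy for c2 in cols)):
            eq.append((r, c))
    return eq

def decide(strategies_x, strategies_y, deduce_outcome):
    """Build the strategy-strategy-outcome table and pick its Nash equilibria;
    if deduction fails anywhere, revert to the plain action-action game."""
    table = {}
    for sx, sy in product(strategies_x, strategies_y):
        outcome = deduce_outcome(sx, sy)   # e.g. ("C", "D"), or None on failure
        if outcome is None:
            return pure_nash_equilibria(PD, ["C", "D"], ["C", "D"])
        table[(sx, sy)] = PD[outcome]
    return pure_nash_equilibria(table, strategies_x, strategies_y)
```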
My model of TDT is that, rather than looking at action-action-outcome triplets, it looks at strategy-strategy-outcome triplets.
This characterization/analogy doesn’t fit, and doesn’t seem to help with making the necessary-knowledge-about-opponent distinction. Knowing that performing action A implies that your opponent performs action B is a weaker statement than unconditionally knowing that your opponent performs action B. In standard game theory, you don’t know what action your opponent performs, and with TDT you don’t know that either. But not knowing something doesn’t (automatically) make it not happen. So if there is indeed a dependence of your opponent’s action on your own action, it’s useful to know it.
The difference between considering the opponent’s actions in standard game theory and considering the opponent’s “strategy” (the dependence of the opponent’s action on your action) is that while the former is usually unknown (to both TDT and standard game theory), the latter can in principle be known, and making use of this additional potential knowledge is what distinguishes TDT. So the actions in game theory and “strategies” in TDT are not analogous.
Knowing that performing action A implies that your opponent performs action B is a weaker statement than unconditionally knowing that your opponent performs action B.
Okay. The first looks like a strategy to me, and the second looks like an action. Right?
In standard game theory, you don’t know what action your opponent performs, and with TDT you don’t know that either.
I agree, and that matches my characterization of TDT.
But not knowing something doesn’t (automatically) make it not happen. So if there is indeed a dependence of your opponent’s action on your own action, it’s useful to know it.
I’m not understanding this, though. Are you just saying that knowing about your opponent’s strategy gives you useful information?
the latter can in principle be known,
How do you learn it?
So the actions in game theory and “strategies” in TDT are not analogous.
The analogy is that both of them get put into a table, and then you find the Nash equilibria by altering the favored rows and columns of the table, and then you pick the best of the equilibria. (TDT has known bargaining problems, right? That looks like it maps onto disagreeing over which Nash equilibrium to pick.) Would it help if I made a walkthrough of my model with actual tables?
Knowing that performing action A implies that your opponent performs action B is a weaker statement than unconditionally knowing that your opponent performs action B.
Okay. The first looks like a strategy to me, and the second looks like an action. Right?
Y doesn’t act according to the rule “Let’s see what X does. If it does A, I’m going to do B, etc.”, and so it’s misleading to call that “a strategy”. This is only something like what X infers about Y, but this is not how Y reasons, because Y can’t infer what X does, and so it can’t respond depending on what X does.
The actual strategy is to figure out an action based on the other player’s code, not to figure out an action based on the other player’s action. This strategy, which doesn’t involve responding to actions, can be characterized as establishing a dependence between players’ actions, and this characterization is instrumental to the strategy itself, a part of what makes the characterization correct.
Y doesn’t act according to the rule “Let’s see what X does. If it does A, I’m going to do B, etc.”, and so it’s misleading to call that “a strategy”. This is only something like what X infers about Y, but this is not how Y reasons, because Y can’t infer what X does, and so it can’t respond depending on what X does.
So “play whatever I think X will play” does count as a strategy, but “play whatever X plays” does not count as a strategy because Y can’t actually implement it. Limiting X and Y to the first sort of strategies was meant to be part of my characterization, but I could have made that clearer.
So “play whatever I think X will play” does count as a strategy, but “play whatever X plays” does not count as a strategy because Y can’t actually implement it.
It can’t implement “play whatever I think X will play” either, because it doesn’t know what X will play.
In one statement, if we are talking about ADT-like PD (the model of TDT in this post appears to be more complicated), Y could be said to choose the action such that provability of Y choosing that action implies X’s choosing a good matching action. So Y doesn’t act depending on what X does or what Y thinks X does, etc.; Y acts depending on what X can be inferred to do if we additionally assume that Y is doing a certain thing, and the thing we additionally assume Y to be doing is a specific action, not a strategy of responding to X’s source code, or a strategy of responding to X’s action. If you describe X’s algorithm the same way, you can see that the additional assumption of Y’s action is not what X uses in making its decision, for it similarly makes an additional assumption of its own (X’s) action and then looks at what can be inferred about Y’s action (and not Y’s “strategy”).
Y acts depending on what X can be inferred to do if we additionally assume that Y is doing a certain thing, and the thing we additionally assume Y to be doing is a specific action, not a strategy of responding to X’s source code, or a strategy of responding to X’s action.
Can you write the “cooperate iff I cooperate iff they cooperate … ” bot this way? I thought the strength of TDT was that it allowed that bot.
Can you write the “cooperate iff I cooperate iff they cooperate … ” bot this way?
This can be unpacked as an algorithm that searches for a proof of the statement “If I cooperate, then my opponent also cooperates; if I defect, then my opponent also defects”, and if it finds such a proof, it cooperates. Under certain conditions, two players running something like this algorithm will cooperate. As you can see, the agent’s decision here depends not on the opponent’s decision, but on the opponent’s decision’s dependence on your decision (and not dependence on the dependence of your decision on the opponent’s decision, etc.).
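(A rough sketch of that unpacking; try_to_prove stands in for a bounded proof search over both programs’ source code, and is an assumption of this sketch rather than any existing library.)

```python
def loebian_cooperator(my_source, opp_source, try_to_prove):
    # Cooperate iff we can prove both directions of the link between the two
    # programs' outputs; otherwise fall back to defection.
    linked = (try_to_prove(f"out({my_source!r}) == 'C' implies out({opp_source!r}) == 'C'")
              and try_to_prove(f"out({my_source!r}) == 'D' implies out({opp_source!r}) == 'D'"))
    return "C" if linked else "D"
```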
Okay. I think that fits with my view: so long as it’s possible to go from X’s strategy and Y’s strategy to an outcome, then we can build a table of strategy-strategy-outcome triplets, and do analysis on that. (I built an example over here.) What I’m taking from this subthread is that the word “strategy” needs to have a particular meaning to be accurate, and so I need to be more careful when I use it so that it’s clear that I’m conforming to that meaning.
I usually understand “mind-reading” to encompass being aware of the current state of a system. Two systems that simply know one another’s strategies can’t predict one another’s behaviors if their strategies include random coin flips, for example, or depend on information that one system can observe but the other cannot; whereas I would expect telepathic agents to be aware of the results of such observations as well.
If you did have two telepaths playing any game, and one of them decided a mixed strategy was optimal, they wouldn’t want to know what action they were playing until it was played, because otherwise they might leak that knowledge to the other player. That is, in a competitive situation I don’t think mind-reading would extend to coin-reading, but if your understanding is common, then ‘mind-reading’ is a bad phrase to use. Is there a good word for “has access to its opponent’s source code”? Bonus points if it starts with a T.
(Also, my understanding is that TDT will defect against any mixed strategy in the prisoner’s dilemma.)
(Also, my understanding is that TDT will defect against any mixed strategy in the prisoner’s dilemma.)
Not necessarily. It will play a mixed strategy against an identical mixed strategy if that is what it needs to do to get them to play mixed rather than D. It’s the other guy being weird and arbitrary in that case, not the TDT.
Well, it certainly will defect against any mixed strategy that is hard coded into the opponent’s source code. On the other hand, if the mixed strategy the opponent plays is dependent on what it predicts the TDT agent will play, then the TDT agent will figure out which outcome has a higher expected utility:
(I defect, Opponent runs “defection predicted” mixed strategy)
(I cooperate, Opponent runs “cooperation detected” mixed strategy)
Of course, this is still simplifying things a bit, since it assumes that the opponent can perfectly predict one’s strategy, and it also rules out the possibility of the TDT agent using a mixed strategy himself.
Thus the actual computation is more like
ArgMax(Sum(ExpectedUtility(S,T)*P(T|S)))
where the argmax is over S: all possible mixed strategies for the TDT agent,
the sum is over T: all possible mixed strategies for the opponent,
and P(T|S) is the probability that the opponent will play T, given that we choose to play S (so this is essentially an estimate of the opponent’s predictive power).
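(A toy version of that computation, assuming standard PD payoffs and assuming the opponent is a simple prediction => mixed-strategy mapper whose response to S is known exactly, i.e. P(T|S) is a point mass; the numbers and the example mapper are assumptions.)

```python
import numpy as np

# Assumed payoffs to the TDT agent: rows = my action (C, D), cols = opponent's.
U = np.array([[3.0, 0.0],
              [5.0, 1.0]])

def expected_utility(p_me_coop, p_opp_coop):
    me = np.array([p_me_coop, 1 - p_me_coop])
    opp = np.array([p_opp_coop, 1 - p_opp_coop])
    return me @ U @ opp

def best_mixed_strategy(opponent_response, grid=101):
    """Brute-force ArgMax over my cooperation probability S."""
    candidates = np.linspace(0.0, 1.0, grid)
    return max(candidates, key=lambda s: expected_utility(s, opponent_response(s)))

# Example mapper: cooperates 90% of the time if it predicts I cooperate more
# than half the time, otherwise only 10% of the time.
print(best_mixed_strategy(lambda s: 0.9 if s > 0.5 else 0.1))  # a bit above 0.5
```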
Won’t that let the opponent steal utility from you? Consider the case where you’re going up against another TDTer which is willing to consider both the strategy “if they cooperate only if I cooperate, then cooperate with 99% probability” and “if they cooperate only if I cooperate, then cooperate.” You want your response to the first strategy to be defection and your response to the second strategy to be cooperation, so it’s in their interests to play the second strategy.
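(To put rough numbers on the “steal utility” worry, with assumed payoffs CC=(3,3), CD=(0,5), DC=(5,0), DD=(1,1):)

```python
p = 0.99  # they cooperate with probability p once they see I cooperate iff they do
my_payoff_if_i_accept = p * 3 + (1 - p) * 0        # 2.97
their_payoff_if_i_accept = p * 3 + (1 - p) * 5     # 3.02
my_payoff_if_i_refuse = 1                          # mutual defection
print(my_payoff_if_i_accept, their_payoff_if_i_accept, my_payoff_if_i_refuse)
# Accepting still beats mutual defection (2.97 > 1), so unless you commit to
# defecting against the 99% strategy, the other player can skim 0.05 in expectation.
```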
You’re right, if the opponent is a TDT agent. I was assuming that the opponent was simply a prediction=>mixed strategy mapper. (In fact, I always thought that the strategy 51% one-box / 49% two-box would game the system, assuming that Omega just predicts the outcome which is most likely.)
If the opponent is a TDT agent, then it becomes more complex, as in the OP. Just as above, you have to take the argmax over all possible y->x mappings, instead of simply taking the argmax over all outputs.
Putting it in that perspective, essentially in this case we are adding all possible mixed strategies to the space of possible outputs. Hmmm… That’s somewhat a better way of putting it than everything else I said.
In any case, two TDT agents will both note that the program which only cooperates 100% iff the opponent cooperates 100% dominates all other mixed strategies against such an opponent.
So to answer the original question: Yes, it will defect against blind mixed strategies. No, it will not necessarily defect against simple (prediction => mixed strategy) mappers. N/A against another TDT agent, as neither will ever play a mixed strategy, so to ask whether it would cooperate with a mixed-strategy TDT agent is counterfactual.
EDIT: Thinking some more, I realize that TDT agents will consider the sort of 99% rigging against each other — and will find that it is better than the cooperate IFF strategy. However, this is where the “sanity check” becomes important. The TDT agent will realize that although such a pure agent would do better against a TDT opponent, the opponent knows that you are a TDT agent as well, and thus will not fall for the trap.
Out of this I’ve reached two conclusions:
The sanity check outlined above is not broad enough, as it only sanity checks the best agents, whereas even if the best possible agent fails the sanity check, there still could be an improvement over the Nash equilibrium which passes.
Eliezer’s previous claim that a TDT agent will never regret being a TDT agent given full information is wrong (hey, I thought it was right too). Either it gives in to a pure 99% rigger or it does not. If it does, then it regrets not being able to 99% rig another TDT agent. If it does not, then it regrets not being a simple hard-coded cooperator against a 99% rigger. This probably could be formalized a bit more, but I’m wondering if Eliezer et al. have considered this?
EDIT2: I realize I was a bit confused before. Feeling a bit stupid. Eliezer never claimed that a TDT agent won’t regret being a TDT agent (which is obviously possible, just consider a clique-bot opponent), but that a TDT agent will never regret being given information.
(In fact, I always thought that the strategy 51% one-box / 49% two-box would game the system, assuming that Omega just predicts the outcome which is most likely.)
Incidentally, my preferred version of Newcomb is that if the Predictor decides that your chance of one-boxing is p, it puts (one million times p) dollars in the big box. Presumably, you know that the Predictor is both extremely well-calibrated and shockingly accurate (it usually winds up with p near 0 or near 1).
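(Under that version the 51/49 trick from the parent comment stops paying off; a quick expected-value check, assuming the usual $1,000 / $1,000,000 amounts and a perfectly calibrated Predictor:)

```python
def expected_winnings(q):
    # q = my probability of one-boxing; a calibrated Predictor puts 1,000,000 * q
    # in the big box, and two-boxing adds the $1,000 from the small box.
    big_box = 1_000_000 * q
    return q * big_box + (1 - q) * (big_box + 1_000)

for q in (0.0, 0.49, 0.51, 1.0):
    print(q, expected_winnings(q))
# Winnings increase monotonically in q and peak at q = 1, i.e. pure one-boxing.
```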
The sanity check outlined above is not broad enough, as it only sanity checks the best agents, whereas even if the best possible agent fails the sanity check, there still could be an improvement over the Nash equilibrium which passes.
Yup, this is where I’m going in a future post. See the footnote on this post about other variants of TDT; there’s a balance between missing workable deals against genuinely stubborn opponents, and failing to get the best possible deal from clever but flexible opponents. (And, if I haven’t made a mistake in the reasoning I haven’t checked, there is a way to use further cleverness to do still better.)
For now, note that TDT wouldn’t necessarily prefer to be a hard-coded 99% cooperator in general, since those get “screw you” mutual defections from some (stubborn) agents that mutually cooperate with TDT.
You’ve made it into a bargaining game with that mixed strategy, and indeed the version of TDT we introduced will defect against an opponent that outputs the mixed strategy (if that opponent would output a pure cooperate if that were the only way to get its adversary to cooperate). But bargaining games are complicated, and I’m saving that for another post.
It’s a better analogy to say that TDT is capable of offering deals (and self-enforcing them) even without traditional means of communication and precommitment (as long as it has some means for inferring its opponent’s thinking), and so it has a wider range of options than CDT to optimize over.
Well, offering deals isn’t enough, right? The self-enforcing part is really important, and that’s where the Nash equilibrium idea comes in: it’s self-enforced because no party can gain by unilaterally changing their strategy (which is a somewhat different restriction than no party gaining from unilaterally changing their action).
[EDIT: I see now what Vaniver was saying, thanks to the diagram in xer reply.] Try writing it out for the Prisoner’s Dilemma in such a way that a CDT agent would cooperate with itself in the strategy-strategy game (I mean, such that they would each output a strategy that makes them both cooperate in the original PD), and you’ll see the problem. You need to eliminate logically impossible squares of the matrix, not just add new rows.
So, let’s consider this with three strategies: 1) CooperateBot, 2) DefectBot, 3) CliqueBot.
We now get a table that looks like this (notice the letters are outcomes, not actions):
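(The original table is not shown here; the following is one reconstruction consistent with the next paragraph, assuming the usual PD outcome letters T > R > P > S, with entries written as (X’s outcome, Y’s outcome), plus a small check of the claimed equilibria.)

```python
T, R, P, S = 5, 3, 1, 0   # assumed PD ordering: temptation > reward > punishment > sucker

# Rows = X's option, columns = Y's option: 1) CooperateBot, 2) DefectBot, 3) CliqueBot.
table = {
    (1, 1): (R, R), (1, 2): (S, T), (1, 3): (S, T),
    (2, 1): (T, S), (2, 2): (P, P), (2, 3): (P, P),
    (3, 1): (T, S), (3, 2): (P, P), (3, 3): (R, R),
}

def is_nash(cell):
    r, c = cell
    ux, uy = table[cell]
    return (all(table[(r2, c)][0] <= ux for r2 in (1, 2, 3)) and
            all(table[(r, c2)][1] <= uy for c2 in (1, 2, 3)))

print([cell for cell in table if is_nash(cell)])   # [(2, 2), (3, 3)]
```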
X has control over which row they get, and Y has control over which column they get. The two Nash equilibria are (2,2) and (3,3). Conveniently, (2,2) and (2,3) have the same result, and so it looks like it’ll be easy to figure out that (3,3) is the better Nash equilibrium.
[EDIT: The following is false.] A clever CDT would be able to act like TDT if it considered, not the choice of whether to output C or D, but the choice of which mathematical object to output (because it could output a mathematical object that evaluates to C or D in a particular way depending on the code of Y—this gives it the option of genuinely acting like TDT would).
This has the interesting conclusion that even without the benefit of self-modification, a CDT agent with a good model of the world ends up acting more like TDT than traditional game theorists would expect. (Another example of this is here.) The version of CDT in the last post, contrariwise, is equipped with a very narrow model of the world and of its options. [End falsehood.]
I think these things are fascinating, but I think it’s important to show that you can get TDT behavior without incorporating anthropic reasoning, redefinition of its actions, or anything beyond a basic kind of framework that human beings know how to program.
(By the way, I wouldn’t call option 3 CliqueBot, because CliqueBots as I defined them have problems mutually cooperating with anything whose outputs aren’t identical to theirs. I think it’s better for Option 3 to be the TDT algorithm defined in the post.)
It seems to come up all the time that people aren’t aware that CDT with a sufficiently good world model (a sufficiently accurate causal graph) is the same as TDT, even though this has been known for years. If you could address that somewhere in your sequence I think you’d save a lot of people a lot of time—it’s the most common objection to standard discourse about decision theory that I’ve seen.
I’ll discuss it in the final post.
It seems to come up all the time that people aren’t aware that CDT with a sufficiently good world model (a sufficiently accurate causal graph) is the same as TDT
CDT leaves the money on the ground? Not unless the “sufficiently good world model” isn’t so much “sufficiently good” as it is an artificial hack that compensates for bad decision making by twisting what causal graphs are supposed to mean.
This has the interesting conclusion that even without the benefit of self-modification, a CDT agent with a good model of the world ends up acting more like TDT than traditional game theorists would expect.
This is a pretty common feature of comparisons between decision theories: different outcomes generally require different assumptions.
I think these things are fascinating, but I think it’s important to show that you can get TDT behavior without incorporating anthropic reasoning, redefinition of its actions, or anything beyond a basic kind of framework that human beings know how to program.
It’s not clear to me what the difference is between the TDT algorithm in your post and the method I’ve described. You need some method of determining what the outcome pair is from strategy pair, and the inference module can (hopefully) do that. The u_f that you use is the utility of the X outcome corresponding to the best Y outcome in row f, and picking the best of those corresponds to finding the best of the Nash equilibria (in the absence of bargaining problems). The only thing I don’t mention is the sanity check, but that should just be another run of the inference module.
By the way, I wouldn’t call option 3 CliqueBot, because CliqueBots as I defined them have problems mutually cooperating with anything whose outputs aren’t identical to theirs. I think it’s better for Option 3 to be the TDT algorithm defined in the post.
Sure, but does it have a short name? ProofBot?
(Notice that Y running the full TDT algorithm corresponds to there being multiple columns in the table: if you were running X against a CooperateBot, you’d just have the first column, and the Nash equilibrium would be (2,1) or (3,1). If you were running it against CliqueBot without a sanity check, there would just be the third column, and it would think (3,3) was the Nash equilibrium, but would be in for a nasty surprise when CliqueBot rejects it because of its baggage.)
It’s not clear to me what the difference is between the TDT algorithm in your post and the method I’ve described.
If you make sure to include a sanity check, then your description should do the same thing as the TDT algorithm in the post (at least on simple games; there may be a difference in bargaining situations.)
Sure, but does it have a short name? ProofBot?
I understand why you might feel it’s circular to name that row TDT, but nothing simpler (unless you count ADT/UDT as simpler) does what it does. It’s a layer more complicated than Newcomblike agents (which should also be included in your table); in order to get mutual cooperation with self and also defection against CooperateBot, it deduces whether a DefectBot or a MimicBot (C if it deduces Y=C, D otherwise) has a better outcome against Y, runs a sanity check, and if that goes through it does what the preferred strategy does.
After further review, I was wrong that CDT would be capable of making use of this to act like TDT. If CDT treats its output as separate from the rest of the causal graph (in the sense of the previous post), then it would still prefer to output an always-defect rather than a Löbian mathematical object. So it does take a different kind of agent to think of Nash equilibria among strategies.
Also, the combinatorics of enumerating strategies and looking for Nash equilibria are kind of awful: there are 16 different inputs that such a strategy has to deal with (i.e. what the opponent does against CooperateBot, DefectBot, NewcombBot and AntiNewcombBot), so there are 2^16 variant strategies in the same class. The one we call TDT is the Nash equilibrium, but it would take a long while to establish that in a naive implementation.
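(The counting, spelled out:)

```python
from itertools import product

probes = ("CooperateBot", "DefectBot", "NewcombBot", "AntiNewcombBot")
# Each input to such a strategy is a profile of what the opponent does against
# the four probes, so there are 2^4 = 16 possible inputs...
profiles = list(product("CD", repeat=len(probes)))
print(len(profiles))       # 16
# ...and a strategy in this class maps each profile to C or D, giving 2^16 variants.
print(2 ** len(profiles))  # 65536
```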
If CDT treats its output as separate from the rest of the causal graph (in the sense of the previous post), then it would still prefer to output an always-defect rather than a Löbian mathematical object.
When its source code is provided to its opponent, how could it be sensible to treat its output as separate from the rest of the causal graph?
Also, the combinatorics of enumerating strategies and looking for Nash equilibria are kind of awful
Sure, but it’s just as bad for the algorithm you wrote: you attempt to deduce output Y(code G, code A_i) for all A_i, which is exactly what you need to determine the Nash Equilibrium of this table. (Building the table isn’t much extra work, if it even requires more, and is done here more for illustrative than computational purposes.)
After further review, I was wrong that CDT would be capable of making use of this to act like TDT.
I am really worried that those two objections were enough to flip your position on this.
Sure, but it’s just as bad for the algorithm you wrote: you attempt to deduce output Y(code G, code A_i) for all A_i, which is exactly what you need to determine the Nash Equilibrium of this table.
No, there are four different A_i (CooperateBot, DefectBot, NewcombBot and AntiNewcombBot). 2^16 is the number of distinct agents one could write that see what Y does against the A_i and pick an action based on those responses. Just taking the maximum each time saves you from enumerating 2^16 strategies.
When its source code is provided to its opponent, how could it be sensible to treat its output as separate from the rest of the causal graph?
That is what CDT is. “Sensible” doesn’t enter into it.
(To expand on this, CDT’s way of avoiding harmful self-reference is to treat its decision as a causally separate node and try out different values for it while changing nothing else on the graph, including things that are copies of its source code. So it considers it legitimate to figure out the impact of its present decision on any agent who can see the effects of the action, but not on any agent who can predict the decision. Don’t complain to me, I didn’t make this up.)
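(A toy rendering of that surgery in the Prisoner’s Dilemma against an exact copy, with assumed payoffs; the point is only that holding the copy’s node fixed makes D dominate.)

```python
payoff_to_me = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def cdt_evaluation(my_action, copys_action):
    # CDT intervenes only on its own decision node; the copy's action is held
    # fixed, even though the copy runs the very same source code.
    return payoff_to_me[(my_action, copys_action)]

for held_fixed in ("C", "D"):
    print(held_fixed, {a: cdt_evaluation(a, held_fixed) for a in ("C", "D")})
# Whichever value the copy's node is held at, D scores higher, so this agent
# defects; it never credits its decision with changing what the copy decides.
```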
Just taking the maximum each time saves you from enumerating 2^16 strategies.
It’s not clear to me that’s the case. If your bot and my bot both receive the same source code for Y, we both determine the correct number of potential sub-strategies Y can use, and have to evaluate each of them against each of our A_is. I make the maximization over all of Y’s substrategies explicit by storing all of the values I obtain, but in order to get the maximum you also have to calculate all possible values. (I suppose you could get a bit of computational savings by exploiting the structure of the problem, but that may not generalize to arbitrary games.)
To expand on this, CDT’s way of avoiding harmful self-reference is to treat its decision as a causally separate node and try out different values for it while changing nothing else on the graph, including things that are copies of its source code.
The basic decision here is what to write as the source code, not the action that our bot outputs, and so CDT is fine: if it modifies the source code for X, that can impact the outputs of both X and Y. There’s no way to modify the output of X without potentially modifying the output of Y in this game, and I don’t see a reason for CDT to mistakenly hallucinate one.
Put another way, I don’t think I would use “causally separate”; I think I would use “unprecedented.” The influence diagram I’m drawing for this has three decision boxes (made by three different agents), all unprecedented, whose outputs are the code for X, Y, and G; all of them point to the calculation node of the inference module, and then all three codes and the inference module point to separate calculation nodes of X’s output and Y’s output, which then both point to the value node of Game Outcome. (You could have uncertainty nodes pointing to X’s output and Y’s output separately, but I’m ignoring mixed strategies for now.)
So it considers it legitimate to figure out the impact of its present decision on any agent who can see the effects of the action, but not on any agent who can predict the decision.
To the best of my knowledge, this isn’t a feature of CDT: it’s a feature of the embedded physics module used by most CDTers. If we altered Newcomb’s problem such that Omega filled the boxes after the agent made their choice, then CDTers would one-box, and so if you have a CDTer who believes that perfect prediction is equivalent to information moving backwards in time (and that that’s possible), then you have a one-boxing CDTer.
To be a trustworthy tool rather than a potential trap, the source code to Y has to be completely accurate and has to have final decision-making authority. Y’s programmer has to be able to accurately say “Here is enough information to predict any decision that my irrevocably delegated representative would ever possibly make in this interaction.” Saying that this is “without traditional means of communication” is technically accurate but very deceptive. Saying that this is “no actual communication” is outright backwards; if anything it’s much more communication than is traditionally imagined.
Unlimited communication, even. In the hypothetical “AGIs can inspect each others’ source code” case, the AGIs could just as well run that source code and have two separate conversations, one between each original and its counterpart’s copy. If the AGIs’ senses of ethics were sufficiently situation-dependent, then to generate useful proofs they’d need to be able to inspect copies of each others’ current state as well, in which case the two separate conversations might well be identical.
This is all true, but you can come up with situations where exchanging source code is more relevant. For instance, Robin Hanson has frequently argued that agents will diverge from each other as they explore the universe, and that someone will start burning the cosmic commons. This is a Prisoner’s Dilemma without traditional communication (since signals are limited by lightspeed, it would be too late to stop someone distant from defecting once you see they’ve started). But the “exchange of source code” kind of coordination is more feasible.
Also, I don’t know whether anyone polled traditional game theorists, but I’d bet that some of them would have expected it to be impossible, even with exchange of source codes, to achieve anything better than what CliqueBot does.
You’re right, and in my humble opinion ‘Updateless’ is only slightly better.
“Updateless” does describe the weird attitude of that decision theory to observations. It really does not learn from evidence, and it is a problem, so the weird connotations correspond to actual weirdness of the algorithm. In that sense, “telepathic” also somewhat fits, in that the current models of this style of decision making do require the players to have an unreasonable amount of knowledge about each other (although merely reading each other’s thoughts is not enough), but this seems more like a limitation of current models (i.e. concrete examples of algorithms) than a limitation of the overall decision-making style (it’s not yet known to what extent it’s so).
Yes, just not as well as I would like.