I gave one example earlier of TDT agents not playing cooperate in PD against each other. Here’s another, perhaps even more puzzling, example.
Consider 3 TDT agents, A, B, and C, playing a game of 3-choose-2 PD. These agents are identical, except that they have different beliefs about how they are logically related to each other. A and B both believe that A and B are 100% logically correlated (in other words, logically equivalent). A and C both believe that A and C are 0% logically correlated. B and C also believe that B and C are 0% logically correlated.
What’s the outcome of this game? Well, C should clearly play defect, since it’s sure that it’s not correlated with either of the other players. A and B both play cooperate, since that maximizes expected utility given that they are correlated with each other but not with C (the arithmetic is the same as in my earlier 3-choose-2 PD example). Given this outcome, their initial beliefs about their logical relationships don’t seem to be inconsistent.
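For concreteness, here is one way the arithmetic could go. The earlier 3-choose-2 example isn't reproduced in this excerpt, so both the reading of the game (two of the three players are drawn at random to play a one-shot PD with their committed moves, the third sitting out) and the payoffs (T=5, R=3, P=1, S=0) are assumptions for illustration, not the original numbers:

```python
import itertools

# Assumed payoffs (not given in this excerpt): conventional PD values.
T, R, P, S = 5, 3, 1, 0  # temptation, reward, punishment, sucker

def pd_payoff(me, them):
    """Row player's payoff in a one-shot PD."""
    return {("C", "C"): R, ("C", "D"): S, ("D", "C"): T, ("D", "D"): P}[(me, them)]

def expected_utilities(profile):
    """Each player's expected payoff when a uniformly random pair of the three
    plays a one-shot PD with the moves they committed to (assumed game form)."""
    pairs = list(itertools.combinations(range(3), 2))
    eu = [0.0, 0.0, 0.0]
    for i, j in pairs:
        eu[i] += pd_payoff(profile[i], profile[j]) / len(pairs)
        eu[j] += pd_payoff(profile[j], profile[i]) / len(pairs)
    return eu

# A's reasoning, given its beliefs: B mirrors A exactly, C defects regardless.
eu_if_A_cooperates = expected_utilities(("C", "C", "D"))[0]  # 1.0
eu_if_A_defects    = expected_utilities(("D", "D", "D"))[0]  # 2/3
print(eu_if_A_cooperates, eu_if_A_defects)

# C's reasoning, given its beliefs: A and B don't react to C's choice at all.
eu_if_C_defects    = expected_utilities(("C", "C", "D"))[2]  # 10/3
eu_if_C_cooperates = expected_utilities(("C", "C", "C"))[2]  # 2.0
print(eu_if_C_defects, eu_if_C_cooperates)
```

Under those assumed numbers the outcome described above is stable: given their beliefs, A and B each do better by cooperating (1 vs. 2/3), and C does better by defecting (10/3 vs. 2).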
How do they end up in this situation? Clearly they cannot all have common knowledge of each other’s source code, so where do they obtain their definite beliefs about each other instead?
Re: “definite beliefs”, the numbers don’t have to be 100% and 0%. They could be any p and q, where p is above the threshold for cooperation, and q is below.
As for where the numbers come from, I don’t know. Perhaps the players have different initial intuitions (from a mathematical intuition module provided by evolution or their programmers) about their logical correlations, which causes them to actually have different logical correlations (since they are actually computing different things when making decisions), which then makes those intuitions consistent upon reflection.
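To make the cooperation threshold concrete, here is a sketch under the same assumed game and payoffs as above, plus an additional assumed model of partial correlation (B copies A's move with probability p and otherwise defects; C defects outright, i.e. q is effectively zero). None of this modeling comes from the discussion itself:

```python
# Same assumed random-pair game and payoffs as in the earlier sketch.  The model
# of partial correlation (B copies A's move with probability p, else defects;
# C defects outright) is also an assumption made up for illustration.

def eu_A(move, p):
    """A's expected utility, averaging over the three equally likely pairings."""
    if move == "C":
        # (A,B) pairing: mutual cooperation with prob p, else A is suckered (0);
        # (A,C) pairing: A is suckered; (B,C) pairing: A isn't playing.
        return (1/3) * (p * 3 + (1 - p) * 0) + (1/3) * 0
    else:
        # B defects whether or not it copies A, and C defects, so both of A's
        # possible pairings give the mutual-defection payoff of 1.
        return (1/3) * 1 + (1/3) * 1

# Cooperation beats defection exactly when p > 2/3 under these numbers.
for p in (0.5, 2/3, 0.8, 1.0):
    print(f"p={p:.2f}  EU(C)={eu_A('C', p):.2f}  EU(D)={eu_A('D', p):.2f}")
```

Under these particular assumptions the threshold works out to p > 2/3; different payoffs or a different model of partial correlation would move the number, but not the shape of the claim.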
Why can’t A and B choose to be correlated with C by deliberately making their decision dependent on its decision? Insufficient knowledge of C’s code even to make their decision dependent on “what an agent does when it thinks it’s not correlated to you”? In other words, you know that C is going to follow a certain decision algorithm here—do the Dai-obvious thing and defect—but A and B don’t know enough about C to defect conditional on the “obvious” thing being to defect?
Why can’t A and B choose to be correlated with C by deliberately making their decision dependent on its decision?
A and B don’t choose this, because given their beliefs (i.e., low correlation between A and C, and between B and C), that doesn’t maximize their expected utilities. So the belief is like a self-fulfilling prophecy. Intuitively, you might think “Why don’t they get out of this trap by choosing to be correlated with C and simultaneously changing their beliefs?” The problem is that they don’t think this will work, because they think C wouldn’t respond to this.
In other words, why would A and B defect conditional on C defecting, when they know “C is going to follow a certain decision algorithm here—do the Dai-obvious thing and defect”?
Anyway, that’s what I think happens under UDT1. It’s quite possible (almost certain, really) that UDT1 is wrong or incomplete. But if you have a better solution, can you try to formalize it, and not just make informal arguments? Or, if you think you have an intuitively satisfactory solution that you don’t know how to formalize yet, I’ll stop beating this dead horse and let you work it out.
I don’t have a general solution. I’m just carrying out the reasoning by hand. I don’t know how to solve the logical ordering problem.
In other words, why would A and B defect conditional on C defecting, when they know “C is going to follow a certain decision algorithm here—do the Dai-obvious thing and defect”?
Why would C choose to follow such an algorithm, if C perceives that not following such an algorithm might lead to mutual cooperation instead of mutual defection?
Essentially, I’m claiming that your belief about “logical uncorrelation” is hard to match up with your out-of-context intuitive reasoning about what all the parties are likely to do. It’s another matter if C is a piece of cardboard, a random number generator, or a biological organism operating on some weird deluded decision theory; but you’re reasoning as if C is calmly maximizing.
Suppose I put things to you this way: Groups of superrational agents will not occupy anything that is not at least a Pareto optimum, because they have strong motives to occupy Pareto optima and TDT lets them coordinate where such motives exist. Now the 3-choose-2 problem with two C players and one D player may be a Pareto optimum (if taken at face value without further trades being possible), but if you think of Pareto-ization as an underlying motivation—that everyone starts out in the mutual defection state, and then has a motive to figure out how to leave the mutual defection state by increasing their entanglement—then you might see why I’m a bit more skeptical about these “logical uncorrelations”. Then you just end up in the all-D state, the base state, and agents have strong incentives to figure out ways to leave it if they can.
In other words, you seem to be thinking in terms of a C-equilibrium already accomplished among one group of agents locally correlated with themselves only, and looking at the incentive of other agents to locally-D; whereas my own reasoning assumes the D-equilibrium already globally accomplished, but suspects that in this case rational agents have a strong incentive to reach up to the largest reachable C-equilibria, which they can accomplish by increasing (not decreasing) various forms of entanglement.
Relations between “previously uncorrelated” groups may be viewable as analogous to relations between causally uncorrelated individuals. To assume that one subgroup has decided on interior cooperation even though it makes them vulnerable to outside defection, without that subgroup having demanded anything in return, may be like presuming unilateral cooperation on the PD.
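The Pareto claim above can also be checked mechanically, at least under the same assumed reading of the game and the same assumed payoffs as in the earlier sketches (so this is suggestive only, not a fact about the game as originally specified):

```python
import itertools

# Same assumed game form and payoffs as in the earlier sketches.
T, R, P, S = 5, 3, 1, 0

def pd_payoff(me, them):
    return {("C", "C"): R, ("C", "D"): S, ("D", "C"): T, ("D", "D"): P}[(me, them)]

def expected_utilities(profile):
    """Expected payoffs when a uniformly random pair of the three plays PD."""
    pairs = list(itertools.combinations(range(3), 2))
    eu = [0.0, 0.0, 0.0]
    for i, j in pairs:
        eu[i] += pd_payoff(profile[i], profile[j]) / len(pairs)
        eu[j] += pd_payoff(profile[j], profile[i]) / len(pairs)
    return tuple(eu)

profiles = list(itertools.product("CD", repeat=3))
payoffs = {p: expected_utilities(p) for p in profiles}

def dominated(p):
    """True if some other pure profile makes everyone at least as well off
    and someone strictly better off."""
    return any(
        all(b >= a for a, b in zip(payoffs[p], payoffs[q])) and
        any(b > a for a, b in zip(payoffs[p], payoffs[q]))
        for q in profiles if q != p
    )

for p in profiles:
    label = "dominated" if dominated(p) else "Pareto optimal (among pure profiles)"
    print("".join(p), [round(x, 2) for x in payoffs[p]], label)
```

Under these assumptions, all-C and each two-C/one-D profile come out Pareto optimal among pure profiles, while all-D (and the one-C/two-D profiles) are dominated, which is consistent both with the two-C/one-D outcome being a Pareto optimum taken at face value and with all-D being a state agents have a motive to leave.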
whereas my own reasoning assumes the D-equilibrium already globally accomplished, but suspects that in this case rational agents have a strong incentive to reach up to the largest reachable C-equilibria, which they can accomplish by increasing (not decreasing) various forms of entanglement.
Ok, this looks reasonable to me. But how would they actually go about doing this? So far I can see two general methods:
convergence towards an “obvious” decision theory
deliberate conditioning of moves between players
My current view is that neither of these methods seems very powerful as a mechanism for enabling cooperation, compared to, say, the ability to prove source code, or to merge securely. To summarize my thoughts and the various examples I’ve given, here are the problems with each of the above methods for “increasing entanglement”:
Two agents with the same “obvious” decision theory may not be highly correlated, if they have different heuristics, intuitions, priors, utility functions, etc. Also, an agent may have a disincentive to unilaterally increase his correlation with a large group of already highly correlated agents.
Deliberate conditioning of moves is difficult when two sides have high uncertainty about each other’s source code. Which hypothetical agent(s) do you condition your move against? How would they know that you’ve done so, when they don’t know your source code either? It’s also difficult if two sides have different preferences about the correlation of their moves, that is, if one side wants them to be positively correlated, and another wants them to be uncorrelated or negatively correlated.
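As a loose illustration of that first difficulty, here is a sketch in which A tries to condition its move on C's, but only has a credence distribution over what C's program might be. Everything in it (the candidate programs, the credences, the payoffs) is invented for illustration:

```python
# A loose sketch, not a model taken from the discussion.
T, R, P, S = 5, 3, 1, 0  # assumed one-shot PD payoffs against C

def pd_payoff(me, them):
    return {("C", "C"): R, ("C", "D"): S, ("D", "C"): T, ("D", "D"): P}[(me, them)]

# A's hypotheses about C's program, with credences.  The first two ignore A
# entirely; the third calls A's policy, but only on a fixed guess, a stand-in
# for "what an agent does when it thinks it's not correlated to you".  None of
# them can verify that A's move actually depends on C's.
hypotheses = [
    (0.7, lambda a_policy: "D"),            # C just defects
    (0.2, lambda a_policy: "C"),            # C just cooperates
    (0.1, lambda a_policy: a_policy("D")),  # C runs A's policy on a guessed input
]

def eu(a_policy):
    """A's expected payoff against its mixture of models of C, granting
    (charitably) that A's conditional move really does track C's actual move."""
    return sum(credence * pd_payoff(a_policy(c_prog(a_policy)), c_prog(a_policy))
               for credence, c_prog in hypotheses)

mirror   = lambda c_move: c_move  # "cooperate iff C cooperates"
always_d = lambda c_move: "D"

print(eu(mirror), eu(always_d))  # roughly 1.4 vs 1.8 under these made-up credences
```

Even granting A the ability to implement the dependency, it buys nothing here: conditioning only pays to the extent that C can detect A's policy and respond to it, which is exactly what mutual ignorance of source code blocks.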
These sound like basically reasonable worries / lines of argument to me. I’m sure life will be a lot easier for… not necessarily everyone, but at least us primitive mortal analysts… if it’s easy for superintelligences to exhibit their source code to each other. Then we just have the problem of logical ordering in threats and games of Chicken. (Come to think of it, blackmail threats of mutual destruction unless paid off, would seem to become more probable, not less, as you became more able to exhibit and prove your source code to the other player.)
A possible primary remaining source of our differing guesses at this point may have to do with the degree to which we think that decision processes are a priori (un)correlated. I take statements like “Obviously, everyone plays D at the end” to be evidence of very high a priori correlation—it’s no good talking about different heuristics, intuitions, priors, utility functions, etcetera, if you don’t actually conclude that maybe some players play C and others play D.
It’s also difficult if two sides have different preferences about the correlation of their moves, that is, if one side wants them to be positively correlated, and another wants them to be uncorrelated or negatively correlated.
(Come to think of it, blackmail threats of mutual destruction unless paid off, would seem to become more probable, not less, as you became more able to exhibit and prove your source code to the other player.)
I think Nesov’s position is that such threats don’t work against updateless agents, but I’m not sure about that yet. ETA: See previous discussion of this topic.
I take statements like “Obviously, everyone plays D at the end” to be evidence of very high a priori correlation—it’s no good talking about different heuristics, intuitions, priors, utility functions, etcetera, if you don’t actually conclude that maybe some players play C and others play D.
That doesn’t make sense… Suppose nobody smokes, and nobody gets cancer. Does that mean smoking and cancer are correlated? In order to have correlation, you need to have both (C,C) and (D,D) outcomes. If all you have are (D,D) outcomes, there is no correlation.
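To spell out the statistical point with toy numbers (the coding of moves and the data here are purely illustrative): with no variation in play, the sample correlation between two players' moves is not even defined, let alone high.

```python
import numpy as np

# Hypothetical move histories, coding C as 1 and D as 0 (an assumed convention).
all_d_1 = np.array([0, 0, 0, 0])
all_d_2 = np.array([0, 0, 0, 0])
mixed_1 = np.array([1, 1, 0, 0])
mixed_2 = np.array([1, 1, 0, 0])

# With zero variance the correlation coefficient is 0/0: numpy warns and
# returns nan.  Uniform defection is not evidence of correlation; without
# variation the quantity isn't defined at all.
print(np.corrcoef(all_d_1, all_d_2)[0, 1])   # nan
print(np.corrcoef(mixed_1, mixed_2)[0, 1])   # 1.0
```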
How would that happen?
I’m referring to rock-paper-scissors and this example. Or were you asking something else?