whereas my own reasoning assumes the D-equilibrium already globally accomplished, but suspects that in this case rational agents have a strong incentive to reach up to the largest reachable C-equilibria, which they can accomplish by increasing (not decreasing) various forms of entanglement.
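(For reference, a minimal sketch of the payoff structure presumably in the background of the C/D talk above: a standard one-shot Prisoner's Dilemma with purely illustrative numbers. (D,D) is the lone Nash equilibrium, while (C,C) pays both players more, which is what reaching up to a C-equilibrium would buy.)

```python
# Illustrative Prisoner's Dilemma payoffs (standard T > R > P > S ordering; numbers are arbitrary).
# payoff[my_move][their_move] = what I receive.
payoff = {
    "C": {"C": 3, "D": 0},   # mutual cooperation pays 3; cooperating against a defector pays 0
    "D": {"C": 5, "D": 1},   # defecting against a cooperator pays 5; mutual defection pays 1
}

# D strictly dominates C for each player considered alone...
assert all(payoff["D"][their] > payoff["C"][their] for their in ("C", "D"))
# ...so (D, D) is the equilibrium, even though (C, C) leaves both players better off.
assert payoff["C"]["C"] > payoff["D"]["D"]
```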
Ok, this looks reasonable to me. But how would they actually go about doing this? So far I can see two general methods:
convergence towards an “obvious” decision theory
deliberate conditioning of moves between players
My current view is that neither of these methods seems very powerful as a mechanism for enabling cooperation, compared to, say, the ability to prove source code, or to merge securely. To summarize my thoughts and the various examples I’ve given, here are the problems with each of the above methods for “increasing entanglement”:
Two agents with the same “obvious” decision theory may not be highly correlated if they have different heuristics, intuitions, priors, utility functions, etc. Also, an agent may have a disincentive to unilaterally increase his correlation with a large group of already highly correlated agents. (A toy sketch of the first point appears after the next paragraph.)
Deliberate conditioning of moves is difficult when two sides have high uncertainty about each other’s source code. Which hypothetical agent(s) do you condition your move against? How would they know that you’ve done so, when they don’t know your source code either? It’s also difficult if two sides have different preferences about the correlation of their moves, that is, if one side wants them to be positively correlated, and another wants them to be uncorrelated or negatively correlated.
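A toy sketch of the first problem above, with a hypothetical decision rule and made-up credences: two agents can share the same “obvious” decision theory and still end up making different moves once their priors differ.

```python
# Hypothetical shared decision rule: cooperate iff my credence that the other
# party cooperates exceeds some threshold. The rule is identical across agents;
# the inputs (priors, heuristics) are not, so the moves need not line up.
def shared_rule(credence_other_cooperates, threshold=0.5):
    return "C" if credence_other_cooperates > threshold else "D"

agent_1 = shared_rule(0.8)   # optimistic prior about its counterpart: plays "C"
agent_2 = shared_rule(0.3)   # pessimistic prior about its counterpart: plays "D"
print(agent_1, agent_2)      # C D: same decision theory, different moves
```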
These sound like basically reasonable worries / lines of argument to me. I’m sure life will be a lot easier for… not necessarily everyone, but at least us primitive mortal analysts… if it’s easy for superintelligences to exhibit their source code to each other. Then we just have the problem of logical ordering in threats and games of Chicken. (Come to think of it, blackmail threats of mutual destruction unless paid off, would seem to become more probable, not less, as you became more able to exhibit and prove your source code to the other player.)
A possible primary remaining source of our differing guesses at this point may have to do with the degree to which we think that decision processes are a priori (un)correlated. I take statements like “Obviously, everyone plays D at the end” to be evidence of very high a priori correlation—it’s no good talking about different heuristics, intuitions, priors, utility functions, etcetera, if you don’t actually conclude that maybe some players play C and others play D.
It’s also difficult if two sides have different preferences about the correlation of their moves, that is, if one side wants them to be positively correlated, and another wants them to be uncorrelated or negatively correlated.
(Come to think of it, blackmail threats of mutual destruction unless paid off, would seem to become more probable, not less, as you became more able to exhibit and prove your source code to the other player.)
I think Nesov’s position is that such threats don’t work against updateless agents, but I’m not sure about that yet. ETA: See previous discussion of this topic.
I take statements like “Obviously, everyone plays D at the end” to be evidence of very high a priori correlation—it’s no good talking about different heuristics, intuitions, priors, utility functions, etcetera, if you don’t actually conclude that maybe some players play C and others play D.
That doesn’t make sense… Suppose nobody smokes, and nobody gets cancer. Does that mean smoking and cancer are correlated? In order to have correlation, you need to have both (C,C) and (D,D) outcomes. If all you have are (D,D) outcomes, there is no correlation.
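A small numeric check of the point just made, coding D as 0 and C as 1 purely for illustration: when a variable never varies, its correlation with anything is undefined rather than high.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation; returns None when either variable has zero variance."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    if x.std() == 0 or y.std() == 0:
        return None  # constant data: the coefficient is undefined, not large
    return float(((x - x.mean()) * (y - y.mean())).mean() / (x.std() * y.std()))

all_defect = [0, 0, 0, 0]    # everyone plays D
mixed      = [1, 0, 1, 0]    # some play C, some play D

print(pearson(all_defect, all_defect))   # None: no variation, no correlation to speak of
print(pearson(mixed, mixed))             # 1.0: varied, matching moves are correlated
```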
How would that happen?
I’m referring to rock-paper-scissors and this example. Or were you asking something else?
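Spelling out the rock-paper-scissors point with the standard rules: the conditioning of moves that one player would choose is exactly the conditioning the other player would reject, so there is no correlation both sides want.

```python
# Standard rock-paper-scissors; payoffs are from player A's point of view.
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}

def payoff_to_A(a, b):
    """+1 if A's move beats B's, -1 if B's move beats A's, 0 on a tie."""
    if a == b:
        return 0
    return 1 if BEATS[a] == b else -1

moves = list(BEATS)

# The joint move distribution each side would like to induce:
A_prefers = [(a, b) for a in moves for b in moves if BEATS[a] == b]  # A's move beats B's
B_prefers = [(a, b) for a in moves for b in moves if BEATS[b] == a]  # B's move beats A's

average = lambda pairs: sum(payoff_to_A(a, b) for a, b in pairs) / len(pairs)
print(average(A_prefers))   #  1.0 under the conditioning A wants
print(average(B_prefers))   # -1.0 under the conditioning B wants, i.e. the opposite correlation
```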