It’s written up in Robust Cooperation in the Prisoner’s Dilemma and Parametric Bounded Löb’s Theorem and Robust Cooperation of Bounded Agents (which is about making this work without infinite compute), with more discussion of practical-ish application in Cooperative and uncooperative institution designs: Surprises and problems in open-source game theory.
(Also, a point that’s overdue getting into the water supply: you don’t need to be an ASI to use this, and there is no need to prove theorems about your counterparty. You just need to submit legible programs (or formal company bylaws) that negotiate with each other and can reason about each other’s behavior, rather than about the behavior of their possibly inscrutable principals. There’s some discussion of that in the third paper I linked above.
The problem with this framing is that the legitimacy of the negotiation remains in question: you still need to know something about the principals, or the incentives acting on them, to expect them to respect the verdict of the negotiation performed by the programs they submit. But that point is separate from what makes the Prisoner’s Dilemma in particular hard to solve; that aspect is handled by replacing the constant Cooperate/Defect actions with programs that compute those actions via static analysis of (reasoning about) the other programs involved in the negotiation.)
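To make the flavor of that concrete, here is a toy sketch (my own illustration in Python, not code from the papers; the bot names are just the community’s conventional ones): each submitted program receives its counterparty’s source code and chooses C or D by analyzing that source, so all the reasoning is about the programs, never about their principals. Source-equality checking (CliqueBot) is the crudest such static analysis that still gets mutual cooperation without being exploitable:

```python
import inspect

def clique_bot(opponent_source: str) -> str:
    """Cooperate iff the opponent is running exactly this program."""
    my_source = inspect.getsource(clique_bot)
    return "C" if opponent_source == my_source else "D"

def defect_bot(opponent_source: str) -> str:
    """Always defects, regardless of what the opponent's source says."""
    return "D"

def play(bot_a, bot_b):
    """The 'negotiation': each program is shown the other's source, then moves."""
    src_a, src_b = inspect.getsource(bot_a), inspect.getsource(bot_b)
    return bot_a(src_b), bot_b(src_a)

print(play(clique_bot, clique_bot))  # ('C', 'C'): mutual static check succeeds
print(play(clique_bot, defect_bot))  # ('D', 'D'): never cooperates with a defector
```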
Thank you for providing those resources. They weren’t quite what I was hoping to see, but they did help me see that I did not correctly describe what I was looking for.
Specifically, if we use the first paper’s definition that “adversarially robust” means “inexploitable—i.e. the agent will never cooperate with something that would defect against it, but may defect even if cooperating would lead to a C/C outcome and defecting would lead to D/D”, one example of “an adversarially robust decision theory which does not require infinite compute” is “DefectBot” (which, in the language of the third paper, is a special case of Defect-Unless-Proof-Of-Cooperation-bot (DUPOC(0))).
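To make the DUPOC family concrete, here is a toy version in the same spirit (my own sketch; the papers’ actual construction searches for bounded Löbian proofs, which I replace with a trivial syntactic check standing in for a very small proof budget):

```python
import ast, inspect

def cooperate_bot(opponent_source: str) -> str:
    return "C"

def defect_bot(opponent_source: str) -> str:
    return "D"

def proves_cooperation(source: str) -> bool:
    """Toy 'proof search': accept only programs whose every return statement is
    the literal "C" -- a syntactic certificate of unconditional cooperation."""
    returns = [n for n in ast.walk(ast.parse(source)) if isinstance(n, ast.Return)]
    return bool(returns) and all(
        isinstance(n.value, ast.Constant) and n.value.value == "C" for n in returns)

def dupoc(opponent_source: str) -> str:
    """Defect-Unless-Proof-Of-Cooperation: cooperate only if the proof search
    succeeds. With a zero proof budget it never succeeds, recovering DefectBot.
    Note that with a proof system this weak, two DUPOCs defect against each
    other; the Löbian machinery in the papers exists to get C/C in that case."""
    return "C" if proves_cooperation(opponent_source) else "D"

print(dupoc(inspect.getsource(cooperate_bot)))  # C: proof found
print(dupoc(inspect.getsource(defect_bot)))     # D: no proof of cooperation
print(dupoc(inspect.getsource(dupoc)))          # D: proof search too weak for this
```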
What I actually want is an example of a concrete system that is:
Inexploitable (or nearly so): This system will never (or rarely) play C against something that will play D against it.
Competitive: There is no other strategy which can, in certain environments, get better long-term outcomes than this strategy by sacrificing inexploitability-in-theory for performance-in-its-actual-environment-in-practice. (For example, I note that in the prisoner’s dilemma tournament back in 2013, the actual winner was a RandomBot despite some attempts to enter FairBot and friends, though a lot of the bots in that tournament also had Problems; a minimal harness for measuring this sort of thing is sketched just after this list.)
Computationally tractable.
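Since “competitive” only means anything relative to an environment, here is the kind of minimal round-robin harness that would measure it (my own sketch; the payoffs and bot names are assumptions, and unlike the 2013 tournament these bots see only the move history, not each other’s source):

```python
import itertools, random

PAYOFFS = {("C", "C"): (2, 2), ("C", "D"): (0, 3),
           ("D", "C"): (3, 0), ("D", "D"): (1, 1)}

def defect_bot(history):          # history: list of (my move, their move) pairs
    return "D"

def random_bot(history):
    return random.choice("CD")

def tit_for_tat(history):
    return history[-1][1] if history else "C"

def match(bot_a, bot_b, rounds=100):
    """Play an iterated PD and return both bots' total scores."""
    hist_a, hist_b, score_a, score_b = [], [], 0, 0
    for _ in range(rounds):
        a, b = bot_a(hist_a), bot_b(hist_b)
        pa, pb = PAYOFFS[(a, b)]
        score_a, score_b = score_a + pa, score_b + pb
        hist_a.append((a, b))
        hist_b.append((b, a))
    return score_a, score_b

def round_robin(bots):
    """Every bot plays every other bot once; return total scores per bot."""
    totals = {bot.__name__: 0 for bot in bots}
    for a, b in itertools.combinations(bots, 2):
        sa, sb = match(a, b)
        totals[a.__name__] += sa
        totals[b.__name__] += sb
    return totals

print(round_robin([defect_bot, random_bot, tit_for_tat]))
```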
Ideally, it would also be:
Robust to the agents making different predictions about the effects of their actions. I honestly don’t know what a solution to that problem would look like, even in theory, but “able to operate effectively in a world where not all effects of your actions are known in advance” seems like an important thing for a decision theory.
Robust to the “trusting trust” problem (i.e. the issue of “how do you know that the source code you received is what the other agent is actually running”). Though if you have a solution for this problem you might not even need a solution to a lot of the other problems, because a solution to this problem implies an extremely powerful already-existing coordination mechanism (e.g. “all manufactured hardware has preloaded spyware from some trusted third party that lives in a secure enclave and can make a verifiable signed report of the exact contents of the memory and storage of that computer”).
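For what it’s worth, the “preloaded spyware plus secure enclave” mechanism reduces to one primitive: a trusted party measures exactly what code is loaded and signs the measurement, and the counterparty checks the signature. A toy sketch under that assumption (the key handling and HMAC are stand-ins of mine; real attestation uses TPM/TEE signing keys and hardware-enforced measurement):

```python
import hashlib, hmac, inspect

ENCLAVE_KEY = b"hypothetical-third-party-secret"  # stands in for an enclave signing key

def attest(bot) -> tuple[str, bytes]:
    """Toy 'measurement': report the exact source being run, plus a MAC over it."""
    source = inspect.getsource(bot)
    tag = hmac.new(ENCLAVE_KEY, source.encode(), hashlib.sha256).digest()
    return source, tag

def verify(source: str, tag: bytes) -> bool:
    """Counterparty checks that the reported source is what the 'enclave' measured."""
    expected = hmac.new(ENCLAVE_KEY, source.encode(), hashlib.sha256).digest()
    return hmac.compare_digest(expected, tag)

def honest_bot(opponent_source: str) -> str:
    return "C"

source, tag = attest(honest_bot)
print(verify(source, tag))                      # True: report matches measurement
print(verify(source.replace("C", "D"), tag))    # False: tampered report is caught
```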
In any case, it may be time to run another PD tournament. Perhaps this time with strategies described in English and “evaluated” by an LLM, since “write a program that does the thing you want” seems to have been the blocking step for things people wanted to do in previous submissions.
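Concretely, the change to a tournament harness would be small; something like the wrapper below, where call_llm is a placeholder for whichever chat-completion client gets wired in (the prompt format, names, and defect-on-malformed-output rule are all my assumptions):

```python
def call_llm(prompt: str) -> str:
    """Placeholder for an actual LLM call (any chat-completion client would do).
    Expected to return a short string containing the chosen move."""
    raise NotImplementedError("wire up an LLM client here")

def english_strategy_bot(strategy: str):
    """Wrap an English-language strategy as a bot compatible with the
    round-robin harness sketched earlier (history is (my move, their move) pairs)."""
    def bot(history):
        prompt = (
            "You are playing an iterated prisoner's dilemma.\n"
            f"Your strategy, verbatim: {strategy}\n"
            f"History so far, as (your move, their move) pairs: {history}\n"
            "Answer with exactly one letter: C or D."
        )
        answer = call_llm(prompt).strip().upper()
        return answer if answer in ("C", "D") else "D"  # defect on malformed output
    return bot

grim = english_strategy_bot(
    "Cooperate until the opponent defects once, then defect forever.")
```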
Edit: I would be very curious to hear from the person who strong-disagreed with this what, specifically, their disagreement is. I presume the disagreement is not with my statement that I could have phrased my first comment better, but it could plausibly be any of “the set of desired characteristics is not a useful one”, “no, actually, we don’t need another PD tournament”, or “we should have another PD tournament, but having the strategies be written in English and executed by asking an LLM what the policy does is a terrible idea”.
“Robust to the ‘trusting trust’ problem (i.e. the issue of ‘how do you know that the source code you received is what the other agent is actually running’).”
This is the crux, really, and I’m surprised that many LWers seem to believe the ‘robust cooperation’ research actually works sans a practical solution to ‘trusting trust’ (which I suspect doesn’t actually exist), but in that sense it’s in good company (diamondoid nanotech, rapid takeoff, etc.).