I had a similar idea a while ago and am working it up into a paper (“CDT Agents are Exploitable”). Caspar Oesterheld and Vince Conitzer are also doing something like this. And then there is Ahmed’s Betting on the Past case.
In their version, the Predictor offers bets to the agent, at least one of which the agent will accept (for the reasons you outline) and thus they get money-pumped. In my version, there is no Predictor, but instead there are several very similar CDT agents, and a clever human bookie can extract money from them by exploiting their inability to coordinate.
Long story short, I would bet that an actual AGI which was otherwise smarter than me but which doggedly persisted in doing its best to approximate CDT would fail spectacularly one way or another, “hacked” by some clever bookie somewhere (possibly in its hypothesis space only!). Unfortunately, arguably the same is true for all decision theories I’ve seen so far, but for different reasons...
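To make the Predictor-style money pump concrete, here is a minimal toy sketch. The two-box layout, the price, the prize, and the predictor accuracy are all illustrative assumptions of mine, not details taken from this thread. Each box costs `PRICE`, and the predictor puts `PRIZE` in every box it predicted the agent would not buy. Because the agent buys at most one box, at least one box is filled, so the agent's credences that each box contains money sum to at least 1.

```python
# Toy model of a predictor-based money pump against a CDT buyer.
# All numbers below are hypothetical and chosen only for illustration.
PRICE = 1.0      # assumed cost of buying either box
PRIZE = 3.0      # assumed amount placed in a box the predictor expects to go unbought
ACCURACY = 0.75  # assumed probability that the predictor forecasts the purchase correctly


def cdt_choice(p_box1: float, p_box2: float) -> str:
    """Pick the action with the highest causal expected value.

    p_box1 and p_box2 are the buyer's credences that each box contains
    the prize, held fixed regardless of what the buyer does.
    """
    ev = {
        "box1": PRIZE * p_box1 - PRICE,
        "box2": PRIZE * p_box2 - PRICE,
        "none": 0.0,
    }
    return max(ev, key=ev.get)


def actual_expected_payoff(choice: str) -> float:
    """Expected payoff once the predictor's accuracy is taken into account."""
    if choice == "none":
        return 0.0
    # The bought box holds the prize only if the predictor got it wrong.
    return PRIZE * (1 - ACCURACY) - PRICE


# Whenever the credences sum to 1 or more, some box looks worth buying.
for p1 in (0.0, 0.25, 0.5, 0.75, 1.0):
    choice = cdt_choice(p1, 1.0 - p1)
    print(p1, choice, actual_expected_payoff(choice))
```

Under these assumed numbers, the better-looking box always has causal expected value of at least 3 × 0.5 − 1 = 0.5, so the buyer always buys, while the accuracy-adjusted payoff of any purchase is 3 × 0.25 − 1 = −0.25.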
I notice that you assumed there were no independent randomising devices available. But why would the CDT agent ever opt to use a randomising device? Why would it see that as having value?
Apologies, I only saw your comment just now! Yes, I agree, CDT never strictly prefers randomizing. So there are agents who abide by CDT and never randomize. As our scenarios show, these agents are exploitable. However, there could also be CDT agents who, when indifferent between some set of actions (and when randomization is not associated with any cost), do randomize, choosing the probabilities according to some additional theory. For example, you could have the decision procedure: “follow CDT, but when indifferent between multiple actions, choose a distribution over these actions that is ratifiable”. The updated version of our paper, which has now been published Open Access in The Philosophical Quarterly, actually contains some extra discussion of this in Section IV.1, starting with the paragraph “Nonetheless, what happens if we grant the buyer in Adversarial Offer access to a randomisation device...”.
Well said.
>Caspar Oesterheld and Vince Conitzer are also doing something like this
That paper can be found at https://users.cs.duke.edu/~ocaspar/CDTMoneyPump.pdf . And yes, it is structurally essentially the same as the problem in the post.
Cool!