Daniel Kokotajlo comments on Predictors exist: CDT going bonkers… forever

Daniel Kokotajlo 14 Jan 2020 21:35 UTC
Well said.
I had a similar idea a while ago and am working it up into a paper (“CDT Agents are Exploitable”). Caspar Oesterheld and Vince Conitzer are also doing something like this. And then there is Ahmed’s Betting on the Past case.
In their version, the Predictor offers bets to the agent, at least one of which the agent will accept (for the reasons you outline), and thus the agent gets money-pumped. In my version, there is no Predictor; instead there are several very similar CDT agents, and a clever human bookie can extract money from them by exploiting their inability to coordinate.
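The structure of the Predictor version can be sketched in a few lines of Python. This is a minimal illustration of a two-box offer, not the paper's own formalization: the price (1), prize (3) and predictor accuracy (0.75) are assumed values, and every function name is hypothetical. The predictor puts the prize in each box it predicts the agent will not buy, so at least one box is always full. A CDT agent holds its credences about the contents fixed while it deliberates, so it always finds some box it believes is full with probability at least 1/2, and it buys that box.

```python
from fractions import Fraction

# Illustrative values (assumptions, not taken from the paper).
PRICE = Fraction(1)         # cost of buying one box
PRIZE = Fraction(3)         # amount the predictor may put in a box
ACCURACY = Fraction(3, 4)   # probability that the prediction is correct

def cdt_eus(p_b1, p_b2):
    """Causal expected utilities: the credences that each box is full are
    held fixed across actions, since the contents were settled beforehand."""
    return {
        "B1": p_b1 * PRIZE - PRICE,
        "B2": p_b2 * PRIZE - PRICE,
        "none": Fraction(0),
    }

def cdt_choice(p_b1, p_b2):
    # Pick an action with maximal causal expected utility.
    eus = cdt_eus(p_b1, p_b2)
    return max(eus, key=eus.get)

def realized_payoff_if_buying():
    # The box the agent buys is empty whenever the predictor foresaw that
    # purchase (probability ACCURACY) and full otherwise.
    return (1 - ACCURACY) * PRIZE - PRICE

# At least one box is always full, so coherent credences satisfy
# p_b1 + p_b2 >= 1. Scan a grid of such credence pairs.
grid = [(Fraction(i, 10), Fraction(j, 10))
        for i in range(11) for j in range(11) if i + j >= 10]
choices = [cdt_choice(p1, p2) for p1, p2 in grid]
```

Every coherent credence pair on the grid leads to a purchase, yet the box actually bought is empty with probability ACCURACY, so under these assumed numbers each purchase loses 1/4 on average.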
Long story short, I would bet that an actual AGI which was otherwise smarter than me but which doggedly persisted in doing its best to approximate CDT would fail spectacularly one way or another, “hacked” by some clever bookie somewhere (possibly in its hypothesis space only!). Unfortunately, arguably the same is true for all decision theories I’ve seen so far, but for different reasons...
- Caspar Oesterheld 27 Jan 2020 15:50 UTC
  >Caspar Oesterheld and Vince Conitzer are also doing something like this
  That paper can be found at https://users.cs.duke.edu/~ocaspar/CDTMoneyPump.pdf . And yes, its structure is essentially the same as that of the problem in the post.
  - Stuart_Armstrong 27 Jan 2020 16:04 UTC
    Cool!
    
    I notice that you assumed there were no independent randomising devices available. But why would the CDT agent ever opt to use a randomising device? Why would it see that as having value?
    - Caspar Oesterheld 28 Jan 2021 17:05 UTC
      Apologies, I only saw your comment just now! Yes, I agree, CDT never strictly prefers randomizing. So there are agents who abide by CDT and never randomize. As our scenarios show, these agents are exploitable. However, there could also be CDT agents who, when indifferent between some set of actions (and when randomization has no cost), do randomize, choosing the probabilities according to some additional theory. For example, you could have the decision procedure: “follow CDT, but when indifferent between multiple actions, choose a distribution over these actions that is ratifiable”. The updated version of our paper, which has now been published Open Access in The Philosophical Quarterly, actually contains some extra discussion of this in Section IV.1, starting with the paragraph “Nonetheless, what happens if we grant the buyer in Adversarial Offer access to a randomisation device...”.
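Both points in this reply can be checked in a small two-box toy. For fixed beliefs, a mixture's expected utility is a weighted average of the pure actions' expected utilities, so it never exceeds the best pure action; that is why CDT is at most indifferent to randomizing. The ratifiability check follows the decision procedure quoted above, but the seller model in `beliefs_given_mixture` is a hypothetical assumption of this sketch (the seller sees the buyer's distribution but not the coin flip, samples a prediction from it, and fills every box it does not predict will be bought). It is not taken from the paper, and the names and numbers are illustrative.

```python
from fractions import Fraction

# Illustrative values (assumptions, not taken from the paper).
PRICE = Fraction(1)
PRIZE = Fraction(3)
ACTIONS = ("B1", "B2", "none")

def eus_given_beliefs(p_full):
    """Causal expected utility of each pure action, given fixed credences
    p_full[box] that each box holds the prize."""
    return {
        "B1": p_full["B1"] * PRIZE - PRICE,
        "B2": p_full["B2"] * PRIZE - PRICE,
        "none": Fraction(0),
    }

def mixture_eu(mix, eus):
    # With beliefs held fixed, a mixture is worth the probability-weighted
    # average of the pure actions' values, so never more than the best one.
    return sum(mix[a] * eus[a] for a in ACTIONS)

def beliefs_given_mixture(mix):
    # HYPOTHETICAL seller model (assumption of this sketch, not the paper's):
    # the seller knows the buyer's distribution but not the coin flip, samples
    # a prediction from it, and fills every box other than the predicted one.
    # Box i is then empty exactly when the sampled prediction is box i.
    return {"B1": 1 - mix["B1"], "B2": 1 - mix["B2"]}

def is_ratifiable(mix):
    # Ratifiable: given the beliefs the agent would hold after settling on
    # this mixture, every action in its support is a best response.
    eus = eus_given_beliefs(beliefs_given_mixture(mix))
    best = max(eus.values())
    return all(eus[a] == best for a in ACTIONS if mix[a] > 0)

half = {"B1": Fraction(1, 2), "B2": Fraction(1, 2), "none": Fraction(0)}
pure_b1 = {"B1": Fraction(1), "B2": Fraction(0), "none": Fraction(0)}
pure_none = {"B1": Fraction(0), "B2": Fraction(0), "none": Fraction(1)}

# Expected payoff of the 50/50 mixture under the assumed seller model.
half_value = mixture_eu(half, eus_given_beliefs(beliefs_given_mixture(half)))
```

Under this assumed model, buying B1 or B2 with probability 1/2 each is ratifiable, the pure actions are not, and the buyer's expected payoff is 1/2 rather than a loss, because the seller's prediction carries no information about which box the coin picks.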