In the link I didn’t see anything that suggests that Eliezer analogized creating ASI to a prisoner’s dilemma (though I might have missed it), so my objection here is mostly to analogizing the creation of ASI to a prisoner’s dilemma like this.
The reason it is disanalogous is that humanity has no ability to make our strategy conditional on the strategy of our opponent. The core reason TDT/LDT agents would cooperate in a prisoner’s dilemma is that they can model their opponent and make their strategy conditional on their opponent’s strategy in a way that enables coordination. We currently seem to have no ability to choose whether we create ASI (or which ASI we create) based on its behavior in this supposed prisoner’s dilemma. As such, humanity has no option to choose “defect”: we are effectively playing cooperate-bot, and the rational strategy for the other player (including for TDT agents) is to defect against cooperate-bot.
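To make this concrete, here is a minimal sketch of the asymmetry (the payoff values and the strategy names like `cooperate_bot` and `mirror_bot` are my own illustrative choices, not anything from the discussion): against a player that cooperates unconditionally, defection is the best response, whereas against a player that can condition on a model of your strategy, conditional cooperation does better.

```python
# Standard one-shot PD payoffs for (me, opponent), with T=5 > R=3 > P=1 > S=0.
PAYOFF = {
    ("C", "C"): (3, 3),  # mutual cooperation (R, R)
    ("C", "D"): (0, 5),  # I am exploited (S, T)
    ("D", "C"): (5, 0),  # I exploit (T, S)
    ("D", "D"): (1, 1),  # mutual defection (P, P)
}

def cooperate_bot(my_strategy):
    # Cooperates unconditionally -- cannot condition on the opponent at all.
    return "C"

def mirror_bot(my_strategy):
    # Conditions on a model of the opponent: plays whatever the opponent's
    # strategy would play against an unconditional cooperator.
    return my_strategy(lambda s: "C")

def always_defect(opponent):
    return "D"

def conditional_cooperate(opponent):
    # Cooperate iff a model of the opponent would cooperate with a cooperator.
    return "C" if opponent(lambda s: "C") == "C" else "D"

def payoff_against(opponent, my_strategy):
    # Each side's move may depend on (a model of) the other's strategy.
    my_move = my_strategy(opponent)
    their_move = opponent(my_strategy)
    return PAYOFF[(my_move, their_move)][0]
```

Against `cooperate_bot`, `always_defect` scores 5 while `conditional_cooperate` scores only 3; against `mirror_bot`, the ranking flips (1 vs. 3). The conditioning is what makes cooperation rational, which is exactly what is missing in the ASI case.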
Maybe this disagrees with what Eliezer believed 15 years ago (though at least a skim of the relevant thread caused me to fail to find evidence for that), but this seems like such an elementary point, and one I’ve seen Eliezer make many times since then, that I would be quite surprised if he disagreed.
To be clear, my guess is Eliezer would agree that if we could reliably predict whether an AI system would reward us for bringing it into existence, and could engineer AI systems for which we would make such positive predictions, then yeah, I expect that AI system would be pretty excited about trading with us acausally, and I expect Eliezer would believe something similar. However, we have no ability to do so. Gaining that ability sounds like it would require enormous progress in predicting the actions of future AI systems, in a way that could be genuinely harder than just aligning them directly to our values, and in any case it should not be attempted as a way of ending the acute risk period (compared to other options like augmenting humans using low-powered AI systems, making genetically smarter humans, and generally getting better at coordinating to not build ASI systems for much longer).
my objection here is mostly to analogizing the creation of ASI to a prisoner’s dilemma like this.
The reason why it is disanalogous is because humanity has no ability to make our strategy conditional on the strategy of our opponent.
It’s not part of the definition of PD that players can condition on each other’s strategies. In fact, PD was specifically constructed to prevent this (i.e., specifying that each prisoner has to act without observing how the other acted). It was Eliezer’s innovation to suggest that the two players can still condition on each other’s strategies by simulation or logical inference, but it’s not sensible to say that inability to do this makes a game not a PD! (This may not be a crux in the current discussion, but it seems like too big of an error/confusion to leave uncorrected.)
However, we have no ability to do so, and doing this sounds like it would require making enormous progress on our ability to predict the actions of future AI systems in a way that seems like it could be genuinely harder than just aligning it directly to our values
My recollection of early discussions with Eliezer is that he was too optimistic about our ability to make predictions like this, and this seems confirmed by my recent review of his comments in the thread I linked. See also my parallel discussion with Eliezer. (To be honest, I thought I was making a fairly straightforward, uncontroversial claim, and now somewhat regret causing several people to spend a bunch of time going back and forth on what amounts to a historical footnote.)
It’s not part of the definition of PD that players can condition on each others’ strategies. In fact PD was specifically constructed to prevent this (i.e., specifying that each prisoner has to act without observing how the other acted).
I think it’s usually part of the definition of a PD that you know who you are in a prisoner’s dilemma with.
I do think we are hitting the limits of the analogy here, and it’s not clear how to extend the usual definition of a prisoner’s dilemma to more exotic scenarios like the one we are discussing. But in the limit, the prisoner’s dilemma becomes meaningless if you remove all knowledge of who you are coordinating with. The fundamental challenge in a prisoner’s dilemma is predicting what your partner in the dilemma is trying to do; if you have no information about that, there is no hope for any kind of coordination (and I doubt anyone would argue there is a predictably winning strategy for a prisoner’s dilemma against a completely randomly chosen mind/algorithm).
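A small sketch of this last point (standard PD payoff numbers, but the framing is my own): if the opponent’s move is statistically independent of yours, as it would be against a mind you know nothing about, then defection has a strictly higher expected payoff no matter how likely the opponent is to cooperate, so there is nothing for a clever strategy to exploit.

```python
# Row player's payoffs in a standard one-shot PD: T=5 > R=3 > P=1 > S=0.
payoff = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def expected(move, p_coop):
    # Expected payoff of a move against an opponent whose move is
    # independent of ours and cooperates with probability p_coop.
    return p_coop * payoff[(move, "C")] + (1 - p_coop) * payoff[(move, "D")]

# Defection beats cooperation for every cooperation probability in [0, 1]:
# E[D] = 1 + 4p versus E[C] = 3p, and 1 + 4p > 3p always holds.
defect_always_better = all(
    expected("D", p / 100) > expected("C", p / 100) for p in range(101)
)
```

The only way conditional cooperation can outperform this is if your choice and the opponent’s choice are correlated, via modeling, simulation, or logical inference, which is precisely the knowledge that is absent here.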