Wei Dai comments on Decision theory does not imply that we get to have nice things

Wei Dai 30 Jul 2024 5:46 UTC
LW: 3 AF: 3

“my objection here is mostly to analogizing the creation of ASI to a prisoner’s dilemma like this.”

“The reason why it is disanalogous is because humanity has no ability to make our strategy conditional on the strategy of our opponent.”

It’s not part of the definition of PD that players can condition on each other’s strategies. In fact, PD was specifically constructed to prevent this (i.e., by specifying that each prisoner has to act without observing how the other acted). It was Eliezer’s innovation to suggest that the two players can still condition on each other’s strategies by simulation or logical inference, but it’s not sensible to say that an inability to do this makes a game not a PD! (This may not be a crux in the current discussion, but it seems like too big an error/confusion to leave uncorrected.)
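
To make the distinction concrete, here is a minimal sketch, not anything taken from the original discussion. The payoff numbers are the usual illustrative ones (an assumption), and `defect_bot`, `cooperate_bot`, and `mirror_bot` are hypothetical names. The identity check in `mirror_bot` is only a crude stand-in for conditioning "by simulation or logical inference". The first half shows that the game is a PD whether or not any conditioning is possible; the second half shows what conditioning adds.

```python
from typing import Callable, Dict, Tuple

# Illustrative PD payoffs (assumed values): (my payoff, opponent's payoff).
PAYOFFS: Dict[Tuple[str, str], Tuple[int, int]] = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}


def best_response(opponent_move: str) -> str:
    """Best reply when the opponent's move is fixed and cannot depend on ours."""
    return max("CD", key=lambda my_move: PAYOFFS[(my_move, opponent_move)][0])


# A "program" is handed the opponent's program and returns a move.
Program = Callable[[Callable], str]


def defect_bot(opponent: Callable) -> str:
    """Always defects, ignoring the opponent."""
    return "D"


def cooperate_bot(opponent: Callable) -> str:
    """Always cooperates, ignoring the opponent."""
    return "C"


def mirror_bot(opponent: Callable) -> str:
    """Cooperates only with an exact copy of itself (hypothetical example)."""
    return "C" if opponent is mirror_bot else "D"


def play(a: Program, b: Program) -> Tuple[int, int]:
    """Each program sees the other program, not the other's realized move."""
    return PAYOFFS[(a(b), b(a))]


if __name__ == "__main__":
    # Classical PD: defection is the best reply to either fixed move.
    print(best_response("C"), best_response("D"))
    # With conditioning on the opponent's program, mutual cooperation is reachable.
    print(play(mirror_bot, mirror_bot))
    print(play(mirror_bot, defect_bot))
    print(play(cooperate_bot, defect_bot))
```

The payoff table alone is what makes this a PD; `mirror_bot` changes which outcomes are reachable, not which game is being played.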

“However, we have no ability to do so, and doing this sounds like it would require making enormous progress on our ability to predict the actions of future AI systems in a way that seems like it could be genuinely harder than just aligning it directly to our values”

My recall of early discussions with Eliezer is that he was too optimistic about our ability to make predictions like this, and this seems confirmed by my recent review of his comments in the thread I linked. See also my parallel discussion with Eliezer. (To be honest, I thought I was making a fairly straightforward, uncontroversial claim, and now somewhat regret causing several people to spend a bunch of time back and forth on what amounts to a historical footnote.)
- habryka 30 Jul 2024 6:06 UTC
  LW: 2 AF: 2
  “It’s not part of the definition of PD that players can condition on each others’ strategies. In fact PD was specifically constructed to prevent this (i.e., specifying that each prisoner has to act without observing how the other acted).”
  I think it’s usually part of the definition of a PD that you know who you are in a prisoner’s dilemma with.
  I do think we are hitting the limits of analogy here and it’s not super clear how to extend the usual definition of a prisoner’s dilemma to more exotic scenarios like the one we are discussing, but in the limit I feel like the prisoner’s dilemma becomes totally meaningless if you remove all knowledge of who you are coordinating with from the equation. The fundamental challenge in a prisoner’s dilemma is predicting what your partner in the dilemma is trying to do, and if you have no information on that, there is no hope for any kind of coordination (and I doubt anyone would argue there is a predictably winning strategy for a prisoner’s dilemma against a completely randomly chosen mind/algorithm).