Wei Dai comments on Decision theory does not imply that we get to have nice things

Wei Dai 28 Jul 2024 4:16 UTC
LW: 3 AF: 3
0
AF
A common confusion I see in the tiny fragment of the world that knows about logical decision theory (FDT/UDT/etc.), is that people think LDT agents are genial and friendly for each other.
I’m reminded that @Eliezer Yudkowsky took a position like this in early decision theory discussions such as this one.
- Eliezer Yudkowsky 28 Jul 2024 21:30 UTC
  LW: 12 AF: 6
  8
  AF Parent
  I don’t always remember my previous positions all that well, but I doubt I would have said at any point that sufficiently advanced LDT agents are friendly to each other, rather than that they coordinate well with each other (and not so with us)?
  - Wei Dai 28 Jul 2024 22:44 UTC
    LW: 4 AF: 4
    0
    AF Parent
    I realized that my grandparent comment was stated badly, but didn’t get a chance to fix it before you replied. To clarify, the following comment of yours from the old thread seems to imply that we humans should be able to coordinate with a LDT agent in one shot PD (i.e., if we didn’t “mistakenly” believe that the LDT agent would defect). Translated into real life, this seems to imply that (if alignment is unsolvable) we should play “cooperate” by building unaligned ASI, and unaligned ASI should “cooperate” by treating us well once built.
    
    Smart players know that if they make the “smart” “thing to do on predictably non-public rounds” be to defect, then non-smart players will predict this even though they can’t predict which rounds are non-public; so instead they choose to make the “smart” thing (that is, the output of this “smart” decision computation) be to cooperate.
    
    The smart players can still lose out in a case where dumb players are also too dumb to simulate the smart players, have the mistaken belief that smart players will defect, and yet know infallibly who the smart players are; but this doesn’t seem quite so much the correctable fault of the smart players as before.
    
    But it’s only you who had in the first place the idea that smart players would defect on predictably private rounds, and you got that from a mistaken game theory in which agents only took into account the direct physical consequences of their actions, rather than the consequences of their decision computations having a particular Platonic output.
    - habryka 28 Jul 2024 23:57 UTC
      LW: 9 AF: 3
      6
      AF Parent
      Translated into real life, this seems to imply that (if alignment is unsolvable) we should play “cooperate” by building unaligned ASI, and unaligned ASI should “cooperate” by treating us well once built.
      This seems only implied if our choice to build the ASI was successfully conditional on the ASI cooperating with us as soon as its built. You don’t cooperate against cooperate-bot in the prisoner’s dilemma.
      If humanity’s choice to build ASI was independent of the cooperativeness of the ASI they built (which seems currently the default), I don’t see any reason for any ASI to be treating us well.
      - Wei Dai 29 Jul 2024 0:58 UTC
        LW: 5 AF: 5
        0
        AF Parent
        I think maybe I’m still failing to get my point across. I’m saying that Eliezer’s old position (which I argued against at the time, and which he perhaps no longer agrees with) implies that humans should be able to coordinate unaligned ASI in one-shot PD, and therefore he’s at least somewhat responsible for people thinking “decision theory implies that we get to have nice things”, i.e., the thing that the OP is arguing against.
        
        Or perhaps you did get my point, and you’re trying to push back by saying that in principle humans could coordinate with ASI, i.e., Eliezer’s old position was actually right, but in practice we’re not on track to doing that correctly?
        habryka 29 Jul 2024 1:46 UTC
        LW: 4 AF: 3
        2
        AF Parent
        In the link I didn’t see anything that suggests that Eliezer analogized creating ASI with a prisoner’s dilemma (though I might have missed it), so my objection here is mostly to analogizing the creation of ASI to a prisoner’s dilemma like this.
        The reason why it is disanalogous is because humanity has no ability to make our strategy conditional on the strategy of our opponent. The core reason why TDT/LDT agents would cooperate in a prisoner’s dilemma is because they can model their opponent and make their strategy conditional on their opponent’s strategy in a way that enables coordination. We currently seem to have no ability to choose whether we create ASI (or which ASI we create) based on its behavior in this supposed prisoner’s dilemma. As such, humanity has no option to choose “defect” and the rational strategy (including for TDT agents) is to defect against cooperate-bot.
        Maybe this disagrees with what Eliezer believed 15 years ago (though at least a skim of the relevant thread caused me to fail to find evidence for that), but it seems like such an elementary point that I’ve seen Eliezer make many times since then that I would be quite surprised.
        To be clear, my guess is Eliezer would agree that if we were able to reliably predict whether AI systems would reward us for bringing it into existence, and be capable of engineering AI systems for which we would make such positive predictions, then yeah, I expect that AI system would be pretty excited about trading with us acausally, and I expect Eliezer would believe something similar. However, we have no ability to do so, and doing this sounds like it would require making enormous progress on our ability to predict the actions of future AI systems in a way that seems like it could be genuinely harder than just aligning it directly to our values, and in any case should not be attempted as a way of ending the acute risk period (compared to other options like augmenting humans using low-powered AI systems, making genetically smarter humans, and generally getting better at coordinating to not build ASI systems for much longer).
        Wei Dai 30 Jul 2024 5:46 UTC
        LW: 3 AF: 3
        1
        AF Parent
        
        my objection here is mostly to analogizing the creation of ASI to a prisoner’s dilemma like this.
        
        The reason why it is disanalogous is because humanity has no ability to make our strategy conditional on the strategy of our opponent.
        
        It’s not part of the definition of PD that players can condition on each others’ strategies. In fact PD was specifically constructed to prevent this (i.e., specifying that each prisoner has to act without observing how the other acted). It was Eliezer’s innovation to suggest that the two players can still condition on each others’ strategies by simulation or logical inference, but it’s not sensible to say that inability to do this makes a game not a PD! (This may not be a crux in the current discussion, but seems like too big of an error/confusion to leave uncorrected.)
        
        However, we have no ability to do so, and doing this sounds like it would require making enormous progress on our ability to predict the actions of future AI systems in a way that seems like it could be genuinely harder than just aligning it directly to our values
        
        My recall of early discussions with Eliezer is that he was too optimistic about our ability to make predictions like this, and this seems confirmed by my recent review of his comments in the thread I linked. See also my parallel discussion with Eliezer. (To be honest, I thought I was making a fairly straightforward, uncontroversial claim, and now somewhat regret causing several people to spend a bunch of time back and forth on what amounts to a historical footnote.)
        habryka 30 Jul 2024 6:06 UTC
        LW: 2 AF: 2
        0
        AF Parent
        It’s not part of the definition of PD that players can condition on each others’ strategies. In fact PD was specifically constructed to prevent this (i.e., specifying that each prisoner has to act without observing how the other acted).
        I think it’s usually part of the definition of a PD that you know who you are in a prisoner’s dilemma with.
        I do think we are hitting the limits of analogy here and it’s not super clear how to extend the usual definition of a prisoner’s dilemma to more exotic scenarios like the one we are discussing, but in the limit I feel like the prisoner’s dilemma becomes totally meaningless if you remove all knowledge of who you are coordinating with from the equation. The fundamental challenge in a prisoner’s dilemma is predicting what your partner in the dilemma is trying to do, and if you have no information on that, there is no hope for any kind of coordination (and I doubt anyone would argue there is a predictably winning strategy for a prisoner’s dilemma against a completely randomly chosen mind/algorithm).
    - Eliezer Yudkowsky 29 Jul 2024 19:13 UTC
      LW: 7 AF: 5
      2
      AF Parent
      By “dumb player” I did not mean as dumb as a human player. I meant “too dumb to compute the pseudorandom numbers, but not too dumb to simulate other players faithfully apart from that”. I did not realize we were talking about humans at all. This jumps out more to me as a potential source of misunderstanding than it did 15 years ago, and for that I apologize.
      - Wei Dai 30 Jul 2024 5:03 UTC
        LW: 7 AF: 6
        3
        AF Parent
        
        I did not realize we were talking about humans at all.
        
        In this comment of yours later in that thread, it seems clear that you did have humans in mind and were talking specifically about a game between a human (namely me), and a “smart player”:
        
        You, however, are running a very small and simple computation in your own mind when you conclude “smart players should defect on non-public rounds”. But this is assuming the smart player is calculating in a way that doesn’t take into account your simple simulation of them, and your corresponding reaction. So you are not using TDT in your own head here, you are simulating a “smart” CDT decision agent—and CDT agents can indeed be harmed by increased knowledge or intelligence, like being told on which rounds an Omega is filling a Newcomb box “after” rather than “before” their decision. TDT agents, however, win—unless you have mistaken beliefs about them that don’t depend on their real actions, but that’s a genuine fault in you rather than anything dependent on the TDT decision process; and you’ll also suffer when the TDT agents calculate that you are not correctly computing what a TDT agent does, meaning your action is not in fact dependent on the output of their computation.
        
        Also that thread started with you saying “Don’t forget to retract: http://www.weidai.com/smart-losers.txt″ and that article mentioned humans in the first paragraph.