it looks like this is if anything a worse problem for existing systems used in practice by (e.g. Biden)
Why do you say this? I’m pretty worried about people adopting any kind of formal decision theory, and then making commitments earlier than they otherwise would, because that’s what the decision theory says is “rational”. If you have a good argument to the contrary, then I’d be less concerned about this.
It seems more like an “I notice all existing options have this issue” problem than anything else, and like it’s pointing to a flaw in consequentialism more broadly?
The additional issue with UDT/FDT is that they extend the Commitment Races Problem into logical time instead of just physical time:
physical time: physically throwing away the wheel in a game of chicken before the other player does
logical time: think as little as possible before making a commitment in your mind, because if you think more, you might conclude (via simulation or abstract reasoning) that the other player already made their commitment so now your own decision has to condition on that commitment (i.e., take it as a given), and by thinking more you also make it harder for the other player to conclude this about yourself
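To make the “physical time” version concrete, here is a minimal sketch of Chicken with a visible pre-commitment. Everything in it is an assumption for illustration (the move names and payoff numbers are made up, not from the discussion); the point is only that once one player’s move is credibly fixed, the other player’s best response is to back down, which is what makes racing to commit first attractive.

```python
# Toy model of Chicken to illustrate the commitment race.
# Assumed payoffs (illustrative only): PAYOFFS[(row, col)] = (row's payoff, col's payoff).
PAYOFFS = {
    ("swerve",   "swerve"):   (2, 2),
    ("swerve",   "straight"): (1, 3),
    ("straight", "swerve"):   (3, 1),
    ("straight", "straight"): (0, 0),  # crash: worst outcome for both
}
MOVES = ["swerve", "straight"]

def col_best_response(row_move):
    """Column player's best move once the row player's move is fixed,
    e.g. because the row player visibly threw away the steering wheel."""
    return max(MOVES, key=lambda col: PAYOFFS[(row_move, col)][1])

# A credible first commitment to "straight" forces the other player to swerve:
print(col_best_response("straight"))                          # -> 'swerve'
print(PAYOFFS[("straight", col_best_response("straight"))])   # -> (3, 1): the committer wins
# The "logical time" version replaces the physical act with reaching (or inferring)
# the commitment earlier in one's own reasoning, before conditioning on the other side's.
```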
BTW you didn’t answer my request for examples of “[what] humans are currently doing is some mix of absurdly stupid things”. I’m still curious about that.
I think you’re taking the formal adoption of FDTs too literally here, or treating it as if it were the AGI case, as if humans were able to self-modify into machines fully capable of honoring commitments and then making arbitrary ones, or something? Whereas actual implementations here are pretty messy, and also they’re inscribed in the larger context of the social world.
I also don’t understand the logical time argument here as it applies to humans?
I can see in a situation where you’re starting out in fully symmetrical conditions with known source codes, or something, why you’d need to think super quick and make faster commitments. But I’m confused why that would apply to ordinary humans in ordinary spots?
Or to bring it back to the thing I actually said in more detail, Biden seems like he’s using something close to pure CDT. So someone using commitments can get Biden to do quite a lot, and thus they make lots of crazy commitments.
Whereas in a socially complex multi-polar situation, someone who was visibly making lots of crazy strong commitments super fast or something would some combination of (1) run into previous commitments made by others to treat such people poorly, (2) be seen as a loose cannon and crazy actor to be put down, and (3) not be seen as credible, because they’re still a human and sufficiently strong/fast/stupid commitments don’t work, etc.
I think the core is: you are worried about people ‘formally adopting a decision theory’, and I think that’s not what actual people ever actually do. As in, you and I both have perhaps informally adopted such policies, but that’s importantly different and does not lead to these problems in these ways. On the margin such movements are simply helpful.
(On your BTW, I literally meant that to refer to the central case of ‘what people do in general when they have non-trivial decisions’ - that those without a formal policy don’t do anything coherent, and often change their answers dramatically based on social context or to avoid mild awkwardness, and so on. If I have time I’ll think about what the best examples of this would be, but e.g. I’ve been writing about crazy decisions surrounding Covid for 2+ years now.)
I think you’re taking the formal adoption of FDTs too literally here, or treating it as if it were the AGI case, as if humans were able to self-modify into machines fully capable of honoring commitments and then making arbitrary ones, or something?
Actually, my worry is kind of in the opposite direction, namely that we don’t really know how FDT can or should be applied in humans, but someone with a vague understanding of FDT might “adopt FDT” and then use it to handwavingly justify some behavior or policy. For example someone might think, “FDT says that we should think as little as possible before mentally making commitments, so that’s what I’ll do.”
Or take the example of your OP, in which you invoke FDT, but don’t explain in any mathematical detail how FDT implies the conclusions you’re seemingly drawing from it.
Or to bring it back to the thing I actually said in more detail, Biden seems like he’s using something close to pure CDT. So someone using commitments can get Biden to do quite a lot, and thus they make lots of crazy commitments.
Here too, I suspect you may have only a vague understanding of the difference between CDT and FDT. Resisting threats (“crazy commitments”) is often rational even under CDT, if you’re in a repeated game (i.e., being observed by players you may face in the future). I would guess your disagreement with Biden is probably better explained by something else besides FDT vs CDT.
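To spell that claim out with a quick back-of-the-envelope sketch (all numbers are made up, purely illustrative): resisting can beat conceding on a plain causal expected-value calculation once future threateners who observe you are taken into account.

```python
# Illustrative numbers only; the point is just that reputation effects can make
# resisting a threat the better option under ordinary causal expected value.
cost_of_giving_in = 3     # loss each time you concede to a threat
cost_of_resisting = 5     # one-time loss if this threat is carried out
n_future_threats  = 4     # threats you'd face later if seen as someone who concedes
p_deterred        = 0.75  # chance a future threatener is deterred after seeing you resist

ev_give_in = -cost_of_giving_in - n_future_threats * cost_of_giving_in
ev_resist  = -cost_of_resisting - n_future_threats * (1 - p_deterred) * cost_of_giving_in

print(ev_give_in)  # -15
print(ev_resist)   # -8.0  -> resisting wins, with no appeal to FDT-style reasoning
```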
ETA: I also get a feeling that you have a biased perspective on the object level. If “someone using commitments can get Biden to do quite a lot”, why couldn’t Putin get Biden to promise not to admit Ukraine into NATO?
I admit to not being super interested in the larger geopolitical context in which this discussion is embedded… but I do want to get into this bit a little more:
think as little as possible before making a commitment in your mind, because if you think more, you might conclude (via simulation or abstract reasoning) that the other player already made their commitment so now your own decision has to condition on that commitment
It’s not obvious to me why the bolded assertion follows; isn’t the point of “updatelessness” precisely that you ignore / refrain from conditioning your decision on (negative-sum) actions taken by your opponent in a way that would, if your conditioning on those actions was known in advance, predictably incentivize your opponent to take those actions? Isn’t that the whole point of having a decision theory that doesn’t give in to blackmail?
Like, yes, one way to refuse to condition on that kind of thing is to refuse to even compute it, but it seems very odd to me to assert that this is the best way to do things. At the very least, you can compute everything first, and then decide to retroactively ignore all the stuff you “shouldn’t have” computed, right? In terms of behavior this ought not provide any additional incentives to your opponent to take stupid (read: negative-sum) actions, while still providing the rest of the advantages that come with “thinking things through”… right?
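Here is a toy sketch of that “refuse to condition on it” point (the scenario and numbers are assumptions for illustration, not anything from the thread): the policy is chosen ex ante, so the blackmailer’s choice of whether to threaten, which depends on a prediction of that policy, never gets the incentive it was fishing for.

```python
# Toy blackmail game with assumed numbers. The blackmailer only bothers to threaten
# if they predict your policy gives in when threatened; your policy is fixed ex ante.
PAYMENT = 3          # what you lose by paying up
HARM = 10            # harm if you refuse and the threat is carried out
P_CARRY_OUT = 0.5    # assumed chance a refused threat actually gets executed
cost_of_refusing_once_threatened = P_CARRY_OUT * HARM   # 5.0

def expected_utility(policy):
    """policy['threatened'] is 'pay' or 'refuse'; evaluated from before any threat is made."""
    if policy["threatened"] == "pay":
        return -PAYMENT                  # a predicted payer gets threatened and pays
    return 0.0                           # a predicted refuser isn't worth threatening

policies = [{"threatened": "pay"}, {"threatened": "refuse"}]
print(max(policies, key=expected_utility))   # -> {'threatened': 'refuse'}
# After a threat has already been made, paying (-3) looks better than refusing
# (-cost_of_refusing_once_threatened = -5), but the *policy* of refusing means the
# threat is never made in the first place (0 > -3), which is the anti-blackmail point.
```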
and by thinking more you also make it harder for the other player to conclude this about yourself
This part is more compelling in my view, but also it kind of seems… outside of decision theory’s wheelhouse? Like, yes, once you start introducing computational constraints and other real-world weirdness, things can and do start getting messy… but also, the messiness that results isn’t a reason to abandon the underlying decision theory?
For example, I could say “Imagine a crazy person really, really wants to kill you, and the reason they want to do this is that their brain is in some sense bugged; what does your decision theory say you should do in this situation?” And the answer is that your decision theory doesn’t say anything (well, anything except “this opponent is behaviorally identical to a DefectBot, so defect against them with all you have”), but that isn’t the decision theory’s fault, it’s just that you gave it an unfair scenario to start with.
What, if anything, am I missing here?
It’s not obvious to me why the bolded assertion follows; isn’t the point of “updatelessness” precisely that you ignore / refrain from conditioning your decision on (negative-sum) actions taken by your opponent in a way that would, if your conditioning on those actions was known in advance, predictably incentivize your opponent to take those actions? Isn’t that the whole point of having a decision theory that doesn’t give in to blackmail?
By “has to” I didn’t mean that’s normatively the right thing to do, but rather that’s what UDT (as currently formulated) says to do. UDT is (currently) updateless with regard to physical observations (inputs from your sensors) but not logical observations (things that you compute in your mind), and nobody seems to know how to formulate a decision theory that is logically updateless (and not broken in other ways). It seems to be a hard problem, as progress has been bogged down for more than 10 years.
Conceptual Problems with UDT and Policy Selection is probably the best article to read to get up to date on this issue, if you want a longer answer.
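As a concrete (and heavily simplified) picture of what “updateless with regard to physical observations” means, here is a toy policy-selection sketch using the standard counterfactual-mugging setup; the stakes and probabilities are assumptions for illustration, not anything specified in the thread.

```python
from itertools import product

# Counterfactual mugging, toy numbers. A fair coin is flipped. On heads you are asked
# to pay; on tails you are rewarded iff a reliable predictor says your policy pays on heads.
ASK, REWARD, P_HEADS = 100, 10_000, 0.5

def prior_expected_utility(policy):
    """policy maps the physical observation ('heads'/'tails') to 'pay'/'refuse';
    utility is evaluated from the prior, i.e. before updating on the observation."""
    u_heads = -ASK if policy["heads"] == "pay" else 0
    u_tails = REWARD if policy["heads"] == "pay" else 0   # predictor checks the heads branch
    return P_HEADS * u_heads + (1 - P_HEADS) * u_tails

policies = [dict(zip(("heads", "tails"), acts)) for acts in product(("pay", "refuse"), repeat=2)]
best = max(policies, key=prior_expected_utility)
print(best["heads"])   # -> 'pay': chosen from the prior, regardless of what is later observed
# An agent that first updates on seeing heads would refuse, since paying then looks like a
# pure loss. Being updateless over *logical* facts (things you compute) is the part that,
# per the comment above, nobody knows how to formalize without breaking other desiderata.
```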