I largely agree with your comment, except that I think that us deciding to pay if we win is entangled with/evidence for a general willingness to pay among the gods, and in that sense it’s partially “our” decision doing the work of saving us.
Sure, like how when a child sees a fireman pull a woman out of a burning building and says “if I were that big and strong, I would also pull people out of burning buildings”, in a sense it’s partially the child’s decision that does the work of saving the woman. (There’s maybe a little overlap in how they run the same decision procedure that’s coming to the same conclusion in both cases, but vanishingly little of the credit goes to the child.)
in which case actually running the sims can be important
In the case where the AI is optimizing reality-and-instantiation-weighted experience, you’re giving it a threat, and your plan fails on the grounds that sane reasoners ignore that sort of threat.
in the case where your plan is “I am hoping that the AI will be insane in some other unspecified but precise way which will make it act as I wish”, I don’t see how it’s any more helpful than the plan “I am hoping the AI will be aligned”—it seems to me that we have just about as much ability to hit either target.
when a child sees a fireman pull a woman out of a burning building and says “if I were that big and strong, I would also pull people out of burning buildings”, in a sense it’s partially the child’s decision that does the work of saving the woman… but vanishingly little of the credit goes to the child
The child is partly responsible—to a very small but nonzero degree—for the fireman’s actions, because the child’s personal decision procedure has some similarity to the fireman’s decision procedure?
Is this a correct reading of what you said?
I was responding to David saying
Otherwise, I largely agree with your comment, except that I think that us deciding to pay if we win is entangled with/evidence for a general willingness to pay among the gods, and in that sense it’s partially “our” decision doing the work of saving us.
and was insinuating that we deserve extremely little credit for such a choice, in the same way that a child deserves extremely little credit for a fireman saving someone that the child could not (even if it’s true that the child and the fireman share some aspects of a decision procedure). My claim was intended less like agreement with David’s claim and more like reductio ad absurdum, with the degree of absurdity left slightly ambiguous.
(And on second thought, the analogy would perhaps have been tighter if the firefighter was saving the child.)
I think the common sense view is that this similarity of decision procedures provides exactly zero reason to credit the child with the fireman’s decisions. Credit for a decision goes to the agent who makes it, or perhaps to the algorithm that the agent used, but not to other agents running the same or similar algorithms.