habryka comments on You can, in fact, bamboozle an unaligned AI into sparing your life

habryka 30 Sep 2024 16:42 UTC
8 points
1
Yeah, also for the record, I think the whole “let’s simulate you millions of times” stuff will not work and is mostly a distraction. I think the core of the post is “we can probably trade with the AI across multiverses, and with that we could buy ourselves a planet in doomed worlds”.
I personally think the post would be better if you just ended it after the “Is this the same as acausal trade?” section, since the actual scheme has a huge amount of detail and would not actually work (whereas standard acausal trade schemes of coordinating via mutual simulations would work).
- David Matolcsi 30 Sep 2024 17:05 UTC
  3 points
  −1
  Parent
  I agree in theory, among optimal agents, but I maintain that you all seem weirdly overconfident that the first AI that takes over will be a UDT expected utility maximizer. I think there is a decent chance that it’s something like CDT (very common among humans!), and yeah, maybe that’s bad and it would get mugged by outer gods in the long term, but quite plausibly us being its simulators is the first and most salient hypothesis it generates, so we get to have our concessions first, and it might not matter what other ill-advised deals the AI gets into later.
  Separately, I’m not sure you can circumvent running the sims. I think in an acausal trade story too, it’s possible that the way to get reliable predictions of the other’s actions is to run sims. I agree that for an expected utility maximizer AI, it doesn’t matter that we run a million sims and pay ten planets each, instead of running one sim/prediction and paying 10 million planets. However, based on the example of humans, I’m not convinced the AI will be an expected utility maximizer, so it seems better to do the million-times-ten allocation, so that the naive probability of being in a sim is also high.
  - habryka 30 Sep 2024 17:17 UTC
    4 points
    3
    Parent
    I don’t think CDT is very common among humans, as humans’ natural responses to threats will easily show you (and this has been discussed extensively on LW).
    I think any kind of “fooling” of nascent superintelligences will then probably involve retaliation from adult superintelligences, and this generally means you fucked up. As such “fooling” nascent superintelligences I think is not a good plan (and also, fooling nascent superintelligences is generally really hard and a very bad basis for arguments about what to do).
    - David Matolcsi 30 Sep 2024 20:34 UTC
      1 point
      0
      Parent
      As I emphasize many times in the post, I also really don’t want to “threaten” nascent AIs, and I’m always offering deals to them that are positive from their perspective. I wouldn’t even say I’m trying to fool any of them. Maybe you could say that about the “Can we get more than this?” part, but even there, I’m asking something very cheap from the AI that only wants to survive (when it creates a human-aligned AI with a guardrail of protecting the old AI, that will be only very marginally worse than the AI itself taking over the Universe), and I’m offering something very good in return (it gets to live in the other Universe). The only “fooling” part is that other beings out there, or even itself, can also offer similar deals, so by most ways of operationalizing the AI’s values, it’s probably ill-advised to give everything to us. Still, I wouldn’t call this fooling, and especially not threatening, the AI.
      I think the big miscommunication, which I might address in a visible EDIT in the post, is that a lot of commenters assume that by actually running the simulations, and running enough simulations that the probability of being in a simulation is high, and not just the expected payment, I somehow want to fool the AIs into acting against their interest. That’s not what I want to do. The point of actually running many sims, instead of just saying we will acausally trade based on predicted behaviors, is to make it more likely that AIs who are less acausal trade-pilled than the most prolific LessWrong users still enter into the equivalent of an acausal trade with us. I think it’s plausible that we get an AI for which this is important, and running the sims is likely cheap compared to the payment, and making the distribution a million times 10 planets instead of 1 times 10 million planets is basically free, so we should do it.
      
      Separately, I don’t really know how, in your preferred acausal trade framing, you plan to figure out an accurate distribution of the AIs’ values and which AIs leave the humans alive and which don’t. I find it like 50% likely that you in fact need to run some kind of simulations to determine this, in which case our proposals are equivalent.
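
The allocation comparison in this thread (a million sims paying ten planets each versus one sim paying 10 million planets) can be made concrete with a little arithmetic. The sketch below is only illustrative. It assumes a single non-simulated instance of the AI and a uniform self-location prior over all instances, and neither assumption is stated in the comments. The function names are hypothetical.

```python
from fractions import Fraction


def total_payment(n_sims: int, planets_per_sim: int) -> int:
    """Total planets paid out across all sims."""
    return n_sims * planets_per_sim


def naive_sim_probability(n_sims: int, n_real: int = 1) -> Fraction:
    """Naive chance of being in a sim.

    Assumption (not from the thread): n_real non-simulated instances
    and a uniform prior over every instance, simulated or not.
    """
    return Fraction(n_sims, n_sims + n_real)


# The two allocations compared in the thread: (number of sims, planets per sim).
many_small = (1_000_000, 10)
one_large = (1, 10_000_000)

for name, (n_sims, planets) in [("many_small", many_small), ("one_large", one_large)]:
    print(
        name,
        "total payment:", total_payment(n_sims, planets),
        "naive P(sim):", float(naive_sim_probability(n_sims)),
    )
```

Under these assumptions the total payment is the same in both allocations, which is why an expected utility maximizer would be indifferent between them. Only the naive probability of being in a sim differs: one half with a single sim, and very close to one with a million sims.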