I don’t think CDT is very common among humans, as humans’ natural responses to threats readily show (and this has been discussed extensively on LW).
I think any kind of “fooling” of nascent superintelligences will probably invite retaliation from adult superintelligences, and that generally means you fucked up. So “fooling” nascent superintelligences is not a good plan (and besides, fooling nascent superintelligences is generally really hard and a very bad basis for arguments about what to do).
As I emphasize many times in the post, I also really don’t want to “threaten” nascent AIs, and I’m always offering them deals that are positive from their perspective. I wouldn’t even say I’m trying to fool any of them. Maybe you could say that about the “Can we get more than this?” part, but even there, I’m asking for something very cheap from an AI that only wants to survive (creating a human-aligned AI with a guardrail of protecting the old AI is only marginally worse for it than taking over the Universe itself), and I’m offering something very good in return (it gets to live in the other Universe). The only “fooling” part is that other beings out there, or even the AI itself, can offer similar deals, so by most ways of operationalizing the AI’s values, it’s probably ill-advised to give everything to us. Still, I wouldn’t call this fooling, let alone threatening, the AI.
I think the big miscommunication, which I might flag in a visible EDIT in the post, is that a lot of commenters assume that by actually running the simulations, and running enough of them that the probability of being in a simulation is high (rather than relying only on the expected payment), I somehow want to fool the AIs into acting against their interests. That’s not what I want to do. The point of actually running many sims, instead of just saying we will acausally trade based on predicted behaviors, is to make it more likely that AIs who are less acausal-trade-pilled than the most prolific LessWrong users still enter into the equivalent of an acausal trade with us. I think it’s plausible that we get an AI for which this matters, running the sims is likely cheap compared to the payment, and making the distribution a million times 10 planets instead of 1 times 10 million planets is basically free, so we should do it.
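To make the arithmetic concrete, here is a minimal sketch (the numbers are the illustrative ones from above, and the simple counting model of “which copy am I?” is my assumption): splitting the same total payment across a million simulations leaves the expected payment unchanged, but makes it overwhelmingly likely that any given copy of the AI is inside a simulation.

```python
def deal(num_sims: int, planets_per_sim: int, num_real: int = 1):
    """Toy model: num_sims simulated copies plus num_real real copies.

    Returns the probability that a randomly chosen copy is simulated,
    and the total planets promised across all simulations.
    """
    total_copies = num_sims + num_real
    p_in_sim = num_sims / total_copies
    total_payment = num_sims * planets_per_sim
    return p_in_sim, total_payment

many_small = deal(num_sims=1_000_000, planets_per_sim=10)
one_big = deal(num_sims=1, planets_per_sim=10_000_000)

# Same total payment either way (10 million planets)...
assert many_small[1] == one_big[1] == 10_000_000
# ...but the chance a given copy is in a simulation differs drastically:
print(f"million small sims: P(in sim) = {many_small[0]:.6f}")  # ~0.999999
print(f"one big sim:        P(in sim) = {one_big[0]:.6f}")     # 0.500000
```

The point is that an AI that discounts acausal arguments but does ordinary self-locating reasoning still faces a near-certainty of being simulated under the first distribution, at no extra cost to us.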
Separately, in your preferred acausal trade framing, I don’t really know how you plan to figure out an accurate distribution over the AIs’ values, or which AIs leave the humans alive and which don’t. I find it about 50% likely that you in fact need to run some kind of simulations to determine this, in which case our proposals are equivalent.