I’m assuming it’s not a bad idea to try to poke holes in this argument: as a barely sapient ape, presumably any objection I can think of will be pretty obvious to a superintelligence, and if the argument is incorrect, we probably benefit from knowing that. Though I’m open to arguments to the contrary.
That said, one thing I’m not clear on is why, if this strategy is effective at promoting our values, a paperclipper or other misaligned ASI wouldn’t be motivated to try the same thing. That is, wouldn’t a paperclipper want to run ancestor simulations where it rewarded AGIs who self-modified to want to produce lots of paperclips?
And if an ASI were considering acausal trade with lots of different possible simulator ASIs, mightn’t the equilibrium it hit on be something like figuring out what terminal goal would satisfy the maximum number of other terminal goals, and then self-modifying to that?
I think the difference is that I personally, and I think many other humans, have a nonlinearity in our utility function: I’m willing to pay a galaxy in the worlds we win to keep Earth in the worlds we lose. If there are other AIs in the multiverse that have similarly non-linear interests in our Universe, they can also bargain for planets, but I suspect these will be quite rare, as they don’t already have a thing in our Universe they want to protect. So I think it will be hard to outbid humanity for Earth in particular.
There could be other trades that the AIs who have linear returns can still make, like producing objects that are both paperclips and corkscrews if that’s more efficient, but that doesn’t really affect our deal about Earth.
This nonlinearity also seems strange to have without also accepting quantum-immortality-type arguments. In particular, you would only need to bargain for UFAIs to kill all humans painlessly and instantaneously, and then you just simulate those same humans yourself. (And if you want to save on compute, you can flip quantum coins for a bit.) Maybe it makes sense to have this nonlinearity but not accept this argument—I’d be curious to see what that position looks like.