We shouldn’t expect a huge win from AIs that are anthropically muggable, as discussed in “Can we get more than this?”, because other people will also be mugging these AIs, so the price of marginal mugged resources will be bid up until it reaches marginal cost. Such AIs (which clearly have a crazy decision theory) will have their resources distributed away, but we can still trade with the other civilizations that end up with those resources. Overall, we should just focus on which positive-sum trades are possible; given the competition, the anthropic mugging angle is a distraction. (Thanks to various commenters for making this clearer to me.)
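As a toy illustration of the competition point (my own sketch with made-up numbers, not anything from the post): if mugging requires paying a fixed simulation cost and the muggable pool is finite, entrants keep joining until the marginal mugger’s payoff roughly equals their cost, so no one captures a large surplus.

```python
# Toy zero-profit model of competitive anthropic mugging. All numbers
# are illustrative assumptions. A fixed pool of "muggable" resources is
# split among whoever pays the cost of running the relevant simulations;
# entry continues while a marginal entrant profits, so in equilibrium
# the per-mugger payoff is bid down to roughly the entry cost.

POOL = 100.0        # total muggable resources (assumed)
ENTRY_COST = 4.0    # cost of running the simulations needed to mug (assumed)

def payoff(n: int) -> float:
    """Per-mugger share of the pool with n competing muggers."""
    return POOL / n

n = 1
while payoff(n + 1) > ENTRY_COST:  # entry continues while profitable
    n += 1

print(f"equilibrium muggers: {n}")
print(f"per-mugger payoff:  {payoff(n):.2f} vs entry cost {ENTRY_COST}")
# Payoff ~= entry cost: competition has competed away the surplus,
# which is why we shouldn't expect a huge win from muggable AIs.
```

The same zero-profit logic is why the interesting question becomes which positive-sum trades exist, rather than how to capture mugged resources.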
One issue with this scheme is that at the point where the AIs need to bear the biggest costs to spare humans (during takeover and immediately after), they will not necessarily have very strong predictive abilities. So it is unclear that a standard acausal trade setup, which relies on good mutual prediction, will work: future humans/aliens might know that the AI’s early actions aren’t sensitive to their actions, the AI will also know this, and thus no trade happens. I think a binding commitment from humanity could work as a minimal fix (if well specified), though to actually avoid dying we would also need aliens/other branches to make similar commitments.
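A minimal sketch of this failure mode and the commitment fix (my framing, with an arbitrary accuracy threshold; not a mechanism from the post): without a commitment, the trade needs the early AI to reliably predict that future humans/aliens would pay; with a well-specified binding commitment, the AI only needs to verify that the commitment exists.

```python
# Stylized model of the early AI's choice at takeover time. The 0.9
# threshold and the verification step are hypothetical assumptions.

def ai_spares_humans(predictive_ability: float,
                     binding_commitment_verified: bool,
                     threshold: float = 0.9) -> bool:
    """Does the early AI pay the cost of sparing humans?

    In a standard acausal trade, the AI spares humans only if it can
    reliably predict that future humans/aliens will pay it back. With
    a binding commitment, humanity pays unconditionally, so the AI's
    weak early predictive abilities no longer block the trade.
    """
    if binding_commitment_verified:
        # Checking a concrete, well-specified commitment is far easier
        # than predicting the behavior of distant future agents.
        return True
    # During takeover the AI's predictive ability may be low, both
    # sides can know this, and so the trade fails.
    return predictive_ability >= threshold

print(ai_spares_humans(0.3, binding_commitment_verified=False))  # False
print(ai_spares_humans(0.3, binding_commitment_verified=True))   # True
```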
I agree that the “Can we get more than this?” part is likely not to work, because others are also running simulations of AIs with this kind of decision theory. (In the post, I allude to the AI itself running ancestor sims of itself, but I agree others could run them too.) Still, I’d give a non-negligible chance that we are the most salient simulators for this AI, so that we are the first party it enters into a trade with. But I agree that on second thought, this seems less likely.