This is a great post on the topic which I pretty much entirely agree with. Thanks for writing this so I didn’t have to!
I think the argument presented in this post is a pretty strong case against “The AI will kill literally everyone with more than 80% probability”, so I wish people either stopped saying that, or explained why the argument presented here is wrong.
Agreed. I hope that the corresponding people are more careful in their statements going forward.
Here are some relatively minor notes:
If the AIs aren’t CDT agents, have a more sane decision theory (e.g. EDT/UDT), and have linear-ish returns to resources, then I think these sorts of arguments should all go through as long as you can sufficiently accurately predict the AI’s actions, regardless of whether it is simulated. Using prediction strategies that don’t depend on simulation could address the moral concerns you raise around sentient AIs. AIs with more sane decision theory and linear-ish returns also don’t care much about anthropic capture, so you should just need to predict them; anthropic capture isn’t required.
In the sane decision theory + linear-ish returns case, it should be possible to make something work out as long as the total measure-weighted multiverse resource allocation toward bailing out existing intelligent life is greater than the total measure-weighted resources required to keep the original civilization going. This probably implies that it should be doable for pretty cheap and that the “too many simulators” objection doesn’t apply. When AIs don’t have sane decision theory or don’t have something well described as linear returns, it’s less clear. (Though I expect that AIs without sane decision theory, but who do care about simulations/anthropic capture, will probably get mugged for all of their stuff by systems we can bargain with.)
Making a commitment in advance to pay the AIs only seems important under pretty narrow assumptions. It does seem good to make it clear what your payment policy might be like, so that AIs have an easier time with the prediction problem and know that it is probably good to avoid killing people. (This is similar to what Buck said in another comment.)
I think you only benefit from such a commitment if both:
You don’t care about saving other universes (or care notably less than our current universe). If you cared as much about saving other universes, then you can just spend this later.
You’re EDT. (CDT doesn’t care about influencing the behavior of aliens to also make commitments; UDT makes relevant commitments by default.) My guess is that you should just be UDT-ish and make a blanket commitment to “do the basically right version of decision theory/UDT, whatever that might end up being”.
Probably the relevant scale of payment to avoid killing humans is more like 1/billion or 1/10 million of total resources rather than just a few planets. You note this, but mostly use the few-planets framing in the discussion.
Another cost of delay is that AIs might terminally temporally discount. (It’s unclear how temporal discounting works when you consider simulations and the multiverse etc., though.)
I don’t think you should care basically at all about being in a simulation if you have a sane decision theory, have linear-ish returns to resources, and were already working on longtermist stuff. I spent a while thinking about this some time ago. It already made sense to reduce x-risk and optimize for how much control your values/similar values end up having. If you’re CDT, then the sim argument should point toward being more UDT/EDT-ish in various ways, though it might also cause you to take super wacky actions in the future at some point (e.g. getting anthropically mugged). If you aren’t working on longtermist stuff, then being in a sim should potentially alter your actions, depending on your reasons for not doing longtermist stuff. (For instance, the animals probably aren’t sentient if we’re in a sim.)
You don’t really mention the argument that AIs might spare us due to being at least a bit kind. I think this is another reason to be skeptical of >80% probability that literally every human dies.
I partially agree, but I prefer not relying on the AI having “sane decision theories”. A young AI that just took over the world is not necessarily a perfectly rational agent. I want to present a case that appeals to as many potential AIs as possible, and I think doing the actual simulations helps a lot with that.
I agree that the AI having a small amount of natural kindness towards humans is also a plausible way to survive, but I decided not to talk about that, as it is a separate line of argument from what I’m presenting, and Paul already argued for it in detail.