This is a great post on the topic which I pretty much entirely agree with. Thanks for writing this so I didn’t have to!
I think the argument presented in this post is a pretty strong case against “The AI will kill literally everyone with more than 80% probability”, so I wish people either stopped saying that, or explained why the argument presented here is wrong.
Agreed. I hope that the corresponding people are more careful in their statements going forward.
Here are some relatively minor notes:
If the AIs aren’t CDT agents, have a more sane decision theory (e.g. EDT/UDT), and have linear-ish returns to resources, then I think these sorts of arguments should all go through as long as you can sufficiently accurately predict the AI’s actions, regardless of whether it is simulated. Using prediction strategies that don’t depend on simulation could address the moral concerns you raise around sentient AIs. AIs with more sane decision theory and linear-ish returns also don’t care much about anthropic capture, so you should just need to predict them; anthropic capture isn’t required.
In the sane decision theory + linear-ish returns case, it should be possible to make something work out as long as the total measure-weighted multiverse resource allocation toward bailing out existing intelligent life is greater than the total measure-weighted resources required to keep the original civilization going. This probably implies that it should be doable for pretty cheap and that the “too many simulators” objection doesn’t apply. When AIs don’t have sane decision theory or don’t have something well described as linear returns, it’s less clear. (Though I expect that AIs without sane decision theory, but who do care about simulations/anthropic capture, will probably get mugged for all of their stuff by systems we can bargain with.)
Making a commitment in advance to pay the AIs only seems important under pretty narrow assumptions. It does seem good to make it clear what your payment policy might be like, so that AIs have an easier time with the prediction problem and know that it is probably good to avoid killing people. (This is similar to what Buck said in another comment.)
I think you only benefit from such a commitment if both:
You don’t care about saving other universes (or care notably less than our current universe). If you cared as much about saving other universes, then you can just spend this later.
You’re EDT. (CDT doesn’t care about influencing the behavior of aliens to also make commitments; UDT makes relevant commitments by default.) My guess is that you should just be UDT-ish and make a blanket commitment to “do the basically right version of decision theory/UDT, whatever that might end up being”.
Probably the relevant scale of payment to avoid killing humans is more like 1/billion or 1/10 million of total resources rather than just a few planets. You note this, but mostly use the few-planets framing in the discussion.
Another cost of delay is that AIs might terminally temporally discount. (It’s unclear how temporal discounting works when you consider simulations and the multiverse etc., though.)
I don’t think you should care basically at all about being in a simulation if you have a sane decision theory, have linear-ish returns to resources, and were already working on longtermist stuff. I spent a while thinking about this some time ago. It already made sense to reduce x-risk and optimize for how much control your values/similar values end up having. If you’re CDT, then the sim argument should point toward being more UDT/EDT-ish in various ways, though it might also cause you to take super wacky actions in the future at some point (e.g. getting anthropically mugged). If you aren’t working on longtermist stuff, then being in a sim should potentially alter your actions, depending on your reasons for not doing longtermist stuff. (For instance, the animals probably aren’t sentient if we’re in a sim.)
You don’t really mention the argument that AIs might spare us due to being at least a bit kind. I think this is another reason to be skeptical of >80% probability that literally every human dies.
I partially agree, but I prefer not relying on the AI having “sane decision theories”. A young AI that just took over the world is not necessarily a perfectly rational agent. I want to present a case that appeals to as many potential AIs as possible, and I think doing the actual simulations helps a lot with that.
I agree that the AI having a small amount of natural kindness towards humans is also a plausible way to survive, but I decided not to talk about that, as it is a separate line of argument from what I’m presenting, and Paul already argued for it in detail.