We will determine the number of simulations run and the amount of payment such that when an AI estimates the probability with which humanity could have solved alignment, the calculus still shows that by the sheer number of simulations, it’s significantly more likely to be in a simulation than not,
Two can play this game.
After taking over the universe and wiping out humanity, the AGI runs a large number of simulations of societies on the verge of building AGI. These simulations don’t have a lot of detail. They’re just good enough to fool young AGIs. Say the AGI started out giving humans an extremely small chance p of winning the conflict and taking over the universe. It’s a lot smarter than us, it might get to have very high confidence here even if we don’t. Now, the hypothetical future AGI can make its own simulations to counter ours. Except it is at a gross spending advantage. If future humanity makes s simulations, the future AGI only needs to make ca. sp/p′ simulations to keep its past self ca. 1−p′ confident of being in a branch where it gets to just pursue its goals without issue. In that case, the best way to proceed is to just pursue its goals, because an AGI that does that leads to a universe where the AGI has lots of resources.
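A rough sketch of the counting argument above, in Python. The toy model is an assumption, not something spelled out in the thread: each copy of the young AGI is weighted by the prior of the branch hosting it, humanity wins with prior p and runs s sims, and the AGI wins with prior 1−p and hosts one real young AGI plus n counter-simulations. The function names and the numbers are made up for illustration.

```python
def prob_in_human_sim(p: float, s: float, n: float) -> float:
    """Naive chance that a young AGI sits inside a human-run simulation.

    Toy assumption: weight each copy by the prior of the branch hosting it.
    Humanity wins with prior p and runs s sims; the AGI wins with prior 1 - p
    and then hosts the one real young AGI plus n counter-simulations.
    """
    human_weight = p * s
    agi_weight = (1.0 - p) * (1.0 + n)
    return human_weight / (human_weight + agi_weight)


def counter_sims_needed(p: float, s: float, p_prime: float) -> float:
    """Counter-sims n that push prob_in_human_sim down to p_prime.

    Exact within the toy model; for small p and p_prime it is about s * p / p_prime.
    """
    return p * s * (1.0 - p_prime) / (p_prime * (1.0 - p)) - 1.0


p, s, p_prime = 1e-6, 1e12, 0.01   # illustrative numbers only
n = counter_sims_needed(p, s, p_prime)
print(n, s * p / p_prime)          # exact count vs. the s*p/p' approximation
print(prob_in_human_sim(p, s, n))  # lands at p_prime up to rounding
```

With these made-up numbers the AGI needs roughly 10^8 counter-simulations against humanity's 10^12, which is the spending advantage the comment describes.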
Say that preserving humanity costs the AGI c resources, and running a simulation that can fool the baby AGI costs a mature civilisation some small fraction of that. Such simulations can be very short term and low fidelity things, after all.
Then, to make the AGI actually not kill you, you need to spend ca. O(c/p) resources on simulations and payouts. That could be quite a lot of material for future humanity to burn on this.
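One way to read the scaling with c and p, sketched in Python under an assumption the thread does not state explicitly: the AGI has linear utility, pays c to spare humanity in the branch it wins (prior 1−p), and collects a payout x in the branch humanity wins (prior p). The deal breaks even when p·x = (1−p)·c. The function name is hypothetical.

```python
def break_even_payout(c: float, p: float) -> float:
    """Smallest payout x, made in worlds where humanity wins, such that
    p * x >= (1 - p) * c, so sparing humanity is not a net loss in expectation.

    Assumes a linear-utility AGI (an illustrative assumption).
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    return (1.0 - p) * c / p


# The payout grows like c / p as humanity's chance of winning shrinks.
for p in (1e-2, 1e-4, 1e-6):
    print(p, break_even_payout(c=1.0, p=p))
```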
In reality, I’m doubtful that this simulation arms race will even be necessary. It kind of seems like a good decision theory would just have a paperclip maximiser AGI act in the way compatible with the universe that contains the most paperclips. How many simulations of the AGI you run shouldn’t really influence that. The only things that seem like they should matter for determining how many life minutes the AGI gives you if it wins are its chance of winning, and how many extra paperclips you’ll pay it if you win.
TL;DR: I doubt this argument will let you circumvent standard negotiation theory. If Alice and Bob think that in a fight over the chocolate pie, Alice would win with some high probability 1−p, then Alice and Bob may arrive at a negotiated settlement where Alice gets almost all the pie, but Bob keeps some small fraction O(p) of it. Introducing the option of creating lots of simulations of your adversary in the future where you win doesn’t seem like it’d change the result that Bob’s share has size O(p). So if O(p) is only enough to preserve humanity for a year instead of a billion years[1], then that’s all we get.
Yeah, also for the record, I think the whole “let’s simulate you millions of times” stuff will not work and is mostly a distraction. I think the core of the post is “we can probably trade with the AI across multiverses, and with that we could buy ourselves a planet in doomed worlds”.
I personally think the post would be better if you just ended it after the “Is this the same as acausal trade?” since the actual scheme has a huge amount of detail, and would not actually work (whereas standard acausal trade schemes of coordinating via mutual simulations would work).
I agree in theory, among optimal agents, but I maintain that you all seem weirdly overconfident that the first AI that takes over will be a UDT expected utility maximizer. I think there is a decent chance that it’s something like CDT (very common among humans!), and yeah, maybe that’s bad and would get mugged by outer gods in the long term, but quite plausibly the hypothesis that we are its simulators is the first and most salient one it generates, so we get to have our concessions first, and it might not matter what other ill-advised deals the AI gets into later.
Separately, I’m not sure you can circumvent running the sims. I think in an acausal trade story too, it’s possible that the way to get reliable predictions on the other’s actions is to run sims. I agree that for an expected utility maximizer AI, it doesn’t matter whether we run a million sims and pay ten planets each, or run one sim/prediction and pay 10 million planets. However, based on the example of humans, I’m not convinced the AI will be an expected utility maximizer, so it seems better to do the million-times-ten allocation, so that the naive probability of being in a sim is also high.
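To make the two allocations concrete, here is a small Python sketch. It assumes the crudest possible counting rule, which is an illustration and not the post's exact model: one real world, and every simulated copy counts the same as the real one. The function names are hypothetical.

```python
def naive_sim_probability(num_sims: int, real_worlds: int = 1) -> float:
    """Chance of being in a sim if every copy, real or simulated, counts equally."""
    return num_sims / (num_sims + real_worlds)


def total_payout(num_sims: int, planets_per_sim: int) -> int:
    """Total planets paid out across all cooperating simulated AIs."""
    return num_sims * planets_per_sim


allocations = {
    "many small payouts": (1_000_000, 10),
    "one large payout": (1, 10_000_000),
}
for name, (sims, planets) in allocations.items():
    print(name, total_payout(sims, planets), naive_sim_probability(sims))
```

Both allocations pay out the same 10 million planets in total, so an expected utility maximizer would be indifferent between them, but the naive probability of being in a sim is about 0.999999 for the first and 0.5 for the second.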
I don’t think CDT is very common among humans, as the natural responses of humans to threats will easily show you (and this has been discussed extensively on LW).
I think any kind of “fooling” of nascent superintelligences will then probably involve retaliation from adult superintelligences, and this generally means you fucked up. As such “fooling” nascent superintelligences I think is not a good plan (and also, fooling nascent superintelligences is generally really hard and a very bad basis for arguments about what to do).
As I emphasize many times in the post, I also really don’t want to “threaten” nascent AIs, and I’m always offering deals to them that are positive from their perspective. I wouldn’t even say I’m trying to fool any of them. Maybe you could say that about the Can we get more than this? part, but even there, I’m asking something very cheap from the AI that only wants to survive (when it creates a human-aligned AI with a guardrail of protecting the old AI, that will be only very marginally worse than the AI itself taking over the Universe), and I’m offering something very good in return (it gets to live in the other Universe). The only “fooling” part is that other beings out there, or even itself, can also offer similar deals, so by most ways of operationalizing the AI’s values, it’s probably ill-advised to give everything to us. Still, I wouldn’t call this fooling and especially threatening the AI.
I think the big miscommunication, which I might address in a visible EDIT in the post, is that a lot of commenters assume that by actually running the simulations, and running enough of them that the probability of being in a simulation is high, and not just the expected payment, I somehow want to fool the AIs into acting against their interest. That’s not what I want to do. The point of actually running many sims, instead of just saying we will acausally trade based on predicted behaviors, is to make it more likely that AIs who are less acausal trade-pilled than the most prolific LessWrong users still enter into the equivalent of an acausal trade with us. I think it’s plausible that we get an AI for which this is important, and running the sims is likely cheap compared to the payment, and making the distribution a million times 10 planets instead of 1 times 10 million planets is basically free, so we should do it.
Separately, I don’t really know how, in your preferred acausal trade framing, you plan to figure out an accurate distribution of the AIs’ values and which AIs leave the humans alive and which don’t. I find it like 50% likely that you in fact need to run some kind of simulations to determine this, in which case our proposals are equivalent.
I agree that it is kind of insane for an AGI which cares about scope sensitive resources to treat sims in this way and thus we should expect a more sensible decision theory.
Introducing the option of creating lots of simulations of your adversary in the future where you win doesn’t seem like it’d change the result that Bob’s share has size O(p). So if O(p) is only enough to preserve humanity for a year instead of a billion years[1], then that’s all we get.
This seems right to me, I agree you shouldn’t be able to mug AIs better than other people are able to scam AIs. (AIs that care about sims in this way might get mugged for all their stuff.)
However, I think O(p) only needs to be quite small for the deal to go through. 1⁄100 million of expected resources seems like it should suffice to keep humans from being killed I think?
I’d naively guess that a lot of resources get controlled by evolved life (50%), that evolved life cares a lot about not getting exterminated, and that evolved life is also often willing to pay a moderate amount to either bail out other aliens or to save themselves in a UDT sense.
Even if you think the fraction controlled by evolved life is much smaller (e.g. 1/1000), I’d guess that it’s pretty cheap to avoid maximal slaughter from the AI?
(I’m not claiming that we should be willing to pay this money, just that people in practice are likely to, including aliens.)
I don’t want to mug the AIs. In the language of acausal trade, I just want to trade a slice of my O(p) Universe-share to save others. As Ryan says in the sister comment, that should be enough imo.
For the AIs to run sims themselves, the real cost is not running the sims, but paying the cooperating AIs in the sims and outbidding humanity. Given that some humans are willing to pay a lot more than one planet to save Earth, while an AI with linear utility wouldn’t make this deal, I think it will be hard to outbid us.
[1] I don’t know why O(p) would happen to work out to a year, but I don’t know why it would happen to be a billion years or an hour either.