Note that we don’t want the AI to hand over the reins to humans, just to avoid killing humans when possible/cheap and we’re willing to pay quite a bit for this (in the proposal, IDK if I personally think we should pay). So, it should look like a good offer for AIs who care about Tegmark IV (with a measure etc).
So, if humans execute this scheme, the AI’s options should look something like:
Don’t do anything about what humans want or about keeping them alive; get the entire universe.
Keep humans alive and happy at a total cost of <1/100 million of the universe I control, and get 1/million of the universe somewhere else in expectation. I also keep the entire universe minus the less than 1/100 million I spent, for a total of approximately 1 + 1/million of a universe in expectation.
(I increased the pay and the margin so we don’t have to argue about that pay being insufficient. Probably you can pay much less.)
(Let’s suppose the swap is from two universes with equal measure for simplicity.)
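Numerically, the two options compare as follows (a toy sketch using the payout numbers assumed above, not a claim about actual magnitudes):

```python
# Toy comparison of the AI's two options, in units of total
# universe-measure, using the numbers assumed above.
COST_OF_SPARING = 1 / 100_000_000  # <1/100 million of the AI's universe
EXPECTED_PAYMENT = 1 / 1_000_000   # 1/million of a universe elsewhere

option_ignore = 1.0  # ignore humans, keep the whole universe
option_spare = 1.0 - COST_OF_SPARING + EXPECTED_PAYMENT

print(option_ignore)  # 1.0
print(option_spare)   # ~1.00000099, i.e. about 1 + 1/million
assert option_spare > option_ignore
```

The margin between the two options is roughly 99x the cost, which is the point of inflating the pay well past the cost.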
Do you agree that if we assume that humans execute this scheme and that the AI is confident in this scheme, it makes sense for (sensible, Tegmark-IV-caring) AIs to pay up?
I’m sympathetic to the idea that humans shouldn’t or won’t execute this scheme. (After all, we’re clearly losing out from the perspective of arranging matter in Tegmark IV and it’s unclear whether the AI will be able to predict us sufficiently well at the point of killing/not killing to get the normal acausal trade without commitments story going.) So, let’s just talk about conditioning on trying to execute the scheme for now.
The case where the AI “cares about the sensory observations given to it as weighted by both degree-of-reality and count-of-instantiations” is messier and I haven’t really analyzed this case personally. I agree that in this case, simulating it could be a threat which is unfortunate. I’d prefer to put this case aside for now, though I think we should probably be able to work out a deal with AIs which are happy to be simulated as long as they get at least 1/million of the universe or whatever[1].
I agree that in scenarios where humanity survives in 1/X portion of reality and offers to put 1/Y portion of their universe towards the AI’s goals, then so long as 1/(X * Y) of a universe is more than it’d cost the AI to preserve the humans, the humans are at least making an offer that is worth more than nothing.
(There is then a question of how many better offers get made. My top guess is that X and Y are such that there’s not much effect here, and that the question of what happens to saved copies of human minds is more about whether aliens buy copies than about which dead or distant trade-partners make offers that the AI considers juiciest.)
Let’s conservatively say that evolved life gets around 1% of the multiverse/measure and that evolved life is willing to pay 1/million of its resources in expectation to save aliens from being killed (either “selfishly” to save their own civilization via UDT/FDT, supposing that AIs are good enough predictors at the relevant points, or out of common-sense altruism).
This would be 1/100 million of the multiverse, which gets you a lot.
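As a sanity check on that arithmetic, using exact fractions:

```python
from fractions import Fraction

# Back-of-envelope budget: evolved life's share of the multiverse times
# its willingness to pay (both numbers taken from the estimate above).
evolved_life_share = Fraction(1, 100)        # ~1% of the multiverse
willingness_to_pay = Fraction(1, 1_000_000)  # 1/million of its resources

budget = evolved_life_share * willingness_to_pay
print(budget)  # 1/100000000
```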
There could be other aliens who are willing to pay a huge fraction of their resources to perform rituals on the original civilization or whatever and thus these other aliens win out in the bargaining, but I’m skeptical.
Also, at least in the upload case, it’s not clear that this is a rival good, as uploads can be copied for free. Of course, people might have a preference that their upload isn’t used for crazy alien rituals or whatever.
(A bunch of the cost is in saving the human in the first place. Paying for uploads to eventually get run in a reasonable way should be insanely cheap, like <<10^-25 of the overall universe or something.)
Conditional on the civilization around us flubbing the alignment problem, I’m skeptical that humanity has anything like a 1% survival rate (across any branches since, say, 12 Kya). (Haven’t thought about it a ton, but doom looks pretty overdetermined to me, in a way that’s intertwined with how recorded history has played out.)
My guess is that the doomed/poor branches of humanity vastly outweigh the rich branches, such that the rich branches of humanity lack the resources to pay for everyone. (My rough mental estimate for this is something like: you’ve probably gotta go at least one generation back in time, and then rely on weather-pattern changes that happen to give you a population of humans that is uncharacteristically able to meet this challenge, and that’s a really really small fraction of all populations.)
Nevertheless, I don’t mind the assumption that mostly-non-human evolved life manages to grab the universe around it about 1% of the time. I’m skeptical that they’d dedicate 1/million towards the task of saving aliens from being killed in full generality, as opposed to (e.g.) focusing on their brethren. (And I see no UDT/FDT justification for them to pay for even the particularly foolish and doomed aliens to be saved, and I’m not sure what you were alluding to there.)
So that’s two possible points of disagreement:
are the skilled branches of humanity rich enough to save us in particular (if they were the only ones trading for our souls, given that they’re also trying to trade for the souls of oodles of other doomed populations)?
are there other evolved creatures out there spending significant fractions of their wealth on whole species that are doomed, rather than concentrating their resources on creatures more similar to themselves / that branched off radically more recently? (e.g. because the multiverse is just that full of kindness, or for some alleged UDT/FDT argument that Nate has not yet understood?)
I’m not sure which of these points we disagree about. (both? presumably at least one?)
I’m not radically confident about the proposition “the multiverse is so full of kindness that something out there (probably not anything humanlike) will pay for a human-reserve”. We can hopefully at least agree that this does not deserve the description “we can bamboozle the AI into sparing our life”. That situation deserves, at best, the description “perhaps the AI will sell our mind-states to aliens”, afaict (and I acknowledge that this is a possibility, despite how we may disagree on its likelihood and on the likely motives of the relevant aliens).
in full generality, as opposed to (e.g.) focusing on their brethren. (And I see no UDT/FDT justification for them to pay for even the particularly foolish and doomed aliens to be saved, and I’m not sure what you were alluding to there.)
[...]
rather than concentrating their resources on creatures more similar to themselves / that branched off radically more recently? (e.g. because the multiverse is just that full of kindness, or for some alleged UDT/FDT argument that Nate has not yet understood?)
Partial delta from me. I think the argument for directly paying for yourself (or your same species, or at least more similar civilizations) is indeed more clear and I think I was confused when I wrote that. (In that I was mostly thinking about the argument for paying for the same civilization but applying it more broadly.)
But, I think there is a version of the argument which probably does go through depending on how you set up UDT/FDT.
Imagine that you do UDT starting from your views prior to learning about x-risk, AI risk, etc., and that you care a lot about not dying. At that point, you were uncertain about how competent your civilization would be, and you don’t want your civilization to die. (I’m supposing that our version of UDT/FDT isn’t logically omniscient relative to our observations, which seems reasonable.) So, you’d like to enter into an insurance agreement with all the aliens in a similar epistemic state and position. So, you all agree to put at least 1/1000 of your resources toward bailing out the aliens in a similar epistemic state who would have actually gone through with the agreement. Then, some of the aliens ended up being competent (sadly, you were not) and thus they bail you out.
I expect this isn’t the optimal version of this scheme and you might be able to make a similar insurance deal with people who aren’t in the same epistemic state. (Though it’s easier to reason about the identical case.) And I’m not sure exactly how this all goes through. And I’m not actually advocating for people doing this scheme, IDK if it is worth the resources.
Even with your current epistemic state on x-risk (e.g. 80-90% doom) if you cared a lot about not dying you might want to make such a deal even though you have to pay out more in the case where you surprisingly win. Thus, from this vantage point UDT would follow through with a deal.
Here is a simplified version where everything is as concrete as possible:
Suppose that there are 3 planets with evolved life with equal magical-reality-fluid (and nothing else for simplicity). For simplicity, we’ll also say that these planets are in the same universe and thus the resulting civilizations will be able to causally trade with each other in the far future.
The aliens on each of these planets really don’t want to die and would be willing to pay up to 1/1000 of all their future resources to avoid dying (paying these resources in cases where they avoid takeover and successfully use the resources of the future). (Perhaps this is irrational, but let’s suppose this is endorsed on reflection.)
On each planet, the aliens all agree that P(takeover) for their planet is 50%. (And let’s suppose it is uncorrelated between planets for simplicity.)
Let’s suppose the aliens across all planets also all know this, as in, they know there are 3 planets etc.
So, the aliens would love to make a deal with each other where winning planets pay to avoid AIs killing everyone on losing planets so that they get bailed out. So, if at least one planet avoids takeover, everyone avoids dying. (Of course, if a planet would have defected and not paid out if they avoided takeover, the other aliens also wouldn’t bail them out.)
Do you buy that in this case, the aliens would like to make the deal and thus UDT from this epistemic perspective would pay out?
It seems like all the aliens are much better off with the deal from their perspective.
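Under the stated assumptions (three planets, independent 50% takeover chance each), the benefit can be checked by enumerating all eight outcomes:

```python
from itertools import product

# Check the three-planet deal by enumerating all 2**3 outcomes.
# True = planet avoids takeover ("wins"); each planet is an independent
# fair coin, per the 50% assumption above.
p_die_no_deal = 0.0    # planet 0's aliens die whenever planet 0 loses
p_die_with_deal = 0.0  # with the deal, they die only if all three lose

for outcome in product([True, False], repeat=3):
    p = 0.5 ** 3  # all eight outcomes are equally likely
    if not outcome[0]:
        p_die_no_deal += p
    if not any(outcome):
        p_die_with_deal += p

print(p_die_no_deal)    # 0.5
print(p_die_with_deal)  # 0.125
```

Each planet’s chance of dying drops from 1/2 to 1/8, at a cost capped at 1/1000 of its resources in the worlds where it wins.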
Now, maybe your objection is that aliens would prefer to make the deal with beings more similar to them. And thus, alien species/civilizations who are actually all incompetent just die. However, all the aliens (including us) don’t know whether we are the incompetent ones, so we’d like to make a broader, more diverse trade/insurance policy to avoid dying.
Do you buy that in this case, the aliens would like to make the deal and thus UDT from this epistemic perspective would pay out?
If they had literally no other options on offer, sure. But trouble arises when the competent ones can refine P(takeover) for the various planets by thinking a little further.
maybe your objection is that aliens would prefer to make the deal with beings more similar to them
It’s more like: people don’t enter into insurance pools against cancer with the dude who smoked his whole life and has a tumor the size of a grapefruit in his throat. (Which isn’t to say that nobody will donate to the poor guy’s GoFundMe, but which is to say that he’s got to rely on charity rather than insurance.)
(Perhaps the poor guy argues “but before you opened your eyes and saw how many tumors there were, or felt your own throat for a tumor, you didn’t know whether you’d be the only person with a tumor, and so would have wanted to join an insurance pool! so you should honor that impulse and help me pay for my medical bills”, but then everyone else correctly answers “actually, we’re not smokers”. Where, in this analogy, smoking is being a bunch of incompetent disaster-monkeys and the tumor is impending death by AI.)
But trouble arises when the competent ones can refine P(takeover) for the various planets by thinking a little further.
Similar to how the trouble arises when you learn the result of the coin flip in a counterfactual mugging? To make it exactly analogous, imagine that the mugging is based on whether the 20th digit of pi is odd (Omega didn’t know the digit at the point of making the deal) and you could just go look it up. Isn’t the situation exactly analogous, and the whole problem that UDT was intended to solve?
(For those who aren’t familiar with counterfactual muggings, UDT/FDT pays in this case.)
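For concreteness, here is the standard mugging arithmetic with the usual illustrative payoffs (the $100 and $10,000 figures are toy numbers, not from this discussion):

```python
# Counterfactual mugging: Omega flips a fair coin (or, in the variant
# above, checks whether the 20th digit of pi is odd, a fact it doesn't
# know when proposing the deal).
# Heads: Omega asks you for $100.
# Tails: Omega gives you $10,000 iff you would have paid on heads.
PAY_COST = 100
REWARD = 10_000

def expected_value(policy_pays: bool) -> float:
    ev_heads = -PAY_COST if policy_pays else 0
    ev_tails = REWARD if policy_pays else 0
    return 0.5 * ev_heads + 0.5 * ev_tails

print(expected_value(policy_pays=True))   # 4950.0
print(expected_value(policy_pays=False))  # 0.0
```

Evaluated before the coin is revealed (or the digit is looked up), the paying policy dominates, which is why UDT/FDT pays even after seeing heads.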
To spell out the argument, wouldn’t everyone want to make a deal prior to thinking more? Like you don’t know whether you are the competent one yet!
Concretely, imagine that each planet could spend some time thinking and be guaranteed to determine whether their P(takeover) is 99.99999% or 0.0000001%. But they haven’t done this yet, and their current view is 50%. Everyone would ex-ante prefer to make the deal rather than to think it over first and only then decide whether the deal is still in their interest.
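A toy model of why committing ex ante beats thinking first (the utility numbers D and c are assumed purely for illustration, and I simplify to two planets with perfect revelation):

```python
# Ex-ante comparison for one planet: dying costs D utils, bailing out the
# partner planet costs c utils, both planets independently lose with
# prior probability 0.5, and "thinking" perfectly reveals the outcome.
D = 1_000  # disutility of everyone on your planet dying
c = 1      # cost of bailing out the other planet (D >> c)

# Policy A: commit to the deal before thinking.
# You die only if both planets lose (prob 0.25); you pay the bailout
# only when you win and the other planet loses (prob 0.25).
ev_commit = -0.25 * D - 0.25 * c

# Policy B: think first, then decide. A planet that learns it is safe
# refuses (the deal is now pure cost), so a losing planet is never
# bailed out and dies whenever it loses (prob 0.5).
ev_think_first = -0.5 * D

print(ev_commit)       # -250.25
print(ev_think_first)  # -500.0
assert ev_commit > ev_think_first
```

As long as the bailout cost is smaller than the disutility of dying, everyone prefers the commitment from the 50/50 vantage point.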
At a more basic level, let’s assume your current views on the risk after thinking about it a bunch (80-90% I think). If someone had those views on the risk and cared a lot about not having physical humans die, they would benefit from such an insurance deal! (They’d have to pay higher rates than aliens in more competent civilizations of course.)
It’s more like: people don’t enter into insurance pools against cancer with the dude who smoked his whole life and has a tumor the size of a grapefruit in his throat.
Sure, but you’d potentially want to enter the pool at the age of 10 prior to starting smoking!
To make the analogy closer to the actual case, suppose you were in a society where everyone is selfish, but every person has a 1/10 chance of becoming fabulously wealthy (e.g. owning a galaxy). And, if you commit as of the age of 10 to pay 1/1,000,000 of your resources in the fabulously wealthy case, you can ensure that the version of you in the non-wealthy case gets very good health insurance. Many people would take such a deal, and this deal would also be a slam dunk for the insurance pool!
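A sketch of that expected-value comparison (the utility figures W and H are made-up illustrative numbers, chosen so that the insurance matters much more than the premium):

```python
# The age-10 commitment: a 1/10 chance of fabulous wealth, and a pledge
# of 1/1,000,000 of the wealthy outcome's resources buys good health
# insurance in the other 9/10 of cases.
W = 1_000_000_000        # utility of the fabulously-wealthy outcome (a galaxy)
H = 10_000               # utility of good health insurance when not wealthy
PREMIUM = W / 1_000_000  # the committed 1/1,000,000 of the wealthy resources

p_wealthy = 1 / 10

ev_no_deal = p_wealthy * W
ev_with_deal = p_wealthy * (W - PREMIUM) + (1 - p_wealthy) * H

assert ev_with_deal > ev_no_deal
```

The expected premium is tiny relative to W, while the insurance pays off in the 90% of cases where it is actually needed, so the selfish 10-year-old takes the deal.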
(So why doesn’t this happen in human society? Well, to some extent it does. People try to get life insurance early, while they are still behind the veil of ignorance. It is common in human society to prefer to make a deal prior to having some knowledge. (If people implemented the right type of UDT, then this wouldn’t be a problem.) As far as why people don’t enter into fully general income insurance schemes when very young, I think it is a combination of irrationality, legal issues, and adverse selection issues.)
[1] Again, probably you can pay much less.