Taking a second stab at naming the top reasons I expect this to fail (after Ryan pointed out that my first stab was based on a failure of reading comprehension on my part, thanks Ryan):

This proposal seems to me to have the form “the fragments of humanity that survive offer to spend a (larger) fraction of their universe on the AI’s goals so long as the AI spends a (smaller) fraction of its universe on their goals, with the ratio in accordance with the degree of magical-reality-fluid-or-whatever that reality allots to each”.
(Note that I think this is not at all “bamboozling” an AI; the parts of your proposal that are about bamboozling it seem to me to be either wrong or not doing any work. For instance, I think the fact that you’re doing simulations doesn’t do any work, and the count of simulations doesn’t do any work, for reasons I discuss in my original comment.)
The basic question here is whether the surviving branches of humanity have enough resources to make this deal worth the AI’s while.
You touch upon some of these counterarguments in your post—or so it seems to me after skimming a bit more, noting that I may still be making reading-comprehension failures—albeit not terribly compellingly, so I’ll reiterate a few of them.
The basic obstacles are
the branches where the same humans survive are probably quite narrow (conditional on them being the sort to flub the alignment challenge). I can’t tell whether you agree with this point or not, in your response to point 1 in the “Nate’s arguments” section; it seems to me like you either misunderstood what the 2^-75 was doing there or you asserted “I think that alignment will be so close a call that it could go either way according to the minute positioning of ~75 atoms at the last minute”, without further argument (seems wacky to me).
the branches where other humans survive (e.g. a branch that split off a couple generations ago and got particularly lucky with its individuals) have loads and loads of “lost populations” to worry about and don’t have a ton of change to spare for us in particular
there are competing offers we have to beat (e.g., there are other AIs in other doomed Everett branches that are like “I happen to be willing to turn my last two planets into paperclips if you’ll turn your last one planet into staples (and my branch is thicker than that one human branch who wants you to save them-in-particular)”).
(Note that, contra your “too many simulators” point, the other offers are probably not mostly coming from simulators.)
Once those factors are taken into account, I suspect that, if surviving-branches are able to pay the costs at all, the costs look a lot like paying almost all their resources, and I suspect that those costs aren’t worth paying at the given exchange rates.
All that said, I’m fine with stripping out discussion of “bamboozling” and of “simulation” and just flat-out asking: will the surviving branches of humanity (near or distant), or other kind civilizations throughout the multiverse, have enough resources on offer to pay for a human reserve here?
On that topic, I’m skeptical that those trades form a bigger contribution to our anticipations than local aliens being sold copies of our brainstates. Even insofar as the distant trade-partners win out over the local ones, my top guess is that the things who win the bid for us are less like our surviving Everett-twins and more like some alien coalition of kind distant trade partners.
Thus, “The AIs will kill us all (with the caveat that perhaps there’s exotic scenarios where aliens pay for our brain-states, and hopefully mostly do nice things with them)” seems to me like a fair summary of the situation at hand. Summarizing “we can, in fact, bamboozle an AI into sparing our life” does not seem like a fair summary to me. We would not be doing any bamboozling. We probably even wouldn’t be doing the trading. Some other aliens might pay for something to happen to our mind-states. (And insofar as they were doing it out of sheer kindness, rather than in pursuit of other alien ends where we end up twisted according to how they prefer creatures to be, this would come at a commensurate cost of nice things elsewhere in the multiverse.)
will the surviving branches of humanity (near or distant), or other kind civilizations throughout the multiverse, have enough resources on offer to pay for a human reserve here?
I think I still don’t understand what 2^-75 means. Is this the probability that in the literal last minute when we press the button, we get an aligned AI? I agree that things are grossly overdetermined by then, but why does the last minute matter? I’m probably misunderstanding, but it looks like you are saying that the Everett branches are only “us” if they branched off in the literal last minute; otherwise you talk about them as if they were “other humans”. But among the branches starting now, there will be a person carrying my memories and ID card in most of them two years from now, and by most definitions of “me”, that person will be “me”, and will be motivated to save the other “me”s. And sure, they have loads of failed Everett branches to save, but they also have loads of Everett branches themselves; the only thing that matters is the ratio of saved worlds to failed worlds that contain roughly the “same” people as us. So I still don’t know what 2^-75 is supposed to be.
Otherwise, I largely agree with your comment, except that I think that us deciding to pay if we win is entangled with/evidence for a general willingness to pay among the gods, and in that sense it’s partially “our” decision doing the work of saving us. And as I said in some other comments here, I agree that running lots of sims is an unnecessary complication in case of UDT expected utility maximizer AIs, but I put a decent chance on the first AIs not being like that, in which case actually running the sims can be important.
There’s a question of how thick the Everett branches are, where someone is willing to pay for us. Towards one extreme, you have the literal people who literally died, before they have branched much; these branches need to happen close to the last minute. Towards the other extreme, you have all evolved life, some fraction of which you might imagine might care to pay for any other evolved species.
The problem with expecting folks at the first extreme to pay for you is that they’re almost all dead (like 1 − 2^-(a lot) dead). The problem with expecting folks at the second extreme to pay for you is that they’ve got rather a lot of fools to pay for (like 2^(a lot) of fools). As you interpolate between the extremes, you interpolate between the problems.
The “75” number in particular is the threshold below which even spending your entire universe can’t buy a single star.
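(For a rough sense of where a threshold like 75 bits could come from: the following back-of-the-envelope is my own, and its star-count inputs are assumptions rather than anything stated in this thread.)

```python
import math

# Back-of-the-envelope only, with assumed inputs (not taken from the thread):
# if the reachable universe holds ~10^22 to 10^23 stars, then a single star is
# roughly a 2^-73 to 2^-76 slice of the whole pie.
for stars in (1e22, 1e23):
    bits = math.log2(stars)
    print(f"{stars:.0e} stars  ->  one star is about 2^-{bits:.0f} of the universe")

# So a branch whose measure-weighted offer comes to less than ~2^-75 of a
# universe can't cover the price of even one star, no matter how much of
# itself it is willing to spend.
```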
We are currently uncertain about whether Earth is doomed. As a simple example, perhaps you’re 50/50 on whether humanity is up to the task of solving the alignment problem, because you can’t yet distinguish between the hypothesis “the underlying facts of computer science are such that civilization can just bumble its way into AI alignment” and “the underlying facts of computer science are such that civilization is nowhere near up to this task”. In that case, the question is, conditional on the latter hypothesis being true, how far back in the timeline do you have to go before you can flip only 75 quantum bits and have a civilization that is up to the task?
And how many fools does that surviving branch have to save?
I think that there is a way to compensate for this effect.
To illustrate compensation, consider the following experiment: Imagine that I want to resurrect a particular human by creating a quantum random file. This seems absurd, as there is only a 2^-(a lot) chance that I create the right person. However, there are around 2^(a lot) copies of me in different branches who perform similar experiments, so in total, any resurrection attempt will create around 1 correct copy, but in a different branch. If we agree to trade resurrections between branches, every possible person will be resurrected in some branch.
Here, it means that we can ignore worries that we create a model of the wrong AI or that AI creates a wrong model of us, because a wrong model of us will be a real model of someone else, and someone else’s wrong model will be a correct model of us.
Thus, we can ignore all branch counting at first approximation, and instead count only the probability that Aligned AI will be created. It is reasonable to estimate it as 10 percent, plus or minus an order of magnitude.
In that case, we need to trade with non-aligned AI by giving 10 planets of paperclips for each planet with humans.
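(Here is a toy numerical illustration of the branch-trading intuition above; the sketch and its deliberately tiny “person” space are my own, chosen so it runs instantly.)

```python
import random

# Toy model of "trade resurrections between branches", under my own simplifying
# assumptions: a "person" is an N-bit string, and each of the 2^N branches
# independently generates one uniformly random N-bit string.
N = 16                              # tiny on purpose; the real argument has N = "a lot"
num_branches = 2 ** N
target = random.getrandbits(N)      # the particular person we want resurrected

hits = sum(1 for _ in range(num_branches) if random.getrandbits(N) == target)
print(f"branches that recreated the target person: {hits} (expected value: 1)")

# Any single branch almost surely recreates the wrong person (a 2^-N chance of
# success), but summed over all 2^N branches each possible person comes out
# about once somewhere, which is the sense in which "wrong" models cancel out.
```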
Towards one extreme, you have the literal people who literally died, before they have branched much; these branches need to happen close to the last minute.
By “last minute”, you mean “after I existed” right? So, e.g., if I care about genetic copies, that would be after I am born and if I care about contingent life experiences, that could be after I turned 16 or something. This seems to leave many years, maybe over a decade for most people.
I think David was confused by the “last minute” language, which really means many years, right? (I think you meant “last minute on evolutionary time scales”, not literally in the last few minutes.)
That said, I’m generally super unconfident about how much a quantum bit changes things.
“last minute” was intended to reference whatever timescale David would think was the relevant point of branch-off. (I don’t know where he’d think it goes; there’s a tradeoff where the later you push it the more that the people on the surviving branch care about you rather than about some other doomed population, and the earlier you push it the more that the people on the surviving branch have loads and loads of doomed populations to care after.)
I chose the phrase “last minute” because it is an idiom that is ambiguous over timescales (unlike, say, “last three years”) and because it’s the longer of the two that sprung to mind (compared to “last second”), with perhaps some additional influence from the fact that David had spent a bunch of time arguing about how we would be saved (rather than arguing that someone in the multiverse might pay for some branches of human civilization to be saved, probably not us), which seemed to me to imply that he was imagining a branchpoint very close to the end (given how rapidly people dissasociate from alternate versions of them on other Everett branches).
Yeah, the misunderstanding came from me thinking that “last minute” literally means “last 60 seconds”, and I didn’t see how that’s relevant. If it means “last 5 years” or something where it’s still definitely our genetic copies running around, then I’m surprised you think alignment success or failure is that overdetermined at that time-scale. I understand your point that our epistemic uncertainty is not the same as our actual quantum probability, which is either very high or very low. But still, it’s 2^75 overdetermined over a 5-year period? This sounds very surprising to me; the world feels more chaotic than that. (Taiwan gets nuked, chip development halts, meanwhile the Salvadorian president hears a good pitch about designer babies and legalizes running the experiments there and they work, etc. There are many things that contribute to alignment being solved or not that don’t directly run through underlying facts about computer science, and 2^-75 is a very low probability for none of those pathways to hit.)
But also, I think I’m confused why you work on AI safety then, if you believe the end-state is already 2^75 level overdetermined. Like maybe earning to give to bednets would be a better use of your time then. And if you say “yes, my causal impact is very low because the end result is already overdetermined, but my actions are logically correlated with the actions of people in other worlds who are in a similar epistemic situation to me, but whose actions actually matter because their world really is on the edge”, then I don’t understand why you argue in other comments that we can’t enter into insurance contracts with those people, and that our decision to pay AIs in the Future has as little correlation with their decision as the child’s has with the fireman’s.
I think I’m confused why you work on AI safety then, if you believe the end-state is already 2^75 level overdetermined.
It’s probably physically overdetermined one way or another, but we’re not sure which way yet. We’re still unsure about things like “how sensitive is the population to argument” and “how sensibly do governments respond if the population shifts”.
But this uncertainty—about which way things are overdetermined by the laws of physics—does not bear all that much relationship to the expected ratio of (squared) quantum amplitude between branches where we live and branches where we die. It just wouldn’t be that shocking for the ratio between those two sorts of branches to be on the order of 2^75; this would correspond to saying something like “it turns out we weren’t just a few epileptic seizures and a well-placed thunderstorm away from the other outcome”.
As I said, I understand the difference between epistemic uncertainty and true quantum probabilities, though I do think that the true quantum probability is not that astronomically low.
More importantly, I still feel confused why you are working on AI safety if the outcome is that overdetermined one way or the other.
What does degree of determination have to do with it? If you lived in a fully deterministic universe, and you were uncertain whether it was going to live or die, would you give up on it on the mere grounds that the answer is deterministic (despite your own uncertainty about which answer is physically determined)?
I still think I’m right about this. Your conception (rather than that of a genetically less smart sibling) was determined by quantum fluctuations. So if you believe that quantum fluctuations over the last 50 years make at most a 2^-75 difference in the probability of alignment, that’s an upper bound on how much of a difference your life’s work can make. Whereas if you dedicate your life to buying bednets, it’s pretty easy to calculate how many happy life-years you save. So I still think it’s incompatible to believe both that the true quantum probability is astronomically low and that you can make enough of a difference that working on AI safety is clearly better than bednets.
the “you can’t save us by flipping 75 bits” thing seems much more likely to me on a timescale of years than a timescale of decades; I’m fairly confident that quantum fluctuations can cause different people to be born, and so if you’re looking 50 years back you can reroll the population dice.
This point feels like a technicality, but I want to debate it because I think a fair number of your other claims depend on it.
You often claim that, conditional on us failing at alignment, alignment was so unlikely that among branches that had roughly the same people (genetically) during the Singularity, only a 2^-75 fraction survives. This is important, because then we can’t rely on other versions of ourselves “selfishly” entering an insurance contract with us, and we need to rely on the charity of Dath Ilan that branched off long ago. I agree that’s a big difference. Also, I say that our decision to pay is correlated with our luckier brethren paying, so in a sense it’s partially our decision that saves us. You dismiss that, saying it’s like a small child claiming credit for the big, strong fireman saving people. If it’s Dath Ilan that saves us, I agree with you, but if it’s genetic copies of some currently existing people, I think your metaphor pretty clearly doesn’t apply, and the decisions to pay are in fact decently strongly correlated.
Now I don’t see how much difference decades vs years makes in this framework. If you believe that now our true quantum probability is 2^-75, but 40 years ago it was still a not-astronomical number (like 1 in a million), then should I just plead with people who are older than 40 to promise to themselves that they will pay in the future? I don’t really see what difference this makes.
But also, I think the years-vs-decades dichotomy is pretty clearly false. Suppose you believe that the expected value of one year of your work decreases x-risk by X. What’s the yearly true quantum probability that someone in your reference class of importance (in your opinion) dies, gets a debilitating illness, gets into a career-destroying scandal, etc.? I think it’s hard to argue it’s less than 0.1% a year. (But it makes no big difference if you add one or two zeros.) These things are also continuous: even if none of the important people die, someone will lose a month or some weeks to an illness, etc. I think this is a pretty strong case that, one year from now, the 90th-percentile-luckiest Everett-branch contains 0.01 more years of the equivalent of Nate-work than the 50th-percentile Everett-branch.
But your claims imply that you believe the true probability of success differs by less than 2^-72 between the 50th and 90th percentile luckiness branches a year from now. That puts an upper bound on the value of a year of your labor at 2^-62 probability decrease in x-risk.
With these exact numbers, this can be still worth doing given the astronomical stakes, but if your made-up number was 2^-100 instead, I think it would be better for you to work on malaria.
Here is another more narrow way to put this argument:
Let’s say Nate is 35 (arbitrary guess).
Let’s say that branches which deviated 35 years ago would pay for our branch (and other branches in our reference class). The case for this is that many people are over 50 (thus existing in both branches), and care about deviated versions of themselves and their children etc. Probably the discount relative to zero deviation is less than 10x.
Let’s say that Nate thinks that if he didn’t ever exist, P(takeover) would go up by 1 / 10 billion (roughly 2^-32). If it was wildly lower than this, that would be somewhat surprising and might suggest different actions.
Nate existing is sensitive to a bit of quantum randomness 35 years ago, so other people as good as Nate existing could be created with a bit of quantum randomness. So, 1 bit of randomness can reduce risk by at least 1 / 10 billion.
Thus, 75 bits of randomness presumably reduces risk by > 1 / 10 billion which is >> 2^-75.
(This argument is a bit messy because presumably some logical facts imply that Nate will be very helpful and some imply that he won’t be very helpful and I was taking an expectation over this while we really care about the effect on all the quantum branches. I’m not sure exactly how to make the argument exactly right, but at least I think it is roughly right.)
What about the case where we only go back 10 years? We can apply the same argument, but instead just use some number of bits (e.g. 10) to make Nate work a bit more, say 1 week of additional work, via changing whether Nate ends up getting sick (by adjusting the weather or which children are born, or whatever). This should also reduce doom by 1 week / (52 weeks/year) / (20 years of work) * 1 / 10 billion = 1 / 10 trillion.
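(A quick re-run of that arithmetic, using only the numbers stated above.)

```python
# Re-running the back-of-the-envelope arithmetic stated above.
weeks_per_year = 52
career_years = 20
lifetime_effect = 1 / 10e9          # assumed effect of Nate never existing: 1 in 10 billion
one_extra_week = (1 / weeks_per_year) / career_years * lifetime_effect
print(f"doom reduction from one extra week of work: {one_extra_week:.1e}")
# ~9.6e-14, i.e. roughly 1 in 10 trillion, as claimed.
```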
And surely there are more efficient schemes.
To be clear, only having ~ 1 / 10 billion branches survive is rough from a trade perspective.
What are you trying to argue? (I don’t currently know what position y’all think I have or what position you’re arguing for. Taking a shot in the dark: I agree that quantum bitflips have loads more influence on the outcome the earlier in time they are.)
I argue that right now, starting from the present state, the true quantum probability of achieving the Glorious Future is way higher than 2^-75, or if not, then we should probably work on something other than AI safety. Ryan and I argue for this in the last few comments. It’s not a terribly important point; you can just say the true quantum probability is 1 in a billion, in which case it’s still worth it for you to work on the problem, but it becomes rough to trade for keeping humanity physically alive if that can cause a year of delay to the AI.
But I would like you to acknowledge that “vastly below 2^-75 true quantum probability, starting from now” is probably mistaken, or explain why our logic is wrong about how this implies you should work on malaria.
Starting from now? I agree that that’s true in some worlds that I consider plausible, at least, and I agree that worlds whose survival-probabilities are sensitive to my choices are the ones that render my choices meaningful (regardless of how deterministic they are).
Conditional on Earth being utterly doomed, are we (today) fewer than 75 qbitflips from being in a good state? I’m not sure, it probably varies across the doomed worlds where I have decent amounts of subjective probability. It depends how much time we have on the clock, depends where the points of no-return are. I haven’t thought about this a ton. My best guess is it would take more than 75 qbitflips to save us now, but maybe I’m not thinking creatively enough about how to spend them, and I haven’t thought about it in detail and expect I’d be sensitive to argument about it /shrug.
(If you start from 50 years ago? Very likely! 75 bits is a lot of population rerolls. If you start after people hear the thunder of the self-replicating factories barrelling towards them, and wait until the very last moments that they would consider becoming a distinct person who is about to die from AI, and who wishes to draw upon your reassurance that they will be saved? Very likely not! Those people look very, very dead.)
One possible point of miscommunication: when I said something like “obviously it’s worse than 2^-75 at the extreme where it’s actually them who is supposed to survive”, that was intended to apply to the sort of person who has seen the skies darken and has heard the thunder, rather than the version of them that exists here in 2024. This was not intended to be some bold or surprising claim. It was an attempt to establish an obvious base point at one very extreme end of a spectrum, which we could start interpolating from (asking questions like “how far back from there are the points of no return?” and “how much more entropy would they have than god, if people from that branchpoint spent stars trying to figure out what happened after those points?”).
(The 2^-75 was not intended to be even an estimate of how dead the people on the one end of the extreme are. It is the “can you buy a star” threshold. I was trying to say something like “the individuals who actually die obviously can’t buy themselves a star just because they inhabit Tegmark III; now let’s drag the cursor backwards and talk about whether, at any point, we cross the a-star-for-everyone threshold”.)
If that doesn’t clear things up and you really want to argue that, conditional on Earth being as doomed as it superficially looks to me, most of those worlds are obviously <100 quantum bitflips from victory today, I’m willing to field those arguments; maybe you see some clever use of qbitflips I don’t, and that would be kinda cool. But I caveat that this doesn’t seem like a crux to me and that I acknowledge that the other worlds (where Earth merely looks unsalvageable) are the ones motivating action.
I have not followed this thread in all of its detail, but it sounds like it might be getting caught up on the difference between the underlying ratio of different quantum worlds (which can be expressed as a probability over one’s future) and one’s probabilistic uncertainty over the underlying ratio of different quantum worlds (which can also be expressed as a probability over the future but does not seem to me to have the same implications for behavior).
Insofar as it seems to readers like a bad idea to optimize for different outcomes in a deterministic universe, I recommend reading the Free Will (Solution) sequence by Eliezer Yudkowsky, which I found fairly convincing on the matter of why it’s still right to optimize in a fully deterministic universe, as well as in a universe running on quantum mechanics (interpreted to have many worlds).
You often claim that, conditional on us failing at alignment, alignment was so unlikely that among branches that had roughly the same people (genetically) during the Singularity, only a 2^-75 fraction survives.
My first claim is not “fewer than 1 in 2^75 of the possible configurations of human populations navigate the problem successfully”.
My first claim is more like “given a population of humans that doesn’t even come close to navigating the problem successfully (given some unoptimized configuration of the background particles), probably you’d need to spend quite a lot of bits of optimization to tune the butterfly-effects in the background particles to make that same population instead solve alignment (depending how far back in time you go).” (A very rough rule of thumb here might be “it should take about as many bits as it takes to specify an FAI (relative to what they know)”.)
This is especially stark if you’re trying to find a branch of reality that survives with the “same people” on it. Humans seem to be very, very sensitive about what counts as the “same people”. (e.g., in August, when gambling on who gets a treat, I observed a friend toss a quantum coin, see it come up against them, and mourn that a different person—not them—would get to eat the treat.)
(Insofar as y’all are trying to argue “those MIRI folk say that AI will kill you, but actually, a person somewhere else in the great quantum multiverse, who has the same genes and childhood as you but whose path split off many years ago, will wake up in a simulation chamber and be told that they were rescued by the charity of aliens! So it’s not like you’ll really die”, then I at least concede that that’s an easier case to make, although it doesn’t feel like a very honest presentation to me.)
Conditional on observing a given population of humans coming nowhere close to solving the problem, the branches wherein those humans live (with identity measured according to the humans) are probably very extremely narrow compared to the versions where they die. My top guess would be that 2^-75 number is a vast overestimate of how thick those branches are (and the 75 in the exponent does not come from any attempt of mine to make that estimate).
As I said earlier: you can take branches that branched off earlier and earlier in time, and they’ll get better and better odds. (Probably pretty drastically, as you back off past certain points of no return. I dunno where the points of no return are. Weeks? Months? Years? Not decades, because with decades you can reroll significant portions of the population.)
I haven’t thought much about what fraction of populations I’d expect to survive off of what branch-point. (How many bits of optimization do you need back in the 1880s to swap Hitler out for some charismatic science-enthusiast statesman that will happen to have exactly the right influence on the following culture? How many such routes are there? I have no idea.)
Three big (related) issues with hoping that forks branched off sufficiently early (who are more numerous) save us in particular (rather than other branches) are (a) they plausibly care more about populations nearer to them (e.g. versions of themselves that almost died); (b) insofar as they care about more distant populations (that e.g. include you), they have rather a lot of distant populations to attempt to save; and (c) they have trouble distinguishing populations that never were, from populations that were and then weren’t.
Point (c) might be a key part of the story, not previously articulated (that I recall), that you were missing?
Like, you might say “well, if one in a billion branches look like dath ilan and the rest look like earth, and the former basically all survive and the latter basically all die, then the fact that the earthlike branches have ~0 ability to save their earthlike kin doesn’t matter, so long as the dath-ilan like branches are trying to save everyone. dath ilan can just flip 30 quantum coins to select a single civilization from among the billion that died, and then spend 1/million resources on simulating that civilization (or paying off their murderer or whatever), and that still leaves us with one-in-a-quintillion fraction of the universe, which is enough to keep the lights running”.
Part of the issue with this is that dath ilan cannot simply sample from the space of dead civilizations; it has to sample from a space of plausible dead civilizations rather than actual dead civilizations, in a way that I expect to smear loads and loads of probability-mass over regions that had concentrated (but complex) patterns of amplitude. The concentrations of Everett branches are like a bunch of wiggly thin curves etched all over a disk, and it’s not too hard to sample uniformly from the disk (and draw a plausible curve that the point could have been on), but it’s much harder to sample only from the curves. (Or, at least, so the physics looks to me. And this seems like a common phenomenon in physics; c.f. the apparent inevitable increase of entropy, when what’s actually happening is a previously-compact volume in phase space evolving into a bunch of wiggly thin curves, etc.)
So when you’re considering whether surviving humans will pay for our souls—not somebody’s souls, but our souls in particular—you have a question of how these alleged survivors came to pay for us in particular (rather than some other poor fools). And there’s a tradeoff that runs on one extreme from “they’re saving us because they are almost exactly us and they remember us and wish us to have a nice epilog” all the way to “they’re some sort of distant cousins, branched off a really long time ago, who are trying to save everyone”.
The problem with being on the “they care about us because they consider they basically are us” end is that those people are dead too (conditional on us being dead). And as you push the branch-point earlier and earlier in time, you start finding more survivors, but those survivors also wind up having more and more fools to care about (in part because they have trouble distinguishing the real fallen civilizations from the neighboring civilization-configurations that don’t get appreciable quantum amplitude in basement physics).
If you tell me where on this tradeoff curve you want to be, we can talk about it. (Ryan seemed to want to look all the way on the “insurance pool with aliens” end of the spectrum.)
The point of the 2^75 number is that that’s about the threshold of “can you purchase a single star”. My guess is that, conditional on people dying, versions that they consider to also be them survive with degree way less than 2^-75, which rules out us being the ones who save us.
If we retreat to “distant cousin branches of humanity might save us”, there’s a separate question of how the width of the surviving quantum branch compares to the volume taken up by us in the space of civilizations they attempt to save. I think my top guess is that a distant branch of humanity, spending stellar-level resources in attempts to concentrate its probability-mass in accordance with how quantum physics concentrates (squared) amplitude, still winds up so uncertain that there are 50+ bits of freedom left over? Which means that if one-in-a-billion of our cousin-branches survives, they still can’t buy a star (unless I flubbed my math).
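(Spelling that math out, using the numbers given above: one-in-a-billion survival measure, 50 leftover bits, and the 2^-75 star threshold.)

```python
import math

# Checking the claim above with its own numbers.
survival_weight = 2 ** -30     # "one-in-a-billion of our cousin-branches survives" (~2^-30)
leftover_bits = 50             # bits of freedom left over about which civilization to pay for
star_threshold = 2 ** -75      # the "can you buy a single star" threshold used in this thread

effective_weight = survival_weight * 2 ** -leftover_bits
print(f"effective purchasing weight: 2^{math.log2(effective_weight):.0f}")   # 2^-80
print("enough to buy one star?", effective_weight >= star_threshold)         # False: 2^-80 < 2^-75
```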
And I think it’s real, real easy for them to wind up with 1000 bits leftover, in which case their purchasing power is practically nothing.
(This actually seems like a super reasonable guess to me. Like, imagine knowing that a mole of gas was compressed into the corner of a box with known volume, then letting the gas bounce around for 13 billion years, taking some measurements of pressure and temperature, and then thinking long and hard using an amount of compute that’s appreciably less than the amount you’d need to just simulate the whole thing from the start. It seems to me like you wind up with a distribution that has way, way more than 1000 bits more entropy than is contained in the underlying physics. Imagining that you can spend about 1 ten-millionth of the universe on refining a distribution over Tegmark III with entropy that’s within 50 bits of god seems very, very generous to me; I’m very uncertain about this stuff, but I think that even mature superintelligences could easily wind up 1000 bits from god here.)
Regardless, as I mentioned elsewhere, I think that a more relevant question is how those trade-offers stack up to other trade-offers, so /shrug.
I understand what you are saying here, and I understood it before the comment thread started. The thing I would be interested in you responding to is my and Ryan’s argument in this thread that it’s incompatible to believe both that “My guess is that, conditional on people dying, versions that they consider to also be them survive with degree way less than 2^-75, which rules out us being the ones who save us” and that you should work on AI safety instead of malaria.
Even if you think a life’s work can’t make a difference but many can, you can still think it’s worthwhile to work on alignment for whatever reasons make you think it’s worthwhile to do things like voting.
Not quite following—here are your possibilities as I understand them. 1. Alignment is almost impossible; then there is, say, a 1e-20 chance we survive. Yes, surviving worlds have luck and good alignment work, etc. Perhaps you should work on alignment, or still on bednets if the odds really are that low.
2. Alignment is easy by default, but the chance we survive is nothing like 0.999999; it’s more like 95%, because AGI that is not TAI superintelligence could cause us to wipe ourselves out first, among other things. (These are slow-takeoff universes.)
#2 has many more branches in total where we survive (not sure if that matters), and the difference between where things go well and where they go badly is almost all about stopping ourselves from killing ourselves with non-TAI-related things. In this situation, shouldn’t you be working on those things?
If you average 1 and 2, you still get a lot of work on non-alignment-related stuff.
I believe it’s somewhere closer to 50/50 and not so overdetermined one way or the other, but we are not considering that here.
I largely agree with your comment, except that I think that us deciding to pay if we win is entangled with/evidence for a general willingness to pay among the gods, and in that sense it’s partially “our” decision doing the work of saving us.
Sure, like how when a child sees a fireman pull a woman out of a burning building and says “if I were that big and strong, I would also pull people out of burning buildings”, in a sense it’s partially the child’s decision that does the work of saving the woman. (There’s maybe a little overlap in how they run the same decision procedure that’s coming to the same conclusion in both cases, but vanishingly little of the credit goes to the child.)
in which case actually running the sims can be important
In the case where the AI is optimizing reality-and-instantiation-weighted experience, you’re giving it a threat, and your plan fails on the grounds that sane reasoners ignore that sort of threat.
in the case where your plan is “I am hoping that the AI will be insane in some other unspecified but precise way which will make it act as I wish”, I don’t see how it’s any more helpful than the plan “I am hoping the AI will be aligned”—it seems to me that we have just about as much ability to hit either target.
when a child sees a fireman pull a woman out of a burning building and says “if I were that big and strong, I would also pull people out of burning buildings”, in a sense it’s partially the child’s decision that does the work of saving the woman… but vanishingly little of the credit goes to the child
The child is partly responsible—to a very small but nonzero degree—for the fireman’s actions, because the child’s personal decision procedure has some similarity to the fireman’s decision procedure?
Otherwise, I largely agree with your comment, except that I think that us deciding to pay if we win is entangled with/evidence for a general willingness to pay among the gods, and in that sense it’s partially “our” decision doing the work of saving us.
My fireman analogy was a response to that claim, and was insinuating that we deserve extremely little credit for such a choice, in the same way that a child deserves extremely little credit for a fireman saving someone that the child could not (even if it’s true that the child and the fireman share some aspects of a decision procedure). My claim was intended less like agreement with David’s claim and more like reductio ad absurdum, with the degree of absurdity left slightly ambiguous.
(And on second thought, the analogy would perhaps have been tighter if the firefighter was saving the child.)
I think the common sense view is that this similarity of decision procedures provides exactly zero reason to credit the child with the fireman’s decisions. Credit for a decision goes to the agent who makes it, or perhaps to the algorithm that the agent used, but not to other agents running the same or similar algorithms.
I think I still don’t understand what 2^-75 means. Is this the probability that in the literal last minute when we press the button, we get an aligned AI? I agree that things are grossly overdetermined by then, but why does the last minute mattter? I’m probably misunderstanding, but it looks like you are saying that the Everett branches are only “us” if they branched of in the literal last minute, otherwise you talk about them as if they were “other humans”. But among the branches starting now, there will be a person carrying my memories and ID card in most of them two years from now, and by most definitions of “me”, that person will be “me”, and will be motivated to save the other “me”s. And sure, they have loads of failed Everett branches to save, but they also have loads of Everett branches themselves, the only thing that matters is the ratio of saved worlds to failed worlds that contain roughly the “same” people as us. So I still don’t know what 2^-75 is supposed to be.
Otherwise, I largely agree with your comment, except that I think that us deciding to pay if we win is entangled with/evidence for a general willingness to pay among the gods, and in that sense it’s partially “our” decision doing the work of saving us. And as I said in some other comments here, I agree that running lots of sims is an unnecessary complication in case of UDT expected utility maximizer AIs, but I put a decent chance on the first AIs not being like that, in which case actually running the sims can be important.
There’s a question of how thick the Everett branches are, where someone is willing to pay for us. Towards one extreme, you have the literal people who literally died, before they have branched much; these branches need to happen close to the last minute. Towards the other extreme, you have all evolved life, some fraction of which you might imagine might care to pay for any other evolved species.
The problem with expecting folks at the first extreme to pay for you is that they’re almost all dead (like 1−2−a lot dead). The problem with expecting folks at the second extreme to pay for you is that they’ve got rather a lot of fools to pay for (like 2a lot of fools). As you interpolate between the extremes, you interpolate between the problems.
The “75” number in particular is the threshold where you can’t spend your entire universe in exchange for a star.
We are currently uncertain about whether Earth is doomed. As a simple example, perhaps you’re 50⁄50 on whether humanity is up to the task of solving the alignment problem, because you can’t yet distinguish between the hypothesis “the underlying facts of computer science are such that civilization can just bumble its way into AI alignment” and “the underlying facts of computer science are such that civilization is nowhere near up to this task”. In that case, the question is, conditional on the last hypothesis being true, how far back in the timeline do you have to go before you can flip only 75 quantum bits and have a civilization that is up to the task?
And how many fools does that surviving branch have to save?
I think that there is a way to compensate for this effect.
To illustrate compensation, consider the following experiment: Imagine that I want to resurrect a particular human by creating a quantum random file. This seems absurd as there is only 2−a lot chance that I create the right person. However, there are around d 2a lot copies of me in different branches who perform similar experiments, so in total, any resurrection attempt will create around 1 correct copy, but in a different branch. If we agree to trade resurrections between branches, every possible person will be resurrected in some branch.
Here, it means that we can ignore worries that we create a model of the wrong AI or that AI creates a wrong model of us, because a wrong model of us will be a real model of someone else, and someone else’s wrong model will be a correct model of us.
Thus, we can ignore all branching counting at first approximation, and instead count only the probability that Aligned AI will be created. It is reasonable to estimate it as 10 percent, plus or minus an order of magnitude.
In that case, we need to trade with non-aligned AI by giving 10 planets of paperclips for each planet with humans.
By “last minute”, you mean “after I existed” right? So, e.g., if I care about genetic copies, that would be after I am born and if I care about contingent life experiences, that could be after I turned 16 or something. This seems to leave many years, maybe over a decade for most people.
I think David was confused by the “last minute language” which is really many years right? (I think you meant “last minute on evolutionary time scales, but not literally in the last few minutes”.)
That said, I’m generally super unconfident about how much a quantum bit changes things.
“last minute” was intended to reference whatever timescale David would think was the relevant point of branch-off. (I don’t know where he’d think it goes; there’s a tradeoff where the later you push it the more that the people on the surviving branch care about you rather than about some other doomed population, and the earlier you push it the more that the people on the surviving branch have loads and loads of doomed populations to care after.)
I chose the phrase “last minute” because it is an idiom that is ambiguous over timescales (unlike, say, “last three years”) and because it’s the longer of the two that sprung to mind (compared to “last second”), with perhaps some additional influence from the fact that David had spent a bunch of time arguing about how we would be saved (rather than arguing that someone in the multiverse might pay for some branches of human civilization to be saved, probably not us), which seemed to me to imply that he was imagining a branchpoint very close to the end (given how rapidly people dissasociate from alternate versions of them on other Everett branches).
Yeah, the misunderstanding came from that I thought that “last minute” literally means “last 60 seconds” and I didn’t see how that’s relevant. If if means “last 5 years” or something where it’s still definitely our genetic copies running around, then I’m surprised you think alignment success or failure is that overdetermined at that time-scale. I understand your point that our epistemic uncertainty is not the same as our actual quantum probability, that is either very high or very low. But still, it’s 2^75 overdetermined over a 5 year period? This sounds very surprising to me, the world feels more chaotic than that. (Taiwan gets nuked, chip development halts, meanwhile the Salvadorian president hears a good pitch about designer babies and legalizes running the experiments there and they work, etc, there are many things that contribute to alignment being solved or not, that don’t directly run through underlying facts about computer science, and 2^-75 is a very low probability to none of the pathways to hit it).
But also, I think I’m confused why you work on AI safety then, if you believe the end-state is already 2^75 level overdetermined. Like maybe working on earning to give to bednets would be a better use of your time then. And if you say “yes, my causal impact is very low because the end result is already overdetermined, but my actions are logically correlated with the actions of people in other worlds who are in a similar epistemic situation to me, but whose actions actually matter because their world really is on the edge”, then I don’t understand why you argue in other comments that we can’t enter into insurance contracts with those people, and our decision to pay AIs in the Future has as little correlation with their decision, as the child to the fireman.
It’s probably physically overdetermined one way or another, but we’re not sure which way yet. We’re still unsure about things like “how sensitive is the population to argument” and “how sensibly do government respond if the population shifts”.
But this uncertainty—about which way things are overdetermined by the laws of physics—does not bear all that much relationship to the expected ratio of (squared) quantum amplitude between branches where we live and branches where we die. It just wouldn’t be that shocking for the ratio between those two sorts of branches to be on the order of 2^75; this would correspond to saying something like “it turns out we weren’t just a few epileptic seizures and a well-placed thunderstorm away from the other outcome”.
As I said, I understand the difference between epictemic uncertainty and true quantum probabilities, though I do think that the true quantum probability is not that astronomically low.
More importantly, I still feel confused why you are working on AI safety if the outcome is that overdetermined one way or the other.
What does degree of determination have to do with it? If you lived in a fully deterministic universe, and you were uncertain whether it was going to live or die, would you give up on it on the mere grounds that the answer is deterministic (despite your own uncertainty about which answer is physically determined)?
I still think I’m right about this. Your conception (that not a genetically less smart sibling was born), is determined by quantum fluctuations. So if you believe that quantum fluctuations over the last 50 years make at most 2^-75 difference in the probability of alignment, that’s an upper bound on how much a difference your life’s work can make. While if you dedicate your life to buying bednets, it’s pretty easily calculatable how many happy life-years do you save. So I still think it’s incompatible to believe that the true quantum probability is astronomically low, but you can make enough difference that working on AI safety is clearly better than bednets.
the “you can’t save us by flipping 75 bits” thing seems much more likely to me on a timescale of years than a timescale of decades; I’m fairly confident that quantum fluctuations can cause different people to be born, and so if you’re looking 50 years back you can reroll the population dice.
This point feels like a technicality, but I want to debate it because I think a fair number of your other claims depend on it.
You often claim that conditional on us failing in alignment, alignment was so unlikely that among branches that had roughyly the same people (genetically) during the Singularity, only 2^-75 survives. This is important, because then we can’t rely on other versions of ourselves “selfishly” entering an insurance contract with us, and we need to rely on the charity of Dath Ilan that branched off long ago. I agree that’s a big difference. Also, I say that our decision to pay is correlated with our luckier brethren paying, so in a sense partially our decision is the thing that saves us. You dismiss that saying it’s like a small child claiming credit for the big, strong fireman saving people. If it’s Dath Ilan that saves us, I agree with you, but if it’s genetical copies of some currently existing people, I think your metaphor pretty clearly doesn’t apply, and the decisions to pay are in fact decently strongly correlated.
Now I don’t see how much difference decades vs years makes in this framework. If you believe that now our true quantum probabilty is 2^-75, but 40 years ago it was still a not-astronomical number (like 1 in a million), then should I just plea to people who are older than 40 to promise to themselves they will pay in the future? I don’t really see what difference this makes.
But also, I think the years vs decades dichtihomy is pretty clearly false. Suppoose you believe your expected value of one year of work decreases x-risk by X. What’s the yearly true quantum probability that someone who is in your reference class of importance in your opinion, dies or gets a debilitating interest, or gets into a carreer-destroying scandal, etc? I think it’s hard to argue it’s less than 0.1% a year. (But it makes no big difference if you add one or two zeros). These things are also continuous, even if none of the important people die, someone will lose a month or some weeks to an illness, etc. I think this is a pretty strong case that the one year from now, the 90th percentile luckiest Everett-branch contains 0.01 year of the equivalent of Nate-work than the 50th percentile Everett-branch.
But your claims imply that you believe the true probability of success differs by less than 2^-72 between the 50th and 90th percentile luckiness branches a year from now. That puts an upper bound on the value of a year of your labor at 2^-62 probability decrease in x-risk.
With these exact numbers, this can be still worth doing given the astronomical stakes, but if your made-up number was 2^-100 instead, I think it would be better for you to work on malaria.
Here is another more narrow way to put this argument:
Let’s say Nate is 35 (arbitrary guess).
Let’s say that branches which deviated 35 years ago would pay for our branch (and other branches in our reference class). The case for this is that many people are over 50 (thus existing in both branches), and care about deviated versions of themselves and their children etc. Probably the discount relative to zero deviation is less than 10x.
Let’s say that Nate thinks that if he didn’t ever exist, P(takeover) would go up by 1 / 10 billion (roughly 2^-32). If it was wildly lower than this, that would be somewhat surprising and might suggest different actions.
Nate existing is sensitive to a bit of quantum randomness 35 years ago, so other people as good as Nate existing could be created with a bit of quantum randomness. So, 1 bit of randomness can reduce risk by at least 1 / 10 billion.
Thus, 75 bits of randomness presumably reduces risk by > 1 / 10 billion which is >> 2^-75.
(This argument is a bit messy because presumably some logical facts imply that Nate will be very helpful and some imply that he won’t be very helpful and I was taking an expectation over this while we really care about the effect on all the quantum branches. I’m not sure exactly how to make the argument exactly right, but at least I think it is roughly right.)
What about these case where we only go back 10 years? We can apply the same argument, but instead just use some number of bits (e.g. 10) to make Nate work a bit more, say 1 week of additional work via changing whether Nate ends up getting sick (by adjusting the weather or which children are born, or whatever). This should also reduce doom by 1 week / (52 weeks/year) / (20 years/duration of work) * 1 / 10 billion = 1 / 10 trillion.
And surely there are more efficient schemes.
To be clear, only having ~ 1 / 10 billion branches survive is rough from a trade perspective.
What are you trying to argue? (I don’t currently know what position y’all think I have or what position you’re arguing for. Taking a shot in the dark: I agree that quantum bitflips have loads more influence on the outcome the earlier in time they are.)
I argue that right now, sarting from the present state, the true quantum probability of achieving the Glorious Future is way higher than 2^-75, or if not, then we should probably work on something other than AI safety. Me and Ryan argue for this in the last few comments. It’s not a terribly important point, you can just say the true quantum probability is 1 in a billion, when it’s still worth it for you to work on the problem, but it becomes rough to trade for keeping humanity physically alive that can cause one year of delay to the AI.
But I would like you to acknowledge that “vastly below 2^-75 true quantum probability, starting from now” is probably mistaken, or explain where our logic goes wrong in arguing that this would imply you should work on malaria.
Starting from now? I agree that that’s true in some worlds that I consider plausible, at least, and I agree that worlds whose survival-probabilities are sensitive to my choices are the ones that render my choices meaningful (regardless of how deterministic they are).
Conditional on Earth being utterly doomed, are we (today) fewer than 75 qbitflips from being in a good state? I’m not sure; it probably varies across the doomed worlds where I have decent amounts of subjective probability. It depends on how much time we have on the clock, and on where the points of no return are. I haven’t thought about this a ton. My best guess is that it would take more than 75 qbitflips to save us now, but maybe I’m not thinking creatively enough about how to spend them, and I haven’t thought about it in detail and expect I’d be sensitive to argument about it /shrug.
(If you start from 50 years ago? Very likely! 75 bits is a lot of population rerolls. If you start after people hear the thunder of the self-replicating factories barrelling towards them, and wait until the very last moments, at which point they would consider themselves a distinct person, one who is about to die from AI and who wishes to draw upon your reassurance that they will be saved? Very likely not! Those people look very, very dead.)
One possible point of miscommunication: when I said something like “obviously it’s worse than 2^-75 at the extreme where it’s actually them who is supposed to survive”, that was intended to apply to the sort of person who has seen the skies darken and has heard the thunder, rather than to the version of them that exists here in 2024. This was not intended to be some bold or surprising claim. It was an attempt to establish an obvious base point at one very extreme end of a spectrum, which we could then start interpolating from (asking questions like “how far back from there are the points of no return?” and “how much more entropy would they have than god, if people from that branchpoint spent stars trying to figure out what happened after those points?”).
(The 2^-75 was not intended to be even an estimate of how dead the people on the one end of the extreme are. It is the “can you buy a star” threshold. I was trying to say something like “the individuals who actually die obviously can’t buy themselves a star just because they inhabit Tegmark III; now let’s drag the cursor backwards and talk about whether, at any point, we cross the a-star-for-everyone threshold”.)
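(A guess of mine, not something asserted above, about where a ~75-bit star threshold plausibly comes from: 2^75 is on the order of common estimates of the number of stars in the observable universe, so a 2^-75 share of the resources is roughly one star.)

```python
import math

# My gloss, an assumption rather than something argued above: 2^75 is on the
# order of common estimates of the number of stars in the observable universe
# (~1e22 to 1e24), so a 2^-75 share of the resources is roughly one star.
print(f"{2 ** 75:.2e}")                    # ~3.78e+22
print(math.log2(1e22), math.log2(1e24))    # ~73.1 and ~79.7 bits, bracketing 75
```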
If that doesn’t clear things up and you really want to argue that, conditional on Earth being as doomed as it superficially looks to me, most of those worlds are obviously <100 quantum bitflips from victory today, I’m willing to field those arguments; maybe you see some clever use of qbitflips I don’t and that would be kinda cool. But I caveat that this doesn’t seem like a crux to me and that I acknowledge that the other worlds (where Earth merely looks unsavlageable) are the ones motivating action.
I have not followed this thread in all of its detail, but it sounds like it might be getting caught up on the difference between the underlying ratio of different quantum worlds (which can be expressed as a probability over one’s future) and one’s probabilistic uncertainty over the underlying ratio of different quantum worlds (which can also be expressed as a probability over the future but does not seem to me to have the same implications for behavior).
Insofar as it seems to readers like a bad idea to optimize for different outcomes in a deterministic universe, I recommend reading the Free Will (Solution) sequence by Eliezer Yudkowsky, which I found fairly convincing on the matter of why it’s still right to optimize in a fully deterministic universe, as well as in a universe running on quantum mechanics (interpreted to have many worlds).
My first claim is not “fewer than 1 in 2^75 of the possible configurations of human populations navigate the problem successfully”.
My first claim is more like “given a population of humans that doesn’t even come close to navigating the problem successfully (given some unoptimized configuration of the background particles), probably you’d need to spend quite a lot of bits of optimization to tune the butterfly-effects in the background particles to make that same population instead solve alignment (depending how far back in time you go).” (A very rough rule of thumb here might be “it should take about as many bits as it takes to specify an FAI (relative to what they know)”.)
This is especially stark if you’re trying to find a branch of reality that survives with the “same people” on it. Humans seem to be very, very sensitive about what counts as the “same people”. (e.g., in August, when gambling on who gets a treat, I observed a friend toss a quantum coin, see it come up against them, and mourn that a different person—not them—would get to eat the treat.)
(Insofar as y’all are trying to argue “those MIRI folk say that AI will kill you, but actually, a person somewhere else in the great quantum multiverse, who has the same genes and childhood as you but whose path split off many years ago, will wake up in a simulation chamber and be told that they were rescued by the charity of aliens! So it’s not like you’ll really die”, then I at least concede that that’s an easier case to make, although it doesn’t feel like a very honest presentation to me.)
Conditional on observing a given population of humans coming nowhere close to solving the problem, the branches wherein those humans live (with identity measured according to the humans) are probably very, very narrow compared to the versions where they die. My top guess would be that the 2^-75 number is a vast overestimate of how thick those branches are (and the 75 in the exponent does not come from any attempt of mine to make that estimate).
As I said earlier: you can take branches that branched off earlier and earlier in time, and they’ll get better and better odds. (Probably pretty drastically, as you back off past certain points of no return. I dunno where the points of no return are. Weeks? Months? Years? Not decades, because with decades you can reroll significant portions of the population.)
I haven’t thought much about what fraction of populations I’d expect to survive off of what branch-point. (How many bits of optimization do you need back in the 1880s to swap Hitler out for some charismatic science-enthusiast statesman that will happen to have exactly the right infulence on the following culture? How many such routes are there? I have no idea.)
Three big (related) issues with hoping that forks branched off sufficiently early (who are more numerous) save us in particular (rather than other branches) are (a) they plausibly care more about populations nearer to them (e.g. versions of themselves that almost died); (b) insofar as they care about more distant populations (that e.g. include you), they have rather a lot of distant populations to attempt to save; and (c) they have trouble distinguishing populations that never were from populations that were and then weren’t.
Point (c) might be a key part of the story, not previously articulated (that I recall), that you were missing?
Like, you might say “well, if one in a billion branches look like dath ilan and the rest look like earth, and the former basically all survive and the latter basically all die, then the fact that the earthlike branches have ~0 ability to save their earthlike kin doesn’t matter, so long as the dath-ilan like branches are trying to save everyone. dath ilan can just flip 30 quantum coins to select a single civilization from among the billion that died, and then spend 1/million resources on simulating that civilization (or paying off their murderer or whatever), and that still leaves us with one-in-a-quintillion fraction of the universe, which is enough to keep the lights running”.
Part of the issue with this is that dath ilan cannot simply sample from the space of dead civilizations; it has to sample from a space of plausible dead civilizations rather than actual dead civilizations, in a way that I expect to smear loads and loads of probability-mass over regions that had concentrated (but complex) patterns of amplitude. The concentrations of Everett branches are like a bunch of wiggly thin curves etched all over a disk, and it’s not too hard to sample uniformly from the disk (and draw a plausible curve that the point could have been on), but it’s much harder to sample only from the curves. (Or, at least, so the physics looks to me. And this seems like a common phenomenon in physics. Cf. the apparent inevitable increase of entropy, when what’s actually happening is a previously-compact volume in phase space evolving into a bunch of wiggly thin curves, etc.)
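As a toy illustration of that disk-versus-curves point (purely an analogy with a made-up curve, not a physics model): a sampler that only knows “somewhere in the disk” almost never lands near a thin curve inside it.

```python
import numpy as np

# Toy analogy for the disk-vs-curves picture above (not a physics model):
# uniform samples over a disk almost never land within eps of a thin curve.
rng = np.random.default_rng(0)
n = 1_000_000
pts = rng.uniform(-1, 1, size=(2 * n, 2))      # oversample a square...
pts = pts[(pts ** 2).sum(axis=1) <= 1][:n]     # ...and keep points inside the unit disk

eps = 1e-3                                     # "thickness" of the curve
radii = np.hypot(pts[:, 0], pts[:, 1])
hit_rate = (np.abs(radii - 0.5) < eps).mean()  # the "curve" is the circle of radius 0.5
print(hit_rate)  # ~0.002 (~2*eps*curve_length/disk_area): almost all the mass misses the curve
```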
So when you’re considering whether surviving humans will pay for our souls—not somebody’s souls, but our souls in particular—you have a question of how these alleged survivors came to pay for us in particular (rather than some other poor fools). And there’s a tradeoff that runs from, at one extreme, “they’re saving us because they are almost exactly us and they remember us and wish us to have a nice epilog”, all the way to “they’re some sort of distant cousins, branched off a really long time ago, who are trying to save everyone”.
The problem with being on the “they care about us because they consider they basically are us” end is that those people are dead too (conditional on us being dead). And as you push the branch-point earlier and earlier in time, you start finding more survivors, but those survivors also wind up having more and more fools to care about (in part because they have trouble distinguishing the real fallen civilizations from the neighboring civilization-configurations that don’t get appreciable quantum amplitude in basement physics).
If you tell me where on this tradeoff curve you want to be, we can talk about it. (Ryan seemed to want to look all the way on the “insurance pool with aliens” end of the spectrum.)
The point of the 2^-75 number is that that’s about the threshold of “can you purchase a single star”. My guess is that, conditional on people dying, versions that they consider to also be them survive with degree way less than 2^-75, which rules out us being the ones who save us.
If we retreat to “distant cousin branches of humanity might save us”, there’s a separate question of how the width of the surviving quantum branch compares to the volume taken up by us in the space of civilizations they attempt to save. I think my top guess is that a distant branch of humanity, spending stellar-level resources in attempts to concentrate its probability-mass in accordance with how quantum physics concentrates (squared) amplitude, still winds up so uncertain that there’s still 50+ bits of freedom left over? Which means that if one-in-a-billion of our cousin-branches survives, they still can’t buy a star (unless I flubbed my math).
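Making that bit-accounting explicit (the 50-bit figure is the guess from the paragraph above, and 75 bits is the star threshold used earlier):

```python
import math

# Bit accounting for the paragraph above.
star_threshold_bits = 75         # the "can you buy a star" threshold used earlier
survival_bits = math.log2(1e9)   # ~30 bits: one in a billion cousin-branches survives
leftover_bits = 50               # guessed residual uncertainty about which branch to pay for

total_bits = survival_bits + leftover_bits
print(total_bits)                # ~79.9 bits, i.e. a ~2^-80 share: less than the 2^-75 a star costs
```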
And I think it’s real, real easy for them to wind up with 1000 bits leftover, in which case their purchasing power is practically nothing.
(This actually seems like a super reasonable guess to me. Like, if you imagine knowing that a mole of gas was compressed into the corner of a box with known volume, and you then let the gas bounce around for 13 billion years, take some measurements of pressure and temperature, and then think long and hard using an amount of compute that’s appreciably less than the amount you’d need to just simulate the whole thing from the start, it seems to me like you wind up with a distribution that has way, way more than 1000 bits more entropy than is contained in the underlying physics. Imagining that you can spend about one ten-millionth of the universe on refining a distribution over Tegmark III with entropy that’s within 50 bits of god seems very, very generous to me; I’m very uncertain about this stuff, but I think that even mature superintelligences could easily wind up 1000 bits from god here.)
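A rough back-of-the-envelope for the scale in that gas example, assuming (my assumption) that the “corner” is a thousandth of the box, and using the standard free-expansion entropy of an ideal gas:

```python
import math

# Rough scale for the gas example above: how many bits of microstate information
# macroscopic knowledge fails to pin down once a mole of ideal gas expands from a
# small corner (assumed here to be 1/1000 of the box) to fill the whole box.
# Free-expansion entropy change: dS = N * k * ln(V2/V1), i.e. N * log2(V2/V1) bits.
N = 6.022e23          # particles in a mole
volume_ratio = 1000   # assumed corner-to-box volume ratio

bits = N * math.log2(volume_ratio)
print(f"{bits:.1e} bits")   # ~6.0e+24 bits; being "1000 bits from god" is negligible at this scale
```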
Regardless, as I mentioned elsewhere, I think that a more relevant question is how those trade-offers stack up to other trade-offers, so /shrug.
I understand what you are saying here, and I understood it before the comment thread started. The thing I would be interested in you responding to is my and Ryan’s comments in this thread, which argue that believing “my guess is that, conditional on people dying, versions that they consider to also be them survive with degree way less than 2^-75, which rules out us being the ones who save us” is incompatible with believing that you should work on AI safety instead of malaria.
Even if you think a life’s work can’t make a difference but many can, you can still think it’s worthwhile to work on alignment for whatever reasons make you think it’s worthwhile to do things like voting.
(E.g. a non-CDT decision theory)
Not quite following—here are your possibilities as I understand them:
1. Alignment is almost impossible; then there is, say, a 1e-20 chance we survive. Yes, surviving worlds have luck and good alignment work, etc. Perhaps you should work on alignment, or still on bednets if the odds really are that low.
2. Alignment is easy by default, but the probability we survive is nothing like 0.999999; it’s more like 95%, because AGI that falls short of TAI superintelligence could cause us to wipe ourselves out first, among other things. (These are slow-takeoff universes.)
#2 has many more branches in total where we survive (not sure if that matters), and the difference between things going well and badly is almost all about stopping ourselves from killing ourselves with non-TAI-related things. In this situation, shouldn’t you be working on those things?
If you average #1 and #2, then the upshot is still a lot of work on non-alignment-related stuff.
I believe it’s somewhere closer to 50/50 and not so overdetermined one way or the other, but we are not considering that here.
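A toy expected-value rendering of that averaging; every number below is a placeholder chosen only to show the structure of the comparison, not an estimate:

```python
# Toy expected-value comparison for the two scenarios above; all numbers are
# illustrative placeholders, chosen only to show how the averaging works.
p_hard, p_easy = 0.5, 0.5   # the roughly 50/50 split between scenarios 1 and 2

# assumed marginal reduction in extinction probability from a career of work:
marginal_value = {
    "alignment work":    {"hard": 1e-21, "easy": 1e-9},  # tiny in #1 (near-hopeless), small in #2 (easy by default)
    "other x-risk work": {"hard": 1e-22, "easy": 1e-4},  # dominant in #2, where non-TAI risks decide the outcome
}
for job, v in marginal_value.items():
    print(job, p_hard * v["hard"] + p_easy * v["easy"])
# With numbers shaped like these, the average is dominated by scenario 2, which is
# what pushes toward working on the non-alignment risks.
```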
Sure, like how when a child sees a fireman pull a woman out of a burning building and says “if I were that big and strong, I would also pull people out of burning buildings”, in a sense it’s partially the child’s decision that does the work of saving the woman. (There’s maybe a little overlap in how they run the same decision procedure that’s coming to the same conclusion in both cases, but vanishingly little of the credit goes to the child.)
In the case where the AI is optimizing reality-and-instantiation-weighted experience, you’re giving it a threat, and your plan fails on the grounds that sane reasoners ignore that sort of threat.
In the case where your plan is “I am hoping that the AI will be insane in some other unspecified but precise way which will make it act as I wish”, I don’t see how it’s any more helpful than the plan “I am hoping the AI will be aligned”—it seems to me that we have just about as much ability to hit either target.
The child is partly responsible—to a very small but nonzero degree—for the fireman’s actions, because the child’s personal decision procedure has some similarity to the fireman’s decision procedure?
Is this a correct reading of what you said?
I was responding to David saying
and was insinuating that we deserve extremely little credit for such a choice, in the same way that a child deserves extremely little credit for a fireman saving someone that the child could not (even if it’s true that the child and the fireman share some aspects of a decision procedure). My claim was intended less like agreement with David’s claim and more like reductio ad absurdum, with the degree of absurdity left slightly ambiguous.
(And on second thought, the analogy would perhaps have been tighter if the firefighter was saving the child.)
I think the common sense view is that this similarity of decision procedures provides exactly zero reason to credit the child with the fireman’s decisions. Credit for a decision goes to the agent who makes it, or perhaps to the algorithm that the agent used, but not to other agents running the same or similar algorithms.