So8res comments on You can, in fact, bamboozle an unaligned AI into sparing your life

So8res 30 Sep 2024 18:01 UTC
14 points
There’s a question of how thick the Everett branches are in which someone is willing to pay for us. Towards one extreme, you have the literal people who literally died, before they have branched much; these branches need to happen close to the last minute. Towards the other extreme, you have all evolved life, some fraction of which might plausibly care to pay for any other evolved species.

The problem with expecting folks at the first extreme to pay for you is that they’re almost all dead (like $1 - 2^{-\text{a lot}}$ dead). The problem with expecting folks at the second extreme to pay for you is that they’ve got rather a lot of fools to pay for (like $2^{\text{a lot}}$ of fools). As you interpolate between the extremes, you interpolate between the problems.

The “75” number in particular marks the threshold (a branch-weight ratio of about 2^-75) below which even spending your entire universe no longer buys a single star.
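A minimal sketch of the arithmetic behind a “can you buy a star” threshold. It rests on a hypothetical assumption the thread does not state: that the survivors’ universe holds roughly 2^75 (about 3.8 × 10^22) stars, and that a surviving branch whose weight relative to ours is 2^-bits can pass along at most that fraction of its resources.

```python
# Hypothetical assumption: the survivors' universe contains about 2**75 stars
# (~3.8e22). The thread does not say where its "75" comes from.
STARS_IN_UNIVERSE = 2 ** 75


def stars_purchasable(weight_bits: float, stars_available: float = STARS_IN_UNIVERSE) -> float:
    """Stars' worth of resources reaching a doomed branch when a surviving
    branch whose relative weight is 2**-weight_bits spends its whole universe."""
    return stars_available * 2.0 ** (-weight_bits)


for bits in (60, 75, 90):
    print(f"weight 2^-{bits}: {stars_purchasable(bits):g} stars")
```

Under this assumption, at exactly 75 bits the whole universe buys one star, and every further bit halves that.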

We are currently uncertain about whether Earth is doomed. As a simple example, perhaps you’re ⁵⁰⁄₅₀ on whether humanity is up to the task of solving the alignment problem, because you can’t yet distinguish between the hypothesis “the underlying facts of computer science are such that civilization can just bumble its way into AI alignment” and “the underlying facts of computer science are such that civilization is nowhere near up to this task”. In that case, the question is, conditional on the last hypothesis being true, how far back in the timeline do you have to go before you can flip only 75 quantum bits and have a civilization that is up to the task?

And how many fools does that surviving branch have to save?
- avturchin 1 Oct 2024 12:02 UTC
  2 points
  I think that there is a way to compensate for this effect.
  To illustrate compensation, consider the following experiment: imagine that I want to resurrect a particular human by creating a quantum-random file. This seems absurd, as there is only a $2^{-\text{a lot}}$ chance that I create the right person. However, there are around $2^{\text{a lot}}$ copies of me in different branches who perform similar experiments, so in total, these resurrection attempts will create around 1 correct copy, but in a different branch. If we agree to trade resurrections between branches, every possible person will be resurrected in some branch.
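A toy check of the counting in this experiment, under simplifying assumptions of our own: “a lot” is replaced by a small number of bits (10 here), and each branch’s experiment is modeled as an independent uniform draw of an n-bit file. Across 2^n experimenters, each with a 2^-n chance of success, the expected number of correct copies is 1.

```python
import random


def expected_correct_copies(n_bits: int) -> float:
    """2**n_bits experimenters (one per branch), each hitting the target
    pattern with probability 2**-n_bits, produce one correct copy on average."""
    return (2 ** n_bits) * 2.0 ** (-n_bits)


def simulate_correct_copies(n_bits: int, trials: int, seed: int = 0) -> float:
    """Monte Carlo check: average number of experimenters, out of 2**n_bits,
    whose random n_bits-bit file matches a fixed target pattern."""
    rng = random.Random(seed)
    target = rng.getrandbits(n_bits)
    total_hits = 0
    for _ in range(trials):
        total_hits += sum(
            1 for _ in range(2 ** n_bits) if rng.getrandbits(n_bits) == target
        )
    return total_hits / trials


print(expected_correct_copies(10))
print(simulate_correct_copies(10, trials=200))
```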
  Here, it means that we can ignore worries that we create a model of the wrong AI or that AI creates a wrong model of us, because a wrong model of us will be a real model of someone else, and someone else’s wrong model will be a correct model of us.
  Thus, we can ignore all branching counting at first approximation, and instead count only the probability that Aligned AI will be created. It is reasonable to estimate it as 10 percent, plus or minus an order of magnitude.
  In that case, we need to trade with non-aligned AI by giving 10 planets of paperclips for each planet with humans.
- ryan_greenblatt 30 Sep 2024 18:19 UTC
  2 points
  
  > Towards one extreme, you have the literal people who literally died, before they have branched much; these branches need to happen close to the last minute.
  
  By “last minute”, you mean “after I existed” right? So, e.g., if I care about genetic copies, that would be after I am born and if I care about contingent life experiences, that could be after I turned 16 or something. This seems to leave many years, maybe over a decade for most people.
  
  I think David was confused by the “last minute” language, which really means many years, right? (I think you meant “last minute” on evolutionary time scales, but not literally in the last few minutes.)
  
  That said, I’m generally super unconfident about how much a quantum bit changes things.
  - So8res 30 Sep 2024 18:50 UTC
    4 points
    “Last minute” was intended to reference whatever timescale David would think was the relevant point of branch-off. (I don’t know where he’d think it goes; there’s a tradeoff where the later you push it, the more the people on the surviving branch care about you rather than about some other doomed population, and the earlier you push it, the more the people on the surviving branch have loads and loads of doomed populations to care about.)
    
    I chose the phrase “last minute” because it is an idiom that is ambiguous over timescales (unlike, say, “last three years”) and because it’s the longer of the two that sprang to mind (compared to “last second”), with perhaps some additional influence from the fact that David had spent a bunch of time arguing about how we would be saved (rather than arguing that someone in the multiverse might pay for some branches of human civilization to be saved, probably not us), which seemed to me to imply that he was imagining a branchpoint very close to the end (given how rapidly people disassociate from alternate versions of themselves on other Everett branches).
    - David Matolcsi 30 Sep 2024 21:00 UTC
      1 point
      Yeah, the misunderstanding came from the fact that I thought “last minute” literally means “last 60 seconds”, and I didn’t see how that’s relevant. If it means “last 5 years” or something, where it’s still definitely our genetic copies running around, then I’m surprised you think alignment success or failure is that overdetermined at that time-scale. I understand your point that our epistemic uncertainty is not the same as our actual quantum probability, which is either very high or very low. But still, it’s 2^75 overdetermined over a 5 year period? This sounds very surprising to me; the world feels more chaotic than that. (Taiwan gets nuked, chip development halts, meanwhile the Salvadoran president hears a good pitch about designer babies and legalizes running the experiments there, and they work, etc. There are many things that contribute to alignment being solved or not that don’t directly run through underlying facts about computer science, and 2^-75 is a very low probability for none of those pathways to hit.)
      
      But also, I think I’m confused why you work on AI safety then, if you believe the end-state is already 2^75 level overdetermined. Like, maybe earning to give for bednets would be a better use of your time then. And if you say “yes, my causal impact is very low because the end result is already overdetermined, but my actions are logically correlated with the actions of people in other worlds who are in a similar epistemic situation to me, but whose actions actually matter because their world really is on the edge”, then I don’t understand why you argue in other comments that we can’t enter into insurance contracts with those people, and that our decision to pay AIs in the future has as little correlation with their decision as the child’s has with the fireman’s.
      - So8res 30 Sep 2024 21:13 UTC
        6 points
        
        > I think I’m confused why you work on AI safety then, if you believe the end-state is already 2^75 level overdetermined.
        
        It’s probably physically overdetermined one way or another, but we’re not sure which way yet. We’re still unsure about things like “how sensitive is the population to argument” and “how sensibly do governments respond if the population shifts”.
        
        But this uncertainty—about which way things are overdetermined by the laws of physics—does not bear all that much relationship to the expected ratio of (squared) quantum amplitude between branches where we live and branches where we die. It just wouldn’t be that shocking for the ratio between those two sorts of branches to be on the order of 2^75; this would correspond to saying something like “it turns out we weren’t just a few epileptic seizures and a well-placed thunderstorm away from the other outcome”.
        David Matolcsi 30 Sep 2024 22:41 UTC
        5 points
        As I said, I understand the difference between epistemic uncertainty and true quantum probabilities, though I do think that the true quantum probability is not that astronomically low.
        More importantly, I still feel confused why you are working on AI safety if the outcome is that overdetermined one way or the other.
        So8res 30 Sep 2024 22:44 UTC
        14 points
        What does degree of determination have to do with it? If you lived in a fully deterministic universe, and you were uncertain whether it was going to live or die, would you give up on it on the mere grounds that the answer is deterministic (despite your own uncertainty about which answer is physically determined)?
        David Matolcsi 1 Oct 2024 8:17 UTC
        4 points
        I still think I’m right about this. Your conception (the fact that you were born, rather than a genetically less smart sibling) is determined by quantum fluctuations. So if you believe that quantum fluctuations over the last 50 years make at most a 2^-75 difference in the probability of alignment, that’s an upper bound on how much difference your life’s work can make. Whereas if you dedicate your life to buying bednets, it’s pretty easy to calculate how many happy life-years you save. So I still think it’s incompatible to believe both that the true quantum probability is astronomically low and that you can make enough difference that working on AI safety is clearly better than bednets.
        So8res 1 Oct 2024 14:22 UTC
        4 points
        The “you can’t save us by flipping 75 bits” thing seems much more likely to me on a timescale of years than a timescale of decades; I’m fairly confident that quantum fluctuations can cause different people to be born, and so if you’re looking 50 years back you can reroll the population dice.
        David Matolcsi 1 Oct 2024 16:13 UTC
        13 points
        This point feels like a technicality, but I want to debate it because I think a fair number of your other claims depend on it.
        You often claim that, conditional on us failing in alignment, alignment was so unlikely that among branches that had roughly the same people (genetically) during the Singularity, only 2^-75 survive. This is important, because then we can’t rely on other versions of ourselves “selfishly” entering an insurance contract with us, and we need to rely on the charity of Dath Ilan that branched off long ago. I agree that’s a big difference. Also, I say that our decision to pay is correlated with our luckier brethren paying, so in a sense our decision is partially the thing that saves us. You dismiss that, saying it’s like a small child claiming credit for the big, strong fireman saving people. If it’s Dath Ilan that saves us, I agree with you, but if it’s genetic copies of some currently existing people, I think your metaphor pretty clearly doesn’t apply, and the decisions to pay are in fact decently strongly correlated.
        Now I don’t see how much difference decades vs years makes in this framework. If you believe that now our true quantum probability is 2^-75, but 40 years ago it was still a not-astronomical number (like 1 in a million), then should I just plead with people who are older than 40 to promise themselves they will pay in the future? I don’t really see what difference this makes.
        But also, I think the years vs decades dichotomy is pretty clearly false. Suppose you believe your expected value of one year of work decreases x-risk by X. What’s the yearly true quantum probability that someone who, in your opinion, is in your reference class of importance dies, or gets a debilitating illness, or gets into a career-destroying scandal, etc.? I think it’s hard to argue it’s less than 0.1% a year. (But it makes no big difference if you add one or two zeros.) These things are also continuous: even if none of the important people die, someone will lose a month or some weeks to an illness, etc. I think this is a pretty strong case that, one year from now, the 90th percentile luckiest Everett branch contains 0.01 more years of the equivalent of Nate-work than the 50th percentile Everett branch.
        But your claims imply that you believe the true probability of success differs by less than 2^-72 between the 50th and 90th percentile luckiness branches a year from now. That puts an upper bound on the value of a year of your labor at 2^-62 probability decrease in x-risk.
        With these exact numbers, this can be still worth doing given the astronomical stakes, but if your made-up number was 2^-100 instead, I think it would be better for you to work on malaria.
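The bound in the two paragraphs above is a division: if 0.01 years of extra Nate-work shifts the success probability by less than 2^-72, then a full year shifts it by less than 2^-72 / 0.01. A sketch in log2 units, assuming (this is our assumption, not something the comment states) that the effect scales linearly with time worked:

```python
import math


def log2_yearly_bound(gap_log2: float, work_gap_years: float) -> float:
    """Upper bound, as a log2 probability, on the x-risk reduction from one
    year of work, given that branches differing by `work_gap_years` of that
    work differ in success probability by less than 2**gap_log2.
    Assumes the effect scales linearly with time worked."""
    return gap_log2 + math.log2(1.0 / work_gap_years)


print(round(log2_yearly_bound(-72, 0.01), 1))   # -65.4
print(round(log2_yearly_bound(-72, 0.001), 1))  # -62.0
```

With the 0.01-year figure the bound comes out near 2^-65.4, which sits inside the looser 2^-62 bound stated above; a 0.001-year gap would give almost exactly 2^-62.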
        ryan_greenblatt 1 Oct 2024 19:00 UTC
        4 points
        Here is another more narrow way to put this argument:
        
        Let’s say Nate is 35 (arbitrary guess).
        Let’s say that branches which deviated 35 years ago would pay for our branch (and other branches in our reference class). The case for this is that many people are over 50 (thus existing in both branches), and care about deviated versions of themselves and their children etc. Probably the discount relative to zero deviation is less than 10x.
        Let’s say that Nate thinks that if he didn’t ever exist, P(takeover) would go up by 1 / 10 billion (roughly 2^-33). If it was wildly lower than this, that would be somewhat surprising and might suggest different actions.
        Nate existing is sensitive to a bit of quantum randomness 35 years ago, so other people as good as Nate existing could be created with a bit of quantum randomness. So, 1 bit of randomness can reduce risk by at least 1 / 10 billion.
        Thus, 75 bits of randomness presumably reduces risk by > 1 / 10 billion which is >> 2^-75.
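Converting Ryan’s illustrative 1 / 10 billion figure into bits, and comparing it with 2^-75 (a quick arithmetic check; the 1e-10 value is his hypothetical, not a measurement):

```python
import math

# Ryan's illustrative figure: Nate's existence lowers P(takeover) by 1 / 10 billion.
risk_reduction = 1e-10
threshold = 2.0 ** -75

bits = -math.log2(risk_reduction)
print(f"1 / 10 billion is about 2^-{bits:.1f}")
print(f"that is {risk_reduction / threshold:.2e} times larger than 2^-75")
```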
        
        (This argument is a bit messy because presumably some logical facts imply that Nate will be very helpful and some imply that he won’t be very helpful, and I was taking an expectation over this, while we really care about the effect on all the quantum branches. I’m not sure how to make the argument exactly right, but I think it is at least roughly right.)
        
        What about the case where we only go back 10 years? We can apply the same argument, but instead just use some number of bits (e.g. 10) to make Nate work a bit more, say 1 week of additional work via changing whether Nate ends up getting sick (by adjusting the weather or which children are born, or whatever). This should also reduce doom by 1 week / (52 weeks/year) / (20 years/duration of work) * 1 / 10 billion ≈ 1 / 10 trillion.
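The one-week arithmetic, written out. The numbers are Ryan’s illustrative ones, and spreading Nate’s total effect evenly over every week of a 20-year career is an assumption of this sketch:

```python
# Ryan's illustrative numbers (not measurements):
nate_career_effect = 1e-10  # reduction in P(doom) attributed to Nate's whole career
career_years = 20
weeks_per_year = 52
extra_weeks = 1             # one extra week of work, bought with ~10 quantum bits

# Assumes the effect is spread evenly over every week of the career.
effect = extra_weeks / weeks_per_year / career_years * nate_career_effect
print(f"{effect:.2e}")  # roughly 1 / 10 trillion
```

The result, about 9.6e-14, is still vastly larger than 2^-75 (about 2.6e-23).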
        
        And surely there are more efficient schemes.
        
        To be clear, only having ~ 1 / 10 billion branches survive is rough from a trade perspective.
        So8res 1 Oct 2024 20:45 UTC
        2 points
        What are you trying to argue? (I don’t currently know what position y’all think I have or what position you’re arguing for. Taking a shot in the dark: I agree that quantum bitflips have loads more influence on the outcome the earlier in time they are.)
        David Matolcsi 1 Oct 2024 21:18 UTC
        3 points
        I argue that right now, starting from the present state, the true quantum probability of achieving the Glorious Future is way higher than 2^-75, or if not, then we should probably work on something other than AI safety. Ryan and I argue for this in the last few comments. It’s not a terribly important point: you can just say the true quantum probability is 1 in a billion, in which case it’s still worth it for you to work on the problem, but then it becomes rough to trade for keeping humanity physically alive, since that can cause one year of delay to the AI.
        But I would like you to acknowledge that “vastly below 2^-75 true quantum probability, as starting from now” is probably mistaken, or explain why our logic is wrong about how this implies you should work on malaria.
        So8res 1 Oct 2024 21:41 UTC
        5 points
        Starting from now? I agree that that’s true in some worlds that I consider plausible, at least, and I agree that worlds whose survival-probabilities are sensitive to my choices are the ones that render my choices meaningful (regardless of how deterministic they are).
        
        Conditional on Earth being utterly doomed, are we (today) fewer than 75 qbitflips from being in a good state? I’m not sure, it probably varies across the doomed worlds where I have decent amounts of subjective probability. It depends how much time we have on the clock, depends where the points of no-return are. I haven’t thought about this a ton. My best guess is it would take more than 75 qbitflips to save us now, but maybe I’m not thinking creatively enough about how to spend them, and I haven’t thought about it in detail and expect I’d be sensitive to argument about it /shrug.
        
        (If you start from 50 years ago? Very likely! 75 bits is a lot of population rerolls. If you start after people hear the thunder of the self-replicating factories barrelling towards them, and wait until the very last moments that they would consider becoming a distinct person who is about to die from AI, and who wishes to draw upon your reassurance that they will be saved? Very likely not! Those people look very, very dead.)
        
        One possible point of miscommunication is that when I said something like “obviously it’s worse than 2^-75 at the extreme where it’s actually them who is supposed to survive”, it was intended to apply to the sort of person who has seen the skies darken and has heard the thunder, rather than the version of them that exists here in 2024. This was not intended to be some bold or surprising claim. It was an attempt to establish an obvious base point at one very extreme end of a spectrum that we could start interpolating from (asking questions like “how far back from there are the points of no return?” and “how much more entropy would they have than god, if people from that branchpoint spent stars trying to figure out what happened after those points?”).
        
        (The 2^-75 was not intended to be even an estimate of how dead the people on the one end of the extreme are. It is the “can you buy a star” threshold. I was trying to say something like “the individuals who actually die obviously can’t buy themselves a star just because they inhabit Tegmark III; now let’s drag the cursor backwards and talk about whether, at any point, we cross the a-star-for-everyone threshold”.)
        
        If that doesn’t clear things up and you really want to argue that, conditional on Earth being as doomed as it superficially looks to me, most of those worlds are obviously <100 quantum bitflips from victory today, I’m willing to field those arguments; maybe you see some clever use of qbitflips I don’t, and that would be kinda cool. But I caveat that this doesn’t seem like a crux to me, and I acknowledge that the other worlds (where Earth merely looks unsalvageable) are the ones motivating action.
        Ben Pace 1 Oct 2024 21:34 UTC
        3 points
        I have not followed this thread in all of its detail, but it sounds like it might be getting caught up on the difference between the underlying ratio of different quantum worlds (which can be expressed as a probability over one’s future) and one’s probabilistic uncertainty over the underlying ratio of different quantum worlds (which can also be expressed as a probability over the future but does not seem to me to have the same implications for behavior).
        Insofar as it seems to readers like a bad idea to optimize for different outcomes in a deterministic universe, I recommend reading the Free Will (Solution) sequence by Eliezer Yudkowsky, which I found fairly convincing on the matter of why it’s still right to optimize in a fully deterministic universe, as well as in a universe running on quantum mechanics (interpreted to have many worlds).
        So8res 1 Oct 2024 20:23 UTC
        3 points
        
        > You often claim that, conditional on us failing in alignment, alignment was so unlikely that among branches that had roughly the same people (genetically) during the Singularity, only 2^-75 survive.
        
        My first claim is not “fewer than 1 in 2^75 of the possible configurations of human populations navigate the problem successfully”.
        
        My first claim is more like “given a population of humans that doesn’t even come close to navigating the problem successfully (given some unoptimized configuration of the background particles), probably you’d need to spend quite a lot of bits of optimization to tune the butterfly-effects in the background particles to make that same population instead solve alignment (depending how far back in time you go).” (A very rough rule of thumb here might be “it should take about as many bits as it takes to specify an FAI (relative to what they know)”.)
        
        This is especially stark if you’re trying to find a branch of reality that survives with the “same people” on it. Humans seem to be very, very sensitive about what counts as the “same people”. (e.g., in August, when gambling on who gets a treat, I observed a friend toss a quantum coin, see it come up against them, and mourn that a different person—not them—would get to eat the treat.)
        
        (Insofar as y’all are trying to argue “those MIRI folk say that AI will kill you, but actually, a person somewhere else in the great quantum multiverse, who has the same genes and childhood as you but whose path split off many years ago, will wake up in a simulation chamber and be told that they were rescued by the charity of aliens! So it’s not like you’ll really die”, then I at least concede that that’s an easier case to make, although it doesn’t feel like a very honest presentation to me.)
        
        Conditional on observing a given population of humans coming nowhere close to solving the problem, the branches wherein those humans live (with identity measured according to the humans) are probably very extremely narrow compared to the versions where they die. My top guess would be that 2^-75 number is a vast overestimate of how thick those branches are (and the 75 in the exponent does not come from any attempt of mine to make that estimate).
        
        As I said earlier: you can take branches that branched off earlier and earlier in time, and they’ll get better and better odds. (Probably pretty drastically, as you back off past certain points of no return. I dunno where the points of no return are. Weeks? Months? Years? Not decades, because with decades you can reroll significant portions of the population.)
        
        I haven’t thought much about what fraction of populations I’d expect to survive off of what branch-point. (How many bits of optimization do you need back in the 1880s to swap Hitler out for some charismatic science-enthusiast statesman that will happen to have exactly the right influence on the following culture? How many such routes are there? I have no idea.)
        
        Three big (related) issues with hoping that forks branched off sufficiently early (who are more numerous) save us in particular (rather than other branches) are (a) they plausibly care more about populations nearer to them (e.g. versions of themselves that almost died); (b) insofar as they care about more distant populations (that e.g. include you), they have rather a lot of distant populations to attempt to save; and (c) they have trouble distinguishing populations that never were from populations that were and then weren’t.
        
        Point (c) might be a key part of the story, not previously articulated (that I recall), that you were missing?
        
        Like, you might say “well, if one in a billion branches look like dath ilan and the rest look like earth, and the former basically all survive and the latter basically all die, then the fact that the earthlike branches have ~0 ability to save their earthlike kin doesn’t matter, so long as the dath-ilan like branches are trying to save everyone. dath ilan can just flip 30 quantum coins to select a single civilization from among the billion that died, and then spend 1/million resources on simulating that civilization (or paying off their murderer or whatever), and that still leaves us with one-in-a-quintillion fraction of the universe, which is enough to keep the lights running”.
        
        Part of the issue with this is that dath ilan cannot simply sample from the space of dead civilizations; it has to sample from a space of plausible dead civilizations rather than actual dead civilizations, in a way that I expect to smear loads and loads of probability-mass over regions that had concentrated (but complex) patterns of amplitude. The concentrations of Everett branches are like a bunch of wiggly thin curves etched all over a disk, and it’s not too hard to sample uniformly from the disk (and draw a plausible curve that the point could have been on), but it’s much harder to sample only from the curves. (Or, at least, so the physics looks to me. And this seems like a common phenomenon in physics: cf. the apparent inevitable increase of entropy when what’s actually happening is a previously-compact volume in phase space evolving into a bunch of wiggly thin curves, etc.)
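A toy version of the disk-and-curves picture, under loudly simplifying assumptions: a 2-D unit disk stands in for the space of civilization-configurations, and evenly spaced concentric rings of a chosen width stand in for the “wiggly thin curves” where amplitude concentrates. Sampling uniformly from the disk wastes about -log2(fraction of area on the rings) bits, and that waste keeps growing as the rings get thinner.

```python
import math
import random


def ring_fraction(radii, width):
    """Exact fraction of the unit disk's area covered by thin, non-overlapping
    annuli centred on the given radii (each annulus has area 2*pi*r*width)."""
    return sum(2 * math.pi * r * width for r in radii) / math.pi


def wasted_bits(radii, width):
    """Extra bits of entropy from sampling uniformly over the disk instead of
    only over the rings: -log2(fraction of area the rings occupy)."""
    return -math.log2(ring_fraction(radii, width))


def monte_carlo_fraction(radii, width, samples=100_000, seed=0):
    """Rejection-sample uniform points in the unit disk; count ring hits."""
    rng = random.Random(seed)
    hits = kept = 0
    while kept < samples:
        x, y = rng.uniform(-1, 1), rng.uniform(-1, 1)
        r = math.hypot(x, y)
        if r > 1.0:
            continue  # outside the disk: reject and redraw
        kept += 1
        if any(abs(r - c) <= width / 2 for c in radii):
            hits += 1
    return hits / samples


radii = [i / 10 for i in range(1, 10)]  # nine rings at r = 0.1 ... 0.9
print(ring_fraction(radii, 0.01), monte_carlo_fraction(radii, 0.01))
for width in (1e-2, 1e-6, 1e-12):
    print(width, round(wasted_bits(radii, width), 1))
```

The real configuration space is astronomically higher-dimensional than a disk, so this only illustrates the direction of the effect, not its size.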
        
        So when you’re considering whether surviving humans will pay for our souls (not somebody’s souls, but our souls in particular), you have a question of how these alleged survivors came to pay for us in particular (rather than some other poor fools). And there’s a tradeoff that runs from one extreme, “they’re saving us because they are almost exactly us and they remember us and wish us to have a nice epilog”, all the way to “they’re some sort of distant cousins, branched off a really long time ago, who are trying to save everyone”.
        
        The problem with being on the “they care about us because they consider they basically are us” end is that those people are dead too (conditional on us being dead). And as you push the branch-point earlier and earlier in time, you start finding more survivors, but those survivors also wind up having more and more fools to care about (in part because they have trouble distinguishing the real fallen civilizations from the neighboring civilization-configurations that don’t get appreciable quantum amplitude in basement physics).
        
        If you tell me where on this tradeoff curve you want to be, we can talk about it. (Ryan seemed to want to look all the way on the “insurance pool with aliens” end of the spectrum.)
        
        The point of the 2^75 number is that that’s about the threshold of “can you purchase a single star”. My guess is that, conditional on people dying, versions that they consider also them survive with degree way less than 2^-75, which rules out us being the ones who save us.
        
        If we retreat to “distant cousin branches of humanity might save us”, there’s a separate question of how the width of the surviving quantum branch compares to the volume taken up by us in the space of civilizations they attempt to save. I think my top guess is that a distant branch of humanity, spending stellar-level resources in attempts to concentrate its probability-mass in accordance with how quantum physics concentrates (squared) amplitude, still winds up so uncertain that there’s still 50+ bits of freedom left over? Which means that if one-in-a-billion of our cousin-branches survives, they still can’t buy a star (unless I flubbed my math).
        
        And I think it’s real, real easy for them to wind up with 1000 bits leftover, in which case their purchasing power is practically nothing.
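A sketch of the bit-budget arithmetic in the last two paragraphs. It assumes the survivors’ rarity and their leftover uncertainty combine multiplicatively (so their bit counts add), and it reuses the hypothetical 2^75-star universe, so that a branch diluted by `total` bits can buy about 2^(75 - total) stars for us:

```python
import math

STAR_THRESHOLD_BITS = 75  # the thread's "can you buy a star" threshold


def dilution_bits(survivor_fraction: float, leftover_bits: float) -> float:
    """Total bits separating a surviving cousin branch from us in particular:
    the rarity of survival plus the survivors' leftover uncertainty about
    which dead civilization to pay for. Assumes the two multiply, so their
    bit counts add."""
    return -math.log2(survivor_fraction) + leftover_bits


for leftover in (50, 1000):
    total = dilution_bits(1e-9, leftover)
    stars = 2.0 ** (STAR_THRESHOLD_BITS - total)
    print(f"{leftover} leftover bits: {total:.1f} bits total, ~{stars:.3g} stars")
```

One-in-a-billion survival costs about 30 bits, so 50 leftover bits already puts the total near 80, past the 75-bit threshold.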
        
        (This actually seems like a super reasonable guess to me. Like, imagine knowing that a mole of gas was compressed into the corner of a box with known volume, then letting the gas bounce around for 13 billion years, taking some measurements of pressure and temperature, and thinking long and hard using an amount of compute that’s appreciably less than the amount you’d need to just simulate the whole thing from the start. It seems to me like you wind up with a distribution that has way way more than 1000 bits more entropy than is contained in the underlying physics. Imagining that you can spend about 1 ten-millionth of the universe on refining a distribution over Tegmark III with entropy that’s within 50 bits of god seems very very generous to me; I’m very uncertain about this stuff, but I think that even mature superintelligences could easily wind up 1000 bits from god here.)
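For a sense of scale in the gas example: assuming an ideal gas and (hypothetically) a corner that is one eighth of the box, the standard free-expansion result ΔS = N k ln(V2/V1), converted to bits by dividing by k ln 2, gives the gap between the macroscopic description and the underlying microstate.

```python
import math

AVOGADRO = 6.02214076e23  # particles per mole (exact in the 2019 SI)


def free_expansion_bits(n_particles: float, volume_ratio: float) -> float:
    """Entropy gained, in bits, when an ideal gas expands freely into
    volume_ratio times its starting volume: dS / (k ln 2) = N * log2(V2 / V1)."""
    return n_particles * math.log2(volume_ratio)


# Hypothetical: the gas starts confined to one eighth of the box.
bits = free_expansion_bits(AVOGADRO, 8)
print(f"{bits:.3e} bits of coarse-grained entropy gained")
print(bits > 1000)
```

On this simple picture, a reasoner who knows only the starting volume, pressure and temperature, and can't afford the full simulation, is left with roughly 10^24 extra bits of uncertainty, which dwarfs the 1000-bit figure above.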
        
        Regardless, as I mentioned elsewhere, I think that a more relevant question is how those trade-offers stack up to other trade-offers, so /shrug.
        David Matolcsi 1 Oct 2024 20:55 UTC
        3 points
        I understand what you are saying here, and I understood it before the comment thread started. The thing I would be interested in you responding to is my and Ryan’s comments in this thread arguing that it’s incompatible to believe that “My guess is that, conditional on people dying, versions that they consider also them survive with degree way less than 2^-75, which rules out us being the ones who save us” and to believe that you should work on AI safety instead of malaria.
        mattmacdermott 1 Oct 2024 11:11 UTC
        1 point
        Even if you think a life’s work can’t make a difference but many can, you can still think it’s worthwhile to work on alignment for whatever reasons make you think it’s worthwhile to do things like voting.
        
        (E.g. a non-CDT decision theory)
        RussellThor 1 Oct 2024 9:06 UTC
        1 point
        Not quite following. Your possibilities:
        1. Alignment is almost impossible; then there is, say, a 1e-20 chance we survive. Yes, surviving worlds have luck and good alignment work etc. Perhaps you should work on alignment, or still on bednets if the odds really are that low.
        2. Alignment is easy by default, but the chance we survive is nothing like 0.999999, say 95%, because AGI that is not TAI superintelligence could cause us to wipe ourselves out first, among other things. (This is a slow takeoff universe(s).)
        #2 has many more branches in total where we survive (not sure if that matters), and the difference between where things go well and badly is almost all about stopping ourselves from killing ourselves with non-TAI-related things. In this situation, shouldn’t you be working on those things?
        If you average 1 and 2, then you still get a lot of work on non-alignment-related stuff.
        I believe it’s somewhere closer to ⁵⁰⁄₅₀ and not so overdetermined one way or the other, but we are not considering that here.