As I said, I understand the difference between epistemic uncertainty and true quantum probabilities, though I do think that the true quantum probability is not that astronomically low.
More importantly, I still feel confused about why you are working on AI safety if the outcome is that overdetermined one way or the other.
What does degree of determination have to do with it? If you lived in a fully deterministic universe, and you were uncertain whether it was going to live or die, would you give up on it on the mere grounds that the answer is deterministic (despite your own uncertainty about which answer is physically determined)?
I still think I’m right about this. Your conception (i.e. that you were conceived rather than a genetically less smart sibling) was determined by quantum fluctuations. So if you believe that quantum fluctuations over the last 50 years make at most a 2^-75 difference in the probability of alignment, that’s an upper bound on how much of a difference your life’s work can make. Whereas if you dedicate your life to buying bednets, it’s pretty easy to calculate how many happy life-years you save. So I still think it’s incompatible to believe both that the true quantum probability is astronomically low and that you can make enough of a difference that working on AI safety is clearly better than bednets.
the “you can’t save us by flipping 75 bits” thing seems much more likely to me on a timescale of years than a timescale of decades; I’m fairly confident that quantum fluctuations can cause different people to be born, and so if you’re looking 50 years back you can reroll the population dice.
This point feels like a technicality, but I want to debate it because I think a fair number of your other claims depend on it.
You often claim that, conditional on us failing at alignment, alignment was so unlikely that among branches that had roughly the same people (genetically) during the Singularity, only a 2^-75 fraction survives. This is important, because then we can’t rely on other versions of ourselves “selfishly” entering an insurance contract with us, and we need to rely on the charity of Dath Ilan that branched off long ago. I agree that’s a big difference. Also, I say that our decision to pay is correlated with our luckier brethren paying, so in a sense our decision is partly the thing that saves us. You dismiss that, saying it’s like a small child claiming credit for the big, strong fireman saving people. If it’s Dath Ilan that saves us, I agree with you, but if it’s genetic copies of some currently existing people, I think your metaphor pretty clearly doesn’t apply, and the decisions to pay are in fact decently strongly correlated.
Now I don’t see how much difference decades vs years makes in this framework. If you believe that our true quantum probability is now 2^-75, but 40 years ago it was still a not-astronomical number (like 1 in a million), then should I just plead with people who are older than 40 to promise themselves that they will pay in the future? I don’t really see what difference this makes.
But also, I think the years vs decades dichotomy is pretty clearly false. Suppose you believe that the expected value of one year of your work is a decrease of X in x-risk. What’s the yearly true quantum probability that someone who is, in your opinion, in your reference class of importance dies, gets a debilitating illness, gets into a career-destroying scandal, etc.? I think it’s hard to argue it’s less than 0.1% a year. (But it makes no big difference if you add one or two zeros.) These things are also continuous: even if none of the important people die, someone will lose a month or some weeks to an illness, etc. I think this is a pretty strong case that, one year from now, the 90th-percentile-luckiest Everett branch contains 0.01 more years of the equivalent of Nate-work than the 50th-percentile Everett branch.
But your claims imply that you believe the true probability of success differs by less than 2^-72 between the 50th- and 90th-percentile-luckiness branches a year from now. That puts an upper bound of a 2^-62 decrease in x-risk on the value of a year of your labor.
With these exact numbers, this can still be worth doing given the astronomical stakes, but if your made-up number were 2^-100 instead, I think it would be better for you to work on malaria.
Here is another more narrow way to put this argument:
Let’s say Nate is 35 (arbitrary guess).
Let’s say that branches which deviated 35 years ago would pay for our branch (and other branches in our reference class). The case for this is that many people are over 50 (thus existing in both branches), and care about deviated versions of themselves and their children etc. Probably the discount relative to zero deviation is less than 10x.
Let’s say that Nate thinks that if he didn’t ever exist, P(takeover) would go up by 1 / 10 billion (roughly 2^-32). If it was wildly lower than this, that would be somewhat surprising and might suggest different actions.
Nate existing is sensitive to a bit of quantum randomness 35 years ago, so other people as good as Nate could be brought into existence with a bit of quantum randomness. So, 1 bit of randomness can reduce risk by at least 1 / 10 billion.
Thus, 75 bits of randomness presumably reduces risk by > 1 / 10 billion, which is >> 2^-75.
(This argument is a bit messy, because presumably some logical facts imply that Nate will be very helpful and some imply that he won’t be very helpful, and I was taking an expectation over this while we really care about the effect on all the quantum branches. I’m not sure how to make the argument exactly right, but I think it is at least roughly right.)
What about the case where we only go back 10 years? We can apply the same argument, but instead just use some number of bits (e.g. 10) to make Nate work a bit more, say 1 week of additional work, via changing whether Nate ends up getting sick (by adjusting the weather or which children are born, or whatever). This should also reduce doom by 1 week / (52 weeks/year) / (20 years of work) * 1 / 10 billion ≈ 1 / 10 trillion.
And surely there are more efficient schemes.
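To spell out the arithmetic, here is a quick sanity check in code. It is purely illustrative and reuses the made-up numbers above rather than any independent estimates:

```python
# Sanity check of the illustrative numbers above; every input is one of the
# made-up assumptions from this argument, not an independent estimate.
p_takeover_without_nate = 1 / 10e9   # assumed: Nate never existing raises P(takeover) by 1/10 billion
career_years = 20                    # assumed duration of Nate-style work
weeks_per_year = 52

# ~10 bits of quantum randomness buy one extra week of work, scaled against
# the assumed value of the whole career:
extra_week_effect = (1 / weeks_per_year) / career_years * p_takeover_without_nate
print(f"doom reduction from one extra week: {extra_week_effect:.1e}")  # ~1e-13, i.e. ~1/10 trillion

print(f"2^-75 for comparison: {2**-75:.1e}")  # ~2.6e-23, vastly smaller
```

So even this inefficient scheme lands around 1/10 trillion, which is enormously larger than 2^-75.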
To be clear, only having ~ 1 / 10 billion branches survive is rough from a trade perspective.
What are you trying to argue? (I don’t currently know what position y’all think I have or what position you’re arguing for. Taking a shot in the dark: I agree that quantum bitflips have loads more influence on the outcome the earlier in time they are.)
I argue that right now, starting from the present state, the true quantum probability of achieving the Glorious Future is way higher than 2^-75, or if not, then we should probably work on something other than AI safety. Ryan and I argue for this in the last few comments. It’s not a terribly important point: you can just say the true quantum probability is 1 in a billion, in which case it’s still worth it for you to work on the problem, but it becomes rough to trade for keeping humanity physically alive in a way that causes one year of delay to the AI.
But I would like you to acknowledge that “vastly below 2^-75 true quantum probability, starting from now” is probably mistaken, or explain where our logic goes wrong about how this implies you should work on malaria.
Starting from now? I agree that that’s true in some worlds that I consider plausible, at least, and I agree that worlds whose survival-probabilities are sensitive to my choices are the ones that render my choices meaningful (regardless of how deterministic they are).
Conditional on Earth being utterly doomed, are we (today) fewer than 75 qbitflips from being in a good state? I’m not sure; it probably varies across the doomed worlds where I have decent amounts of subjective probability. It depends on how much time we have on the clock, and on where the points of no return are. I haven’t thought about this a ton. My best guess is that it would take more than 75 qbitflips to save us now, but maybe I’m not thinking creatively enough about how to spend them, and I haven’t thought about it in detail and expect I’d be sensitive to arguments about it /shrug.
(If you start from 50 years ago? Very likely! 75 bits is a lot of population rerolls. If you start after people hear the thunder of the self-replicating factories barrelling towards them, and wait until the very last moments, when they would consider themselves to have become a distinct person who is about to die from AI, and who wishes to draw upon your reassurance that they will be saved? Very likely not! Those people look very, very dead.)
One possible point of miscommunication: when I said something like “obviously it’s worse than 2^-75 at the extreme where it’s actually them who is supposed to survive”, that was intended to apply to the sort of person who has seen the skies darken and has heard the thunder, rather than the version of them that exists here in 2024. This was not intended to be some bold or surprising claim. It was an attempt to establish an obvious base point at one very extreme end of a spectrum, that we could start interpolating from (asking questions like “how far back from there are the points of no return?” and “how much more entropy would they have than god, if people from that branchpoint spent stars trying to figure out what happened after those points?”).
(The 2^-75 was not intended to be even an estimate of how dead the people on the one end of the extreme are. It is the “can you buy a star” threshold. I was trying to say something like “the individuals who actually die obviously can’t buy themselves a star just because they inhabit Tegmark III; now let’s drag the cursor backwards and talk about whether, at any point, we cross the a-star-for-everyone threshold”.)
If that doesn’t clear things up and you really want to argue that, conditional on Earth being as doomed as it superficially looks to me, most of those worlds are obviously <100 quantum bitflips from victory today, I’m willing to field those arguments; maybe you see some clever use of qbitflips I don’t, and that would be kinda cool. But I caveat that this doesn’t seem like a crux to me, and I acknowledge that the other worlds (where Earth merely looks unsalvageable) are the ones motivating action.
I have not followed this thread in all of its detail, but it sounds like it might be getting caught up on the difference between the underlying ratio of different quantum worlds (which can be expressed as a probability over one’s future) and one’s probabilistic uncertainty over the underlying ratio of different quantum worlds (which can also be expressed as a probability over the future but does not seem to me to have the same implications for behavior).
Insofar as it seems to readers like a bad idea to optimize for different outcomes in a deterministic universe, I recommend reading the Free Will (Solution) sequence by Eliezer Yudkowsky, which I found fairly convincing on the matter of why it’s still right to optimize in a fully deterministic universe, as well as in a universe running on quantum mechanics (interpreted to have many worlds).
You often claim that, conditional on us failing at alignment, alignment was so unlikely that among branches that had roughly the same people (genetically) during the Singularity, only a 2^-75 fraction survives.
My first claim is not “fewer than 1 in 2^75 of the possible configurations of human populations navigate the problem successfully”.
My first claim is more like “given a population of humans that doesn’t even come close to navigating the problem successfully (given some unoptimized configuration of the background particles), probably you’d need to spend quite a lot of bits of optimization to tune the butterfly-effects in the background particles to make that same population instead solve alignment (depending on how far back in time you go).” (A very rough rule of thumb here might be “it should take about as many bits as it takes to specify an FAI (relative to what they know)”.)
This is especially stark if you’re trying to find a branch of reality that survives with the “same people” on it. Humans seem to be very, very sensitive about what counts as the “same people”. (e.g., in August, when gambling on who gets a treat, I observed a friend toss a quantum coin, see it come up against them, and mourn that a different person—not them—would get to eat the treat.)
(Insofar as y’all are trying to argue “those MIRI folk say that AI will kill you, but actually, a person somewhere else in the great quantum multiverse, who has the same genes and childhood as you but whose path split off many years ago, will wake up in a simulation chamber and be told that they were rescued by the charity of aliens! So it’s not like you’ll really die”, then I at least concede that that’s an easier case to make, although it doesn’t feel like a very honest presentation to me.)
Conditional on observing a given population of humans coming nowhere close to solving the problem, the branches wherein those humans live (with identity measured according to the humans) are probably extremely narrow compared to the versions where they die. My top guess would be that the 2^-75 number is a vast overestimate of how thick those branches are (and the 75 in the exponent does not come from any attempt of mine to make that estimate).
As I said earlier: you can take branches that branched off earlier and earlier in time, and they’ll get better and better odds. (Probably pretty drastically, as you back off past certain points of no return. I dunno where the points of no return are. Weeks? Months? Years? Not decades, because with decades you can reroll significant portions of the population.)
I haven’t thought much about what fraction of populations I’d expect to survive off of what branch-point. (How many bits of optimization do you need back in the 1880s to swap Hitler out for some charismatic science-enthusiast statesman who will happen to have exactly the right influence on the following culture? How many such routes are there? I have no idea.)
Three big (related) issues with hoping that forks that branched off sufficiently early (who are more numerous) save us in particular (rather than other branches) are (a) they plausibly care more about populations nearer to them (e.g. versions of themselves that almost died); (b) insofar as they care about more distant populations (that e.g. include you), they have rather a lot of distant populations to attempt to save; and (c) they have trouble distinguishing populations that never were from populations that were and then weren’t.
Point (c) might be a key part of the story, not previously articulated (that I recall), that you were missing?
Like, you might say “well, if one in a billion branches look like dath ilan and the rest look like earth, and the former basically all survive and the latter basically all die, then the fact that the earthlike branches have ~0 ability to save their earthlike kin doesn’t matter, so long as the dath-ilan like branches are trying to save everyone. dath ilan can just flip 30 quantum coins to select a single civilization from among the billion that died, and then spend 1/million resources on simulating that civilization (or paying off their murderer or whatever), and that still leaves us with one-in-a-quintillion fraction of the universe, which is enough to keep the lights running”.
Part of the issue with this is that dath ilan cannot simply sample from the space of dead civilizations; it has to sample from a space of plausible dead civilizations rather than actual dead civilizations, in a way that I expect to smear loads and loads of probability-mass over regions that had concentrated (but complex) patterns of amplitude. The concentrations of Everett branches are like a bunch of wiggly thin curves etched all over a disk, and it’s not too hard to sample uniformly from the disk (and draw a plausible curve that the point could have been on), but it’s much harder to sample only from the curves. (Or, at least, so the physics looks to me. And this seems like a common phenomenon in physics; cf. the apparent inevitable increase of entropy, when what’s actually happening is a previously-compact volume in phase space evolving into a bunch of wiggly thin curves, etc.)
So when you’re considering whether surviving humans will pay for our souls—not somebody’s souls, but our souls in particular—you have a question of how these alleged survivors came to pay for us in particular (rather than some other poor fools). And there’s a tradeoff that runs from one extreme, “they’re saving us because they are almost exactly us and they remember us and wish us to have a nice epilogue”, all the way to the other, “they’re some sort of distant cousins, branched off a really long time ago, who are trying to save everyone”.
The problem with being on the “they care about us because they consider they basically are us” end is that those people are dead too (conditional on us being dead). And as you push the branch-point earlier and earlier in time, you start finding more survivors, but those survivors also wind up having more and more fools to care about (in part because they have trouble distinguishing the real fallen civilizations from the neighboring civilization-configurations that don’t get appreciable quantum amplitude in basement physics).
If you tell me where on this tradeoff curve you want to be, we can talk about it. (Ryan seemed to want to look all the way on the “insurance pool with aliens” end of the spectrum.)
The point of the 2^-75 number is that that’s about the threshold of “can you purchase a single star”. My guess is that, conditional on people dying, versions that they consider also them survive with degree way less than 2^-75, which rules out us being the ones who save us.
If we retreat to “distant cousin branches of humanity might save us”, there’s a separate question of how the width of the surviving quantum branch compares to the volume taken up by us in the space of civilizations they attempt to save. I think my top guess is that a distant branch of humanity, spending stellar-level resources in attempts to concentrate its probability-mass in accordance with how quantum physics concentrates (squared) amplitude, still winds up so uncertain that there are 50+ bits of freedom left over? Which means that if one-in-a-billion of our cousin-branches survives, they still can’t buy a star (unless I flubbed my math).
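A minimal back-of-the-envelope version of that math, under my own reading of where the 75 comes from (the thread doesn’t spell it out, so treat the star count as an assumption):

```python
import math

# Hedged sketch: treat "can you buy a star" as "do you command more than
# ~2^-75 of the universe", on the assumption (mine, not the thread's) that
# the reachable universe has roughly 2^75 ~= 3.8e22 stars, which sits in the
# commonly quoted 10^22-10^24 range.
print(f"2^75 = {2**75:.2e}")

# The cousin-branch case above: ~one in a billion cousin branches survive,
# and the survivors' reconstruction of which dead civilizations actually
# existed still has ~50 bits of slack.
surviving_measure_bits = math.log2(1e9)  # ~30 bits
leftover_uncertainty_bits = 50           # the "50+ bits of freedom" guess
print(f"effective share per dead branch ~ 2^-{surviving_measure_bits + leftover_uncertainty_bits:.0f}")
# ~2^-80: just below the 2^-75 star threshold on these guesses.
```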
And I think it’s real, real easy for them to wind up with 1000 bits leftover, in which case their purchasing power is practically nothing.
(This actually seems like a super reasonable guess to me. Like, imagine knowing that a mole of gas was compressed into the corner of a box with known volume, then letting the gas bounce around for 13 billion years, taking some measurements of pressure and temperature, and then thinking long and hard using an amount of compute that’s appreciably less than the amount you’d need to just simulate the whole thing from the start. It seems to me like you wind up with a distribution that has way, way more than 1000 bits more entropy than is contained in the underlying physics. Imagining that you can spend about one ten-millionth of the universe on refining a distribution over Tegmark III with entropy that’s within 50 bits of god seems very, very generous to me; I’m very uncertain about this stuff, but I think that even mature superintelligences could easily wind up 1000 bits from god here.)
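For a sense of scale on that gas example, here is a rough order-of-magnitude check (assuming, arbitrarily, a mole of helium near room temperature and pressure, and using the standard Sackur-Tetrode formula) of how many bits of microstate information are in play:

```python
import math

# Order-of-magnitude check: entropy of one mole of an ideal monatomic gas
# (helium, ~room temperature, ~1 atm) via the Sackur-Tetrode formula.
k_B = 1.380649e-23        # Boltzmann constant, J/K
h = 6.62607015e-34        # Planck constant, J*s
N_A = 6.02214076e23       # Avogadro's number
m = 4.0026 * 1.66054e-27  # mass of a helium atom, kg
T = 298.0                 # temperature, K
P = 101325.0              # pressure, Pa
V = N_A * k_B * T / P     # molar volume at these conditions, m^3

lam = h / math.sqrt(2 * math.pi * m * k_B * T)          # thermal de Broglie wavelength
s_per_particle_nats = math.log(V / (N_A * lam**3)) + 2.5
total_bits = s_per_particle_nats / math.log(2) * N_A

print(f"~{total_bits:.1e} bits of entropy in one mole of helium")  # ~1e25 bits
```

Against roughly 10^25 bits of underlying microstate detail, ending up within even 1000 bits of the exact state is an extraordinarily strong demand, which is the sense in which 1000+ leftover bits looks easy to come by.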
Regardless, as I mentioned elsewhere, I think that a more relevant question is how those trade-offers stack up to other trade-offers, so /shrug.
I understand what you are saying here, and I understood it before the comment thread started. The thing I would be interested in you responding to is my and Ryan’s comments in this thread, which argue that it’s incompatible to believe that “My guess is that, conditional on people dying, versions that they consider also them survive with degree way less than 2^-75, which rules out us being the ones who save us” and also to believe that you should work on AI safety instead of malaria.
Even if you think a life’s work can’t make a difference but many can, you can still think it’s worthwhile to work on alignment for whatever reasons make you think it’s worthwhile to do things like voting.
(E.g. a non-CDT decision theory)