David Matolcsi comments on You can, in fact, bamboozle an unaligned AI into sparing your life

David Matolcsi 30 Sep 2024 7:36 UTC
21 points
I think if we do a poll, it will become clear that the strong majority of readers interpreted Nate’s post as “If you don’t solve alignment, you shouldn’t expect that some LDT/simulation mumbo-jumbo will let you and your loved ones survive this” and not in the more reasonable way you are interpreting this. I certainly interpreted the post that way.
Separately, as I state in the post, I believe that once you make the argument that “I am not planning to spend my universe-fractions of the few universes in which we do manage to build aligned AGI this way, but you are free to do so, and I agree that this might imply that AI will also spare us in this world, though I think doing this would probably be a mistake by all of our values”, you forever lose the right to appeal to people’s emotions about how sad you are that all our children are going to die.
If you personally don’t make the emotional argument about the children, I have no quarrel with you; I respect utilitarians. But I’m very annoyed at anyone who emotionally appeals to saving the children, then casually admits that they wouldn’t spend a one-in-a-hundred-million fraction of their resources to save them.
- habryka 30 Sep 2024 16:59 UTC
  15 points
  I think there is a much simpler argument that would arrive at the same conclusion, but also, I think that much simpler argument kind of shows why I feel frustrated with this critique:
  Humanity will not go extinct, because we are in a simulation. This is because we really don’t like dying, and so I am making sure that after we build aligned AI, I spend a lot of resources making simulations of early-earth to make sure you all have the experience of being alive. This means it’s totally invalid to claim that “AI will kill you all”. It is the case that AI will kill you in a very small fraction of worlds, which are the small fraction of observer moments of yours located in actual base reality, but because we will spend like ¹⁄₁₀₀ millionth of our resources simulating early earths surviving, you can basically be guaranteed to survive as well.
  And like… OK, yeah, you can spend your multiverse-fractions this way. Indeed, you could actually win absolutely any argument ever this way:
  I am really frustrated with people saying that takeoff will be fast. Indeed, if we solve AI Alignment I will spend my fraction of the multiverse running early-earth simulations where takeoff was slow, and so no matter what happened in the base-universe, y’alls observer-moments will observe slow takeoff. This means there is very little chance that we will all experience a fast takeoff, because I and others will have made so many early earth simulations that you are virtually guaranteed to experience takeoff as slow.
  I agree that “not dying in a base universe” is a more reasonable thing to care about than “proving people right that takeoff is slow” but I feel like both lines of argument that you bring up here are doing something where you take a perspective on the world that is very computationalist, unintuitive and therefore takes you to extremely weird places, makes strong assumptions about what a post-singularity humanity will care about, and then uses that to try to defeat an argument in a weird and twisted way that maybe is technically correct, but I think unless you are really careful with every step, really does not actually communicate what is going on.
  It is obviously extremely fucking bad for AI to disempower humanity. I think “literally everyone you know dies” is a much more accurate capture of that, and also a much more valid conclusion from conservative premises than “via multiverse simulation shenanigans maybe you specifically won’t die, but like, you have to understand that we had to give up something equally costly, so it’s as bad as you dying, but I don’t want you to think of it as dying”, which I am confident is not a reasonable thing to communicate to people who haven’t thought through all of this very carefully.
  Like, yeah, multiverse simulation shenanigans make it hard for any specific statement about what AI will do to humanity to be true. In some sense they are an argument against any specific human-scale bad thing to happen, because if we do win, we could spend a substantial fraction of our resources with future AI systems to prevent that. But I think making that argument before getting people to understand that being in the position to have to do that is an enormous gigantic atrocity, is really dumb. Especially if people frame it as “the AI will leave you alone”.
  No, the AI will not leave you alone if we lose. The whole universe will be split at its seams and everything you know destroyed and remade and transformed into the most efficient version of itself for whatever goal the AI is pursuing, which yeah, might include trading with some other humanities in other parts of the multiverse where we won, but you will still be split apart and transformed and completely disempowered (and we have no idea what that will actually look like, and we both know that “dying” is not really a meaningful abstraction in worlds where you can remake brains from scratch).
  - ryan_greenblatt 1 Oct 2024 0:26 UTC
    2 points
    > I agree that “not dying in a base universe” is a more reasonable thing to care about than “proving people right that takeoff is slow” but I feel like both lines of argument that you bring up here are doing something where you take a perspective on the world that is very computationalist, unintuitive and therefore takes you to extremely weird places, makes strong assumptions about what a post-singularity humanity will care about, and then uses that to try to defeat an argument in a weird and twisted way that maybe is technically correct, but I think unless you are really careful with every step, really does not actually communicate what is going on.
    
    I agree that common sense morality and common sense views are quite confused about the relevant situation. Indexical selfish perspectives are also pretty confused and are perhaps even more incoherent.
    
    However, I think that under the most straightforward generalization of common sense views or selfishness where you just care about the base universe and there is just one base universe, this scheme can work to save lives in the base universe[1].
    
    I legitimately think that common sense moral views should care less about AI takeover due to these arguments. As in, there is a reasonable chance that a bunch of people aren’t killed due to these arguments (and other different arguments) in the most straightforward sense.
    
    I also think “the AI might leave you alone, but we don’t really know and there seems at least a high chance that huge numbers of people, including you, die” is not a bad summary of the situation.
    
    > In some sense they are an argument against any specific human-scale bad thing to happen, because if we do win, we could spend a substantial fraction of our resources with future AI systems to prevent that.
    
    Yes. I think any human-scale bad thing (except stuff needed for the AI to most easily take over and solidify control) can be paid for and this has some chance of working. (Tiny amounts of kindness work in a similar way.)
    
    > Humanity will not go extinct, because we are in a simulation.
    
    FWIW, I think it is non-obvious how common sense views interpret these considerations. I think it is probably common to just care about base reality? (Which is basically equivalent to having a measure etc.) I do think that common sense moral views don’t consider it good to run these simulations for this purpose, while bailing out aliens who would have bailed us out is totally normal/reasonable under common sense moral views.
    
    > It is obviously extremely fucking bad for AI to disempower humanity. I think “literally everyone you know dies” is a much more accurate capture of that, and also a much more valid conclusion from conservative premises
    
    Why not just say what’s more straightforwardly true:
    
    “I believe that AI takeover has a high probability of killing billions and should be strongly avoided, and would be a serious and irreversible decision by our society that’s likely to be a mistake even if it doesn’t lead to billions of deaths.”
    
    I don’t think “literally everyone you know dies if AI takes over” is accurate because I don’t expect that in the base reality version of this universe for multiple reasons. Like it might happen, but I don’t know if it is more than 50% likely.
    
    [1] It’s not crazy to call the resulting scheme “multiverse/simulation shenanigans” TBC (as it involves prediction/simulation and uncertainty over the base universe), but I think this is just because I expect that multiverse/simulation shenanigans will alter the way AIs in base reality act in the common sense straightforward way.
    - habryka 1 Oct 2024 1:10 UTC
      4 points
      > “I believe that AI takeover has a high probability of killing billions and should be strongly avoided, and would be a serious and irreversible decision by our society that’s likely to be a mistake even if it doesn’t lead to billions of deaths.”
      I mean, this feels like it is of completely the wrong magnitude. “Killing billions” is just vastly vastly vastly less bad than “completely eradicating humanity’s future”, which is actually what is going on.
      Like, my attitude towards AI and x-risk would be hugely different if the right abstraction would be “a few billion people die”. Like, OK, that’s like a few decades of population growth. Basically nothing in the big picture. And I think this is also true by the vast majority of common-sense ethical views. People care about the future of humanity. “Saving the world” is hugely more important than preventing the marginal atrocity. Outside of EA I have never actually met a welfarist who only cares about present humans. People of course think we are supposed to be good stewards of humanity’s future, especially if you select on the people who are actually involved in global scale decisions.
      Normal people who are not bought into super crazy computationalist stuff understand that humanity’s extinction is much worse than just a few billion people dying, and the thing that is happening is much more like extinction than it is like a few billion people dying.
      - ryan_greenblatt 1 Oct 2024 1:24 UTC
        4 points
        (I mostly care about long term future and scope sensitive resource use like habryka TBC.)
        
        Sure, we can amend to:
        
        “I believe that AI takeover would eliminate humanity’s control over its future, has a high probability of killing billions, and should be strongly avoided.”
        
        We could also say something like “AI takeover seems similar to takeover by hostile aliens with potentially unrecognizable values. It would eliminate humanity’s control over its future and has a high probability of killing billions.”
      - ryan_greenblatt 1 Oct 2024 1:29 UTC
        2 points
        > And I think this is also true by the vast majority of common-sense ethical views. People care about the future of humanity. “Saving the world” is hugely more important than preventing the marginal atrocity. Outside of EA I have never actually met a welfarist who only cares about present humans. People of course think we are supposed to be good stewards of humanity’s future, especially if you select on the people who are actually involved in global scale decisions.
        
        Hmmm, I agree with this as stated, but it’s not clear to me that this is scope sensitive. As in, suppose that the AI will eventually leave humans in control of earth and the solar system. Do people typically think this is extremely bad? I don’t think so, though I’m not sure.
        
        And, I think trading for humans to eventually control the solar system is pretty doable. (Most of the trade cost is in preventing an earlier slaughter and violence which was useful for takeover or avoiding delay.)
      - ryan_greenblatt 1 Oct 2024 1:31 UTC
        4 points
        At a more basic level, I think the situation is just actually much more confusing than human extinction in a bunch of ways.
        
        (Separately, under my views misaligned AI takeover seems worse than human extinction due to (e.g.) biorisk. This is because primates or other closely related species seem very likely to re-evolve into an intelligent civilization and I feel better about this civilization than AIs.)
- CarlShulman 2 Oct 2024 19:24 UTC
  7 points
  > I think if we do a poll, it will become clear that the strong majority of readers interpreted Nate’s post as “If you don’t solve alignment, you shouldn’t expect that some LDT/simulation mumbo-jumbo will let you and your loved ones survive this” and not in the more reasonable way you are interpreting this. I certainly interpreted the post that way.
  You can run the argument past a poll of LLM models of humans and show their interpretations.
  
  I strongly agree with your second paragraph.