David Matolcsi comments on You can, in fact, bamboozle an unaligned AI into sparing your life

David Matolcsi 30 Sep 2024 0:14 UTC
3 points
0
I largely agree with this comment. I agree that figuring these things out is probably not top priority and that maximizing our control over the future to get more bargaining chips is good. And if you would rather spend your resources on things other than saving Earth, that’s a valid choice; hopefully we will have enough people to pay for it.

I disagree with this not implying that the AI doesn’t kill everyone. I would be surprised if keeping humanity alive caused more than one year of delay, and that’s at most a one-in-a-hundred-million fraction of the Universe. If a positive Singularity happens in the next few years, and the Universe gets parceled out among currently living humans (I don’t expect exactly this to happen, but it might be a decent model), then just 80 people getting together is enough. You need somewhat more than that, because the probability of alignment can be below 50%, and you want to give a margin to the AIs in the deal, but on the other hand we have the proposal of simulating smaller Universes and less coordinated humans, which makes the AI think that the simulators might be richer and have a better chance of solving alignment. Altogether, this doesn’t seem unaffordable to me.
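
The figure of 80 can be checked with a short sketch. It assumes roughly 8 billion living humans (a number not stated above) and an equal split of the Universe among them; the function name `people_needed` is hypothetical and only illustrates the arithmetic, and it ignores the adjustments for alignment probability and the AI's margin mentioned above.

```python
from fractions import Fraction
import math

# Assumption (not stated in the comment): roughly 8 billion living humans.
WORLD_POPULATION = 8_000_000_000

# From the comment: a one-year delay costs at most a one-in-a-hundred-million
# fraction of the Universe.
DELAY_COST = Fraction(1, 100_000_000)


def people_needed(cost_fraction: Fraction, population: int) -> int:
    """Smallest number of equal shareholders whose combined share covers the cost."""
    share_per_person = Fraction(1, population)
    return math.ceil(cost_fraction / share_per_person)


print(people_needed(DELAY_COST, WORLD_POPULATION))
```

Exact fractions are used so that the division does not pick up floating-point rounding before the ceiling is taken.
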
And while I agree that this is probably not the most important topic, I found it frustrating that the most prominent post on the topic is Nate’s post (it was curated in 2022! you left a very positive comment on it saying that you have linked the post to many people since it came out!), and I think that post is actually very bad. It’s unhealthy that the most prominent post on the topic is one where the author dunks on various imaginary opponents in a sneering tone, while conspicuously avoiding the actually reasonable arguments on the other side.
- habryka 30 Sep 2024 0:40 UTC
  2 points
  2
  Parent
  I agree that in as much as you have an AI that somehow has gotten in a position to guarantee victory, then leaving humanity alive might not be that costly (though still too costly to make it worth it IMO), but a lot of the costs come from leaving humanity alive threatening your victory. I.e. not terraforming earth to colonize the universe is one more year for another hostile AI to be built, or for an asteroid to destroy you, or for something else to disempower you.
  Disagree on the critique of Nate’s posts. The two posts seem relatively orthogonal to me (and I generally think it’s good to have debunkings of bad arguments, even if there are better arguments for a position, and in this particular case due to the multiplier nature of this kind of consideration debunking the bad arguments is indeed qualitatively more important than engaging with the arguments in this post, because the arguments in this post do indeed not end up changing your actions, whereas the arguments Nate argued against were trying to change what people do right now).
  - ryan_greenblatt 30 Sep 2024 1:48 UTC
    28 points
    18
    Parent
    I think we should have a norm that you should explain the limitations of the debunking when debunking bad arguments, particularly if there are stronger arguments that sound similar to the bad argument.
    
    A more basic norm is that you shouldn’t claim or strongly imply that your post is strong evidence against something when it just debunks some bad arguments for it, particularly if there are relatively well known better arguments.
    
    I think Nate’s post violates both of these norms. In fact, I think multiple posts about this topic from Nate and Eliezer^[1] violate this norm. (Examples: the corresponding post by Nate, “But why would the AI kill us” by Nate, and “The Sun is big, but superintelligences will not spare Earth a little sunlight” by Eliezer.)
    
    I discuss this more in this comment I made earlier today.
    
    ^[1] I’m including Eliezer because he has a similar perspective; obviously they are different people.
  - David Matolcsi 30 Sep 2024 1:21 UTC
    5 points
    2
    Parent
    I state in the post that I agree that the takeover, while the AI stabilizes its position to the degree that it can prevent other AIs from being built, can be very violent, but I don’t see how hunting down everyone living in Argentina is an important step in the takeover.
    I strongly disagree about Nate’s post. I agree that it’s good that he debunked some bad arguments, but it’s just not true that he is only arguing against ideas that were trying to change how people act right now. He spends long sections on the imagined Interlocutor coming up with false hopes that are not action-relevant in the present, like our friends in the multiverse saving us, us running simulations in the future and punishing the AI for defection, and us asking for half the Universe now in a bargain and then using a fraction of what we got to run simulations for bargaining. These take up like half the essay. My proposal clearly fits in the reference class of arguments Nate debunks; he just doesn’t get around to it, and spends pages on strictly worse proposals, like one where we don’t reward the cooperating AIs in the future simulations but punish the defecting ones.
  - ryan_greenblatt 30 Sep 2024 1:33 UTC
    4 points
    2
    Parent
    I agree that Nate’s post makes good arguments against AIs spending a high fraction of resources on being nice or on stuff we like (and that this is an important question). And it also debunks some bad arguments against small fractions. But the post really seems to be trying to argue against small fractions in general:
    
    [Some people think maybe AI] would leave humanity a few stars/galaxies/whatever on game-theoretic grounds. [...] I’m pretty confident that this view is wrong (alas), and based on a misunderstanding of LDT. I shall now attempt to clear up that confusion.
    
    As far as:
    
    debunking the bad arguments is indeed qualitatively more important than engaging with the arguments in this post, because the arguments in this post do indeed not end up changing your actions, whereas the arguments Nate argued against were trying to change what people do right now
    
    I interpreted the main effect (on people) of Nate’s post as arguing for “the AI will kill everyone despite decision theory, so you shouldn’t feel good about the AI situation” rather than arguing against decision theory schemes for humans getting a bunch of the lightcone. (I don’t think there are many people who care about AI safety but are working on implementing crazy decision theory schemes to control the AI?)
    
    If so, then I think we’re mostly just arguing about P(misaligned AI doesn’t kill us due to decision theory like stuff | misaligned AI takeover). If you agree with this, then I dislike the quoted argument. This would be similar to saying “debunking bad arguments against x-risk is more important than debunking good arguments against x-risk because bad arguments are more likely to change people’s actions while the good arguments are more marginal”.
    
    Maybe I’m misunderstanding you.
    - habryka 30 Sep 2024 3:07 UTC
      13 points
      3
      Parent
      Yeah, I feel confused that you are misunderstanding me this much, given that I feel like we talked about this a few times.
      Nate is saying that in as much as you are pessimistic about alignment, game theoretic arguments should not make you any more optimistic. It will not cause the AI to care more about you. There are no game theoretic arguments that will cause the AI to give humanity any fraction of the multiverse. We can trade with ourselves across the multiverse, probably with some tolls/taxes from AIs that will be in control of other parts of it, and can ultimately decide which fractions of it to control, but the game-theoretic arguments do not cause us to get any larger fraction of the multiverse. They provide no reason for an AI leaving humanity a few stars/galaxies/whatever. The arguments for why we are going to get good outcomes from AI have to come from somewhere else (like that we will successfully align the AI via some mechanism), they cannot come from game theory, because those arguments only work as force-multipliers, not as outcome changers.
      Of course, in as much as you do think that we will solve alignment, then yeah, you might also be able to drag some doomed universes out with you (though it’s unclear whether that would be what you want to do in those worlds, as discussed in other comments here).
      I really feel like my point here is not very difficult. The acausal trade arguments do not help you with AI Alignment. Honestly, at the point where you can make convincing simulations that fool nascent superintelligences, it also feels so weird to spend your time on saving doomed earths via acausal trade. Just simulate the earths directly if you really care about having more observer-moments in which earth survives. And like, I don’t want to stop you from spending your universe-fraction this way in worlds where we survive, and so yeah, maybe this universe does end up surviving for that reason, but I feel like that’s more because you are making a kind of bad decision, not because the game-theoretic arguments here were particularly important.
      I agree that it would have been better for Nate’s post to have a section that had this argument explicitly. Something like:
      Interlocutor: But what if I want to offer the AI paperclips in worlds where I built an aligned AI to give us a few planets in universes that are doomed?
      Me: Well, first you will still have the problem of actually figuring out what the values of AIs are that overthrew humanity in other parts of the multiverse. My guess is you will have a quite hard time figuring those out, but it might be possible, I don’t have very strong takes here.
      So sure, let’s say you somehow figure out what an unaligned AI that will kill humanity in a different everett branch cares about, and you successfully aligned your own AI (a big if, and one that I think is very unlikely to occur). Then yeah, you can maybe offer a galaxy to the squiggle maximizer to get a planet or solar system for yourself in a doomed world. Maybe you can even get a better exchange rate, though I don’t think it will be easy.
      To be clear, you now have many fewer planets than you had before. It is true that the planets you do have are distributed across different branches of the multiverse, which is a thing that I think you are free to care about, but you might notice that in this situation the AI did not give you more stuff. Indeed, you now overall have less stuff, and honestly, my best guess is that if you want to create more humanity-survives-the-20th-century-observer-moments you should just run early-earth simulations yourself. I am not planning to spend my universe-fractions of the few universes in which we do manage to build aligned AGI this way, but you are free to do so, and I agree that this might imply that AI will also spare us in this world, though I think doing this would probably be a mistake by all of our values.
      [Probably another section about how instead of just giving each other planets, you combine your utility functions with some weighing so you get full returns from gains from trade, which makes the exchange rate a bunch cheaper, but doesn’t change the core argument]
      I agree that this section would have clarified the scope of the core argument of the post and would have made it better, but I don’t think the core argument of the post is invalid, and I don’t think the post ignores any important counterarguments against the thing it is actually arguing for (as opposed to ignoring counterarguments to a different thing that sounds kind of similar, and I agree some people are likely to confuse, but is really quite qualitatively different).
      - David Matolcsi 30 Sep 2024 7:36 UTC
        21 points
        24
        Parent
        I think if we do a poll, it will become clear that the strong majority of readers interpreted Nate’s post as “If you don’t solve alignment, you shouldn’t expect that some LDT/simulation mumbo-jumbo will let you and your loved ones survive this” and not in the more reasonable way you are interpreting this. I certainly interpreted the post that way.
        Separately, as I state in the post, I believe that once you make the argument that “I am not planning to spend my universe-fractions of the few universes in which we do manage to build aligned AGI this way, but you are free to do so, and I agree that this might imply that AI will also spare us in this world, though I think doing this would probably be a mistake by all of our values”, you forever lose the right to appeal to people’s emotions about how sad you are that all our children are going to die.
        If you personally don’t make the emotional argument about the children, I have no quarrel with you; I respect utilitarians. But I’m very annoyed at anyone who emotionally appeals to saving the children, then casually admits that they wouldn’t spend a one-in-a-hundred-million fraction of their resources to save them.
        habryka 30 Sep 2024 16:59 UTC
        15 points
        −1
        Parent
        I think there is a much simpler argument that would arrive at the same conclusion, but also, I think that much simpler argument kind of shows why I feel frustrated with this critique:
        Humanity will not go extinct, because we are in a simulation. This is because we really don’t like dying, and so I am making sure that after we build aligned AI, I spend a lot of resources making simulations of early-earth to make sure you all have the experience of being alive. This means it’s totally invalid to claim that “AI will kill you all”. It is the case that AI will kill you in a very small fraction of worlds, which are the small fraction of observer moments of yours located in actual base reality, but because we will spend like one hundred-millionth of our resources simulating early earths surviving, you can basically be guaranteed to survive as well.
        And like… OK, yeah, you can spend your multiverse-fractions this way. Indeed, you could actually win absolutely any argument ever this way:
        I am really frustrated with people saying that takeoff will be fast. Indeed, if we solve AI Alignment I will spend my fraction of the multiverse running early-earth simulations where takeoff was slow, and so no matter what happened in the base-universe, y’alls observer-moments will observe slow takeoff. This means there is very little chance that we will all experience a fast takeoff, because I and others will have made so many early earth simulations that you are virtually guaranteed to experience takeoff as slow.
        I agree that “not dying in a base universe” is a more reasonable thing to care about than “proving people right that takeoff is slow” but I feel like both lines of argument that you bring up here are doing something where you take a perspective on the world that is very computationalist, unintuitive and therefore takes you to extremely weird places, makes strong assumptions about what a post-singularity humanity will care about, and then uses that to try to defeat an argument in a weird and twisted way that maybe is technically correct, but I think unless you are really careful with every step, really does not actually communicate what is going on.
        It is obviously extremely fucking bad for AI to disempower humanity. I think “literally everyone you know dies” is a much more accurate capture of that, and also a much more valid conclusion from conservative premises than “via multiverse simulation shenanigans maybe you specifically won’t die, but like, you have to understand that we had to give up something equally costly, so it’s as bad as you dying, but I don’t want you to think of it as dying”, which I am confident is not a reasonable thing to communicate to people who haven’t thought through all of this very carefully.
        Like, yeah, multiverse simulation shenanigans make it hard for any specific statement about what AI will do to humanity to be true. In some sense they are an argument against any specific human-scale bad thing to happen, because if we do win, we could spend a substantial fraction of our resources with future AI systems to prevent that. But I think making that argument before getting people to understand that being in the position to have to do that is an enormous gigantic atrocity, is really dumb. Especially if people frame it as “the AI will leave you alone”.
        No, the AI will not leave you alone if we lose. The whole universe will be split at its seams and everything you know destroyed and remade and transformed into the most efficient version of itself for whatever goal the AI is pursuing, which yeah, might include trading with some other humanities in other parts of the multiverse where we won, but you will still be split apart and transformed and completely disempowered (and we have no idea what that will actually look like, and we both know that “dying” is not really a meaningful abstraction in worlds where you can remake brains from scratch).
        ryan_greenblatt 1 Oct 2024 0:26 UTC
        4 points
        0
        Parent
        
        I agree that “not dying in a base universe” is a more reasonable thing to care about than “proving people right that takeoff is slow” but I feel like both lines of argument that you bring up here are doing something where you take a perspective on the world that is very computationalist, unintuitive and therefore takes you to extremely weird places, makes strong assumptions about what a post-singularity humanity will care about, and then uses that to try to defeat an argument in a weird and twisted way that maybe is technically correct, but I think unless you are really careful with every step, really does not actually communicate what is going on.
        
        I agree that common sense morality and common sense views are quite confused about the relevant situation. Indexical selfish perspectives are also pretty confused and are perhaps even more incoherent.
        
        However, I think that under the most straightforward generalization of common sense views or selfishness where you just care about the base universe and there is just one base universe, this scheme can work to save lives in the base universe^[1].
        
        I legitimately think that common sense moral views should care less about AI takeover due to these arguments. As in, there is a reasonable chance that a bunch of people aren’t killed due to these arguments (and other different arguments) in the most straightforward sense.
        
        I also think “the AI might leave you alone, but we don’t really know and there seems at least a high chance that huge numbers of people, including you, die” is not a bad summary of the situation.
        
        In some sense they are an argument against any specific human-scale bad thing to happen, because if we do win, we could spend a substantial fraction of our resources with future AI systems to prevent that.
        
        Yes. I think any human-scale bad thing (except stuff needed for the AI to most easily take over and solidify control) can be paid for and this has some chance of working. (Tiny amounts of kindness work in a similar way.)
        
        Humanity will not go extinct, because we are in a simulation.
        
        FWIW, I think it is non-obvious how common sense views interpret these considerations. I think it is probably common to just care about base reality? (Which is basically equivalent to having a measure etc.) I do think that common sense moral views don’t consider it good to run these simulations for this purpose, while bailing out aliens who would have bailed us out is totally normal/reasonable under common sense moral views.
        
        It is obviously extremely fucking bad for AI to disempower humanity. I think “literally everyone you know dies” is a much more accurate capture of that, and also a much more valid conclusion from conservative premises
        
        Why not just say what’s more straightforwardly true:
        
        “I believe that AI takeover has a high probability of killing billions and should be strongly avoided, and would be a serious and irreversible decision by our society that’s likely to be a mistake even if it doesn’t lead to billions of deaths.”
        
        I don’t think “literally everyone you know dies if AI takes over” is accurate because I don’t expect that in the base reality version of this universe for multiple reasons. Like it might happen, but I don’t know if it is more than 50% likely.
        
        ^[1] It’s not crazy to call the resulting scheme “multiverse/simulation shenanigans” TBC (as it involves prediction/simulation and uncertainty over the base universe), but I think this is just because I expect that multiverse/simulation shenanigans will alter the way AIs in base reality act in the common sense straightforward way.
        
        habryka 1 Oct 2024 1:10 UTC
        4 points
        2
        Parent
        “I believe that AI takeover has a high probability of killing billions and should be strongly avoided, and would be a serious and irreversible decision by our society that’s likely to be a mistake even if it doesn’t lead to billions of deaths.”
        I mean, this feels like it is of completely the wrong magnitude. “Killing billions” is just vastly vastly vastly less bad than “completely eradicating humanity’s future”, which is actually what is going on.
        Like, my attitude towards AI and x-risk would be hugely different if the right abstraction would be “a few billion people die”. Like, OK, that’s like a few decades of population growth. Basically nothing in the big picture. And I think this is also true by the vast majority of common-sense ethical views. People care about the future of humanity. “Saving the world” is hugely more important than preventing the marginal atrocity. Outside of EA I have never actually met a welfarist who only cares about present humans. People of course think we are supposed to be good stewards of humanity’s future, especially if you select on the people who are actually involved in global scale decisions.
        Normal people who are not bought into super crazy computationalist stuff understand that humanity’s extinction is much worse than just a few billion people dying, and the thing that is happening is much more like extinction than it is like a few billion people dying.
        ryan_greenblatt 1 Oct 2024 1:24 UTC
        4 points
        2
        Parent
        (I mostly care about long term future and scope sensitive resource use like habryka TBC.)
        
        Sure, we can amend to:
        
        “I believe that AI takeover would eliminate humanity’s control over its future, has a high probability of killing billions, and should be strongly avoided.”
        
        We could also say something like “AI takeover seems similar to takeover by hostile aliens with potentially unrecognizable values. It would eliminate humanity’s control over its future and has a high probability of killing billions.”
        ryan_greenblatt 1 Oct 2024 1:29 UTC
        2 points
        0
        Parent
        
        And I think this is also true by the vast majority of common-sense ethical views. People care about the future of humanity. “Saving the world” is hugely more important than preventing the marginal atrocity. Outside of EA I have never actually met a welfarist who only cares about present humans. People of course think we are supposed to be good stewards of humanity’s future, especially if you select on the people who are actually involved in global scale decisions.
        
        Hmmm, I agree with this as stated, but it’s not clear to me that this is scope sensitive. As in, suppose that the AI will eventually leave humans in control of earth and the solar system. Do people typically think this is extremely bad? I don’t think so, though I’m not sure.
        
        And, I think trading for humans to eventually control the solar system is pretty doable. (Most of the trade cost is in preventing an earlier slaughter and violence which was useful for takeover or avoiding delay.)
        ryan_greenblatt 1 Oct 2024 1:31 UTC
        4 points
        0
        Parent
        At a more basic level, I think the situation is just actually much more confusing than human extinction in a bunch of ways.
        
        (Separately, under my views misaligned AI takeover seems worse than human extinction due to (e.g.) biorisk. This is because primates or other closely related species seem very likely to re-evolve into an intelligent civilization and I feel better about this civilization than AIs.)
        CarlShulman 2 Oct 2024 19:24 UTC
        7 points
        0
        Parent
        I think if we do a poll, it will become clear that the strong majority of readers interpreted Nate’s post as “If you don’t solve alignment, you shouldn’t expect that some LDT/simulation mumbo-jumbo will let you and your loved ones survive this” and not in the more reasonable way you are interpreting this. I certainly interpreted the post that way.
        You can run the argument past a poll of LLM models of humans and show their interpretations.
        
        I strongly agree with your second paragraph.
- ryan_greenblatt 30 Sep 2024 0:18 UTC
  2 points
  0
  Parent
  
  we have the proposal of simulating smaller Universes and less coordinated humans, which makes the AI think that the simulators might be richer and have a better chance of solving alignment
  
  This only matters if the AIs are CDT or dumb about decision theory etc.
  - David Matolcsi 30 Sep 2024 21:16 UTC
    1 point
    0
    Parent
    I usually defer to you in things like this, but I don’t see why this would be the case. I think the proposal of simulating less competent civilizations is equivalent to the idea of us deciding now, when we don’t really know yet how competent a civilization we are, to bail out less competent alien civilizations in the multiverse if we succeed. In return, we hope that this decision is logically correlated with more competent civilizations (who were also unsure in their infancy about how competent they are) deciding to bail out less competent civilizations, including us. My understanding from your comments is that you believe this likely works, so how is my proposal of simulating less-coordinated civilizations different?
    
    The story about simulating smaller Universes is more confusing. That would be equivalent to bailing out aliens in smaller Universes for a tiny fraction of our Universe, in the hope that larger Universes also bail us out for a tiny fraction of their Universe. This is very confusing if there are infinite levels of bigger and bigger Universes; I don’t know what to do with infinite ethics. If there are finite levels, but the young civilizations don’t yet have a good prior over the distribution of Universe-sizes, all can reasonably think that there are levels above them, and all their decisions are correlated, so everyone bails out the inhabitants of the smaller Universes, in the hope that they get bailed out by a bigger Universe. Once they learn the correct prior over Universe-sizes, and the biggest Universe realizes that no bigger Universe’s actions correlate with theirs, all of this fails (though they can still bail each other out from charity). But this is similar to the previous case, where once the civilizations learn their competence level, the most competent ones are no longer incentivized to enter into insurance contracts, but the hope is that in a sense they enter into a contract while they are still behind the veil of ignorance.
    - ryan_greenblatt 30 Sep 2024 22:10 UTC
      2 points
      0
      Parent
      Hmm, maybe I misunderstood your point. I thought you were talking about using simulations to anthropically capture AIs. As in, creating more observer moments where AIs take over less competent civilizations but are actually in a simulation run by us.
      
      If you’re happy to replace “simulation” with “prediction in a way that doesn’t create observer moments” and think the argument goes through either way then I think I agree.
      
      I agree that paying out to less competent civilizations if we find out we’re competent and avoid takeover might be what you should do (as part of a post-hoc insurance deal via UDT or as part of a commitment or whatever). As in, this would help avoid getting killed if you ended up being a less competent civilization.
      
      The smaller thing won’t work exactly for getting us bailed out. I think infinite ethics should be resolvable and end up getting resolved with something roughly similar to some notion of reality-fluid and this implies that you just have to pay more for higher measure places. (Of course people might disagree about the measure etc.)
      - David Matolcsi 1 Oct 2024 0:57 UTC
        1 point
        0
        Parent
        I’m happy to replace “simulation” with “prediction in a way that doesn’t create observer moments” if we assume we are dealing with UDT agents (which I’m unsure about) and that it’s possible to run accurate predictions about the decisions of complex agents without creating observer moments (which I’m also unsure about). I think running simulations, by some meaning of “simulation”, is not really more expensive than getting the accurate predictions, and the cost of running the sims is likely small compared to the size of the payment anyway. So I like talking about running sims, in case we get an AI that takes sims more seriously than prediction-based acausal trade, but I try to pay attention that all my proposals make sense from the perspective of a UDT agent too, with predictions instead of simulations. (The exception is the Can we get more than this? proposal, which relies on the AI not being UDT, and I agree it’s likely to fail for various reasons, but I decided it was still worth including in the post, in case we get an AI for which this actually works, which I still don’t find that extremely unlikely.)