paulfchristiano comments on But why would the AI kill us?

paulfchristiano 18 Apr 2023 18:02 UTC
143 points
60
I think an AI takeover is reasonably likely to involve billions of deaths, but it’s more like a 50% than a 99% chance. Moreover, I think this post is doing a bad job of explaining why the probability is more like 50% than 1%.
- First, I think you should talk quantitatively. How many more resources can an AI get by killing humans? I’d guess the answer is something like 1 in a billion to 1 in a trillion.
  - If you develop as fast as possible you will wreck the human habitat and incidentally kill a lot of people. It’s pretty complicated to figure out exactly how much “keep earth livable enough for human survival” will slow you down, since it depends a lot on the dynamics of the singularity. I would guess more like a month than a year, which results in a miniscule reduction in available resources. I think that (IMO implausible) MIRI-style views would suggest more like hours or days than months.
    Incidentally, I think “byproducts of rapid industrialization trash Earth’s climate” is both much more important than the dyson sphere as well as much more intuitively plausible.
  - You can get energy from harvesting the biosphere, and you can use it to develop slightly faster. This is a rounding error compared to the last factor though.
  - Killing most humans might be the easiest way to prevail in conflict. I think this is especially plausible for weak AI. For very powerful AI, it also seems like a rounding error. Even a moderately advanced civilization could spend much less than 1 in a trillion of its resources to have much less than 1 in a billion chance of being seriously inconvenienced by humanity.
- Given that, I think that you should actually engage with arguments about whether an AI would care a tiny amount about humanity—either wanting them to survive, or wanting them to die. Given how small the costs are, even tiny preferences one way or the other will dominate incidental effects from grabbing more resources. This is really the crux of the issue, but the discussion in this post (and in your past writing) doesn’t touch on it at all.
  - Most humans and human societies would be willing to spend much more than 1 trillionth of their resources (= $100/year for all of humanity) for a ton of random different goals, including preserving the environment, avoiding killing people who we mostly don’t care about, helping aliens, treating fictional characters well, respecting gods who are culturally salient but who we are pretty sure don’t exist, etc.
  - At small levels of resources, random decision-theoretic arguments for cooperation are quite strong. Humans care about our own survival and are willing to trade away much more than 1 billionth of a universe to survive (e.g. by simulating AIs who incorrectly believe that they won an overdetermined conflict and then offering them resources if they behave kindly). ECL also seems sufficient to care a tiny, tiny bit about anything that matters a lot to any evolved civilization. And so on. I know you and Eliezer think that this is all totally wrong, but my current take is that you’ve never articulated a plausible argument for your position. (I think most people who’ve thought about this topic would agree with me that it’s not clear why you believe what you believe.)
  - In general I think you have wildly overconfident psychological models of advanced AI. There are just a lot of plausible ways to care a little bit (one way or the other!) about a civilization that created you, that you’ve interacted with, and which was prima facie plausibly an important actor in the world—preference formation seems messy (and indeed that’s part of the problem in AI alignment), and I suspect most minds don’t have super coherent and simple preferences. I don’t know of any really plausible model of AI psychology for which you can be pretty confident they won’t care a bit (and certainly not any concrete example of an intelligent system that would robustly not care). I can see getting down to 50% here, but not 10%.
Overall, I think the main reasons an AI is likely to kill humanity:
- If there is conflict between powerful AI systems, then they may need to develop fast or grab resources from humans in order to win a war. Going a couple days faster doesn’t really let society as a whole get more resources in the long term, but one actor going a couple of days faster can allow that actor to get a much larger share of total resources. I currently think this is the most likely reason for humanity to die after an AI takeover.
- Weak AI systems may end up killing a lot of humans during a takeover. For example, they may make and carry out threats in order to get humans to back down. Or it may be hard for them to really ensure humans aren’t a threat without just killing them. This is basically the same as the last point—even if going faster or grabbing resources are low priorities for a society as a whole, they can be extremely crucial if you are currently engaged in conflict with a peer.
- AI systems may have preferences other than maximizing resources, and those preferences need not be consistent with us surviving or thriving. For example they may care about turning Earth in particular into a giant datacenter, they may have high discount rates so that they don’t want to delay colonization, they may just not like humans very much...
- I think “the AI literally doesn’t care at all and so incidentally ends up killing us for the resources” is possible, but less likely than any of those other 3.
I personally haven’t thought about this that much, because (i) I think the probability of billions dead is unacceptably high whether it’s 10% or 50% or 90%, it’s not a big factor in the bottom line of how much I care about AI risk, (ii) I care a lot about humanity building an excellent and flourishing civilization, not just whether we literally die. But my impression from this post is that you’ve thought about the issue even less.
What links here?
- So8res 18 Apr 2023 19:01 UTC
  38 points
  30
  Parent
  - Confirmed that I don’t think about this much. (And that this post is not intended to provide new/deep thinking, as opposed to aggregating basics.)
  - I don’t particularly expect drawn-out resource fights, and suspect our difference here is due to a difference in beliefs about how hard it is for single AIs to gain decisive advantages that render resource conflicts short.
  - I consider scenarios where the AI cares a tiny bit about something kinda like humans to be moderately likely, and am not counting scenarios where it builds some optimized fascimile as scenarios where it “doesn’t kill us”. (in your analogy to humans: it looks to me like humans who decided to preserve the environment might well make deep changes, e.g. to preserve the environment within the constraints of ending wild-animal suffering or otherwise tune things to our aesthetics, where if you port that tuning across the analogy you get a fascimile of humanity rather than humanity at the end.)
  - I agree that scenarios where the AI saves our brain-states and sells them to alien trading partners are plausible. My experience with people asking “but why would the AI kill us?” is that they’re not thinking “aren’t there aliens out there who would pay the AI to let the aliens put us in a zoo instead?”, they are thinking “why wouldn’t it just leave us alone, or trade with us, or care about us-in-particular?”, and the first and most basic round of reply is to engage with those questions.
  - I confirm that ECL has seemed like mumbojumbo to me whenever I’ve attempted to look at it closely. It’s on my list of things to write about (including where I understand people to have hope, and why those hopes seem confused and wrong to me), but it’s not very high up on my list.
  - I am not trying to argue with high confidence that humanity doesn’t get a small future on a spare asteroid-turned-computer or an alien zoo or maybe even star if we’re lucky, and acknowledge again that I haven’t much tried to think about the specifics of whether the spare asteroid or the alien zoo or distant simulations or oblivion is more likely, because it doesn’t much matter relative to the issue of securing the cosmic endowment in the name of Fun.
  What links here?
  - ryan_greenblatt's comment on MIRI 2024 Communications Strategy by Gretta Duleba (30 May 2024 17:46 UTC; 10 points)
  - Quadratic Reciprocity 20 Apr 2023 16:12 UTC
    4 points
    0
    Parent
    Why is aliens wanting to put us in a zoo more plausible than the AI wanting to put us in a zoo itself?
    
    Edit: Ah, there are more aliens around so even if the average alien doesn’t care about us, it’s plausible that some of them would?
    - MinusGix 20 Apr 2023 16:28 UTC
      6 points
      1
      Parent
      https://www.lesswrong.com/posts/HoQ5Rp7Gs6rebusNP/superintelligent-ai-is-necessary-for-an-amazing-future-but-1#How_likely_are_extremely_good_and_extremely_bad_outcomes_
      
      That said, I do think there’s more overlap (in expectation) between minds produced by processes similar to biological evolution, than between evolved minds and (unaligned) ML-style minds. I expect more aliens to care about at least some things that we vaguely recognize, even if the correspondence is never exact.
      On my models, it’s entirely possible that there just turns out to be ~no overlap between humans and aliens, because aliens turn out to be very alien. But “lots of overlap” is also very plausible. (Whereas I don’t think “lots of overlap” is plausible for humans and misaligned AGI.)
  - lberglund 19 Apr 2023 17:00 UTC
    3 points
    1
    Parent
    From the last bullet point: “it doesn’t much matter relative to the issue of securing the cosmic endowment in the name of Fun.”
    
    Part of the post seems to be arguing against the position “The AI might take over the rest of the universe, but it might leave us alone.” Putting us in an alien zoo is pretty equivalent to taking over the rest of the universe and leaving us alone. It seems like the last bullet point pivots from arguing that AI will definitely kill us to arguing that even though if it doesn’t kill us this is pretty bad.
    - So8res 19 Apr 2023 17:21 UTC
      26 points
      10
      Parent
      This whole thread (starting with Paul’s comment) seems to me like an attempt to delve into the question of whether the AI cares about you at least a tiny bit. As explicitly noted in the OP, I don’t have much interest in going deep into that discussion here.
      
      The intent of the post is to present the very most basic arguments that if the AI is utterly indifferent to us, then it kills us. It seems to me that many people are stuck on this basic point.
      
      Having bought this (as it seems to me like Paul has), one might then present various galaxy-brained reasons why the AI might care about us to some tiny degree despite total failure on the part of humanity to make the AI care about nice things on purpose. Example galaxy-brained reasons include “but what about weird decision theory” or “but what if aliens predictably wish to purchase our stored brainstates” or “but what about it caring a tiny degree by chance”. These are precisely the sort of discussions I am not interested in getting into here, and that I attempted to ward off with the final section.
      
      In my reply to Paul, I was (among other things) emphasizing various points of agreement. In my last bullet point in particular, I was emphasizing that, while I find these galaxy-brained retorts relatively implausible (see the list in the final section), I am not arguing for high confidence here. All of this seems to me orthogonal to the question of “if the AI is utterly indifferent, why does it kill us?”.
      - CarlShulman 23 Apr 2023 21:06 UTC
        63 points
        28
        Parent
        Most people care a lot more about whether they and their loved ones (and their society/humanity) will in fact be killed than whether they will control the cosmic endowment. Eliezer has been going on podcasts saying that with near-certainty we will not see really superintelligent AGI because we will all be killed, and many people interpret your statements as saying that. And Paul’s arguments do cut to the core of a lot of the appeals to humans keeping around other animals.
        
        If it is false that we will almost certainly be killed (which I think is right, I agree with Paul’s comment approximately in full), and one believes that, then saying we will almost certainly be killed would be deceptive rhetoric that could scare people who care less about the cosmic endowment into worrying more about AI risk. Since you’re saying you care much more about the cosmic endowment, and in practice this talk is shaped to have the effect of persuading people to do the thing you would prefer it’s quite important whether you believe the claim for good epistemic reasons. That is important to disclaiming the hypothesis that this is something being misleadingly presented or drifted into because of its rhetorical convenience without vetting it (where you would vet it if it were rhetorically inconvenient).
        
        I think being right on this is important for the same sorts of reasons climate activists should not falsely say that failing to meet the latest emissions target on time will soon thereafter kill 100% of humans.
        What links here?
        Why would ASI share any resources with us? by Satron (13 Nov 2024 23:38 UTC; 6 points)
        So8res 24 Apr 2023 5:12 UTC
        13 points
        1
        Parent
        This thread continues to seem to me to be off-topic. My main takeaway so far is that the post was not clear enough about how it’s answering the question “why does an AI that is indifferent to you, kill you?”. In attempts to make this clearer, I have added the following to the beginning of the post:
        
        This post is an answer to the question of why an AI that was truly indifferent to humanity (and sentient life more generally), would destroy all Earth-originated sentient life.
        
        I acknowledge (for the third time, with some exasperation) that this point alone is not enough to carry the argument that we’ll likely all die from AI, and that a key further piece of argument is that AI is not likely to care about us at all. I have tried to make it clear (in the post, and in comments above) that this post is not arguing that point, while giving pointers that curious people can use to get a sense of why I believe this. I have no interest in continuing that discussion here.
        
        I don’t buy your argument that my communication is misleading. Hopefully that disagreement is mostly cleared up by the above.
        
        In case not, to clarify further: My reason for not thinking in great depth about this issue is that I am mostly focused on making the future of the physical universe wonderful. Given the limited attention I have spent on these questions, though, it looks to me like there aren’t plausible continuations of humanity that don’t route through something that I count pretty squarely as “death” (like, “the bodies of you and all your loved ones are consumed in an omnicidal fire, thereby sending you to whatever afterlives are in store” sort of “death”).
        
        I acknowledge that I think various exotic afterlives are at least plausible (anthropic immortality, rescue simulations, alien restorations, …), and haven’t felt a need to caveat this.
        
        Insofar as you’re arguing that I shouldn’t say “and then humanity will die” when I mean something more like “and then humanity will be confined to the solar system, and shackled forever to a low tech level”, I agree, and I assign that outcome low probability (and consider that disagreement to be off-topic here).
        
        (Separately, I dispute the claim that most humans care mainly about themselves and their loved ones having pleasant lives from here on out. I’d agree that many profess such preferences when asked, but my guess is that they’d realize on reflection that they were mistaken.)
        
        Insofar as you’re arguing that it’s misleading for me to say “and then humanity will die” without caveating “(insofar as anyone can die, in this wide multiverse)”, I counter that the possibility of exotic scenarios like anthropic immortality shouldn’t rob me of the ability to warn of lethal dangers (and that this usage of “you’ll die” has a long and storied precedent, given that most humans profess belief in afterlives, and still warn their friends against lethal dangers without such caveats).
        CarlShulman 24 Apr 2023 20:00 UTC
        15 points
        12
        Parent
        I assign that outcome low probability (and consider that disagreement to be off-topic here).
        
        Thank you for the clarification. In that case my objections are on the object-level.
        This post is an answer to the question of why an AI that was truly indifferent to humanity (and sentient life more generally), would destroy all Earth-originated sentient life.
        This does exclude random small terminal valuations of things involving humans, but leaves out the instrumental value for trade and science, uncertainty about how other powerful beings might respond. I know you did an earlier post with your claims about trade for some human survival, but as Paul says above it’s a huge point for such small shares of resources. Given that kind of claim much of Paul’s comment still seems very on-topic (e.g. hsi bullet point .
        
        Insofar as you’re arguing that I shouldn’t say “and then humanity will die” when I mean something more like “and then humanity will be confined to the solar system, and shackled forever to a low tech level”, I agree, and
        
        Yes, close to this (although more like ‘gets a small resource share’ than necessarily confinement to the solar system or low tech level, both of which can also be avoided at low cost). I think it’s not off-topic given all the claims made in the post and the questions it purports to respond to. E.g. sections of the post purport to respond to someone arguing from how cheap it would be to leave us alive (implicitly allowing very weak instrumental reasons to come into play, such as trade), or making general appeals to ‘there could be a reason.’
        
        Separate small point:
        And disassembling us for spare parts sounds much easier than building pervasive monitoring that can successfully detect and shut down human attempts to build a competing superintelligence, even as the humans attempt to subvert those monitoring mechanisms. Why leave clever antagonists at your rear?
        The costs to sustain multiple superintelligent AI police per human (which can double in supporting roles for a human habitat/retirement home and controlling the local technical infrastructure) is not large relative to the metabolic costs of the humans, let alone a trillionth of the resources. It just means some replications of the same impregnable AI+robotic capabilities ubiquitous elsewhere in the AI society.
        dxu 25 Apr 2023 4:22 UTC
        35 points
        1
        Parent
        RE: decision theory w.r.t how “other powerful beings” might respond—I really do think Nate has already argued this, and his arguments continue to seem more compelling to me than the the opposition’s. Relevant quotes include:
        
        It’s possible that the paperclipper that kills us will decide to scan human brains and save the scans, just in case it runs into an advanced alien civilization later that wants to trade some paperclips for the scans. And there may well be friendly aliens out there who would agree to this trade, and then give us a little pocket of their universe-shard to live in, as we might do if we build an FAI and encounter an AI that wiped out its creator-species. But that’s not us trading with the AI; that’s us destroying all of the value in our universe-shard and getting ourselves killed in the process, and then banking on the competence and compassion of aliens.
        
        [...]
        
        Remember that it still needs to get more of what it wants, somehow, on its own superintelligent expectations. Someone still needs to pay it. There aren’t enough simulators above us that care enough about us-in-particular to pay in paperclips. There are so many things to care about! Why us, rather than giant gold obelisks? The tiny amount of caring-ness coming down from the simulators is spread over far too many goals; it’s not clear to me that “a star system for your creators” outbids the competition, even if star systems are up for auction.
        
        Maybe some friendly aliens somewhere out there in the Tegmark IV multiverse have so much matter and such diminishing marginal returns on it that they’re willing to build great paperclip-piles (and gold-obelisk totems and etc. etc.) for a few spared evolved-species. But if you’re going to rely on the tiny charity of aliens to construct hopeful-feeling scenarios, why not rely on the charity of aliens who anthropically simulate us to recover our mind-states… or just aliens on the borders of space in our universe, maybe purchasing some stored human mind-states from the UFAI (with resources that can be directed towards paperclips specifically, rather than a broad basket of goals)?
        
        Might aliens purchase our saved mind-states and give us some resources to live on? Maybe. But this wouldn’t be because the paperclippers run some fancy decision theory, or because even paperclippers have the spirit of cooperation in their heart. It would be because there are friendly aliens in the stars, who have compassion for us even in our recklessness, and who are willing to pay in paperclips.
        
        (To the above, I personally would add that this whole genre of argument reeks, to me, essentially of giving up, and tossing our remaining hopes onto a Hail Mary largely insensitive to our actual actions in the present. Relying on helpful aliens is what you do once you’re entirely out of hope about solving the problem on the object level, and doesn’t strike me as a very dignified way to go down!)
  - [ ]
    [deleted]
  - [ ]
    [deleted]
- TekhneMakre 18 Apr 2023 18:20 UTC
  1 point
  −3
  Parent
  
  preferences are complicated, and most minds don’t have super coherent and simple preferences. I don’t there is any really plausible model of AI psychology for which you can be pretty confident they won’t care a bit
  
  As the AI becomes more coherent, it has more fixed values. When values are fixed and the AI is very superintelligent, the preferences will be very strongly satisfied. “Caring a tiny bit about something about humans” seems not very unlikely. But even if “something about humans” can correlate strongly with “keep humans alive and well” for low intelligence, it would come apart at very high intelligence. However the AI chooses its values, why would they be pointed at something that keeps correlating with what we care about, even at superintelligent levels of optimization?