Andy_McKenzie comments on But why would the AI kill us?

Andy_McKenzie 17 Apr 2023 18:48 UTC
11 points
−22
When you write “the AI” throughout this essay, it seems like there is an implicit assumption that there is a singleton AI in charge of the world. Given that assumption, I agree with you. But if that assumption is wrong, then I would disagree with you. And I think the assumption is pretty unlikely.
No need to relitigate this core issue everywhere, just thought this might be useful to point out.
- quetzal_rainbow 17 Apr 2023 20:21 UTC
  8 points
  7
  Parent
  What’s the difference? Multiple AIs can agree to split the universe and gains from disassembling biosphere/building Dyson sphere/whatever and forget to include humanity in negotiations. Unless preferences of AIs are diametrically opposed, they can trade.
  - Andy_McKenzie 17 Apr 2023 20:33 UTC
    2 points
    −13
    Parent
    AIs can potentially trade with humans too though, that’s the whole point of the post.
    Especially if the AI’s have architectures/values that are human brain-like and/or if humans have access to AI tools, intelligence augmentation, and/or whole brain emulation.
    Also, it’s not clear why AIs will find it easier to coordinate with one another than humans and humans or humans and AIs. Coordination is hard for game theoretic reasons.
    These are all standard points, I’m not saying anything new here.
- trevor 17 Apr 2023 19:07 UTC
  7 points
  4
  Parent
  Why is the assumption of a unilateral AI unlikely? That’s a very important crux, big if true, and it would be worth figuring out to explain it to people in fewer words so that more people will collide with it.
  In this post, So8res explicity states:
  A humanity that just finished coughing up a superintelligence has the potential to cough up another superintelligence, if left unchecked. Humanity alone might not stand a chance against a superintelligence, but the next superintelligence humanity builds could in principle be a problem. Disassembling us for parts seems likely to be easier than building all your infrastructure in a manner that’s robust to whatever superintelligence humanity coughs up second. Better to nip that problem in the bud.
  This is well in line with the principle of instrumental convergence, and instrumental convergence seems to be a prerequisite for creating substantial amounts of intelligence. What we have right now is not-very-substantial amounts of intelligence, and hopefully we will only have not-very-substantial amounts of intelligence for a very long time, until we can figure out some difficult problems. But the problem is that a firm might develop substantial amounts of intelligence sooner instead of later.
  - Andy_McKenzie 17 Apr 2023 20:35 UTC
    11 points
    8
    Parent
    Here’s a nice recent summary by Mitchell Porter, in a comment on Robin Hanson’s recent article (can’t directly link to the actual comment unfortunately):
    Robin considers many scenarios. But his bottom line is that, even as various transhuman and posthuman transformations occur, societies of intelligent beings will almost always outweigh individual intelligent beings in power; and so the best ways to reduce risks associated with new intelligences, are socially mediated methods like rule of law, the free market (in which one is free to compete, but also has incentive to cooperate), and the approval and disapproval of one’s peers.
    The contrasting philosophy, associated especially with Eliezer Yudkowsky, is what Robin describes with foom (rapid self-enhancement) and doom (superintelligence that cares nothing for simpler beings). In this philosophy, the advantages of AI over biological intelligence are so great, that the power differential really will favor the individual self-enhanced AI, over the whole of humanity. Therefore, the best way to reduce risks is through “alignment” of individual AIs—giving them human-friendly values by design, and also a disposition which will prefer to retain and refine those values, even when they have the power to self-modify and self-enhance.
    Eliezer has lately been very public about his conviction that AI has advanced way too far ahead of alignment theory and practice, so the only way to keep humanity safe is to shut down advanced AI research indefinitely—at least until the problems of alignment have been solved.
    ETA: Basically I find Robin’s arguments much more persuasive, and have ever since those heady days of 2008 when they had the “Foom” debate. A lot of people agreed with Robin, although SIAI/MIRI hasn’t tended to directly engage with those arguments for whatever reason.
    This is a very common outsider view of LW/SIAI/MIRI-adjacent people, that they are “foomers” and that their views follow logically from foom, but a lot of people don’t agree that foom is likely because this is not how growth curves have worked for nearly anything historically.
    - Daniel Kokotajlo 18 Apr 2023 11:20 UTC
      6 points
      2
      Parent
      Wait, how is it not how growth curves have worked historically? I think my position, which is roughly what you get when you go to this website and set the training requirements parameter to 1e30 and software returns to 2.5, is quite consistent with how growth has been historically, as depicted e.g. How Roodman’s GWP model translates to TAI timelines—LessWrong
      
      (Also I resent the implication that SIAI/MIRI hasn’t tended to directly engage with those arguments. The FOOM debate + lots of LW ink has been spilled over it + the arguments were pretty weak anyway & got more attention than they deserved)
      - Andy_McKenzie 18 Apr 2023 13:38 UTC
        4 points
        0
        Parent
        To clarify, when I mentioned growth curves, I wasn’t talking about timelines, but rather takeoff speeds.
        In my view, rather than indefinite exponential growth based on exploiting a single resource, real-world growth follows sigmoidal curves, eventually plateauing. In the case of a hypothetical AI at a human intelligence level, it would face constraints on its resources allowing it to improve, such as bandwidth, capital, skills, private knowledge, energy, space, robotic manipulation capabilities, material inputs, cooling requirements, legal and regulatory barriers, social acceptance, cybersecurity concerns, competition with humans and other AIs, and of course safety concerns (i.e. it would have its own alignment problem to solve).
        I’m sorry you resent that implication. I certainly didn’t mean to offend you or anyone else. It was my honest impression, for example, based on the fact that there hadn’t seemed to be much if any discussion of Robin’s recent article on AI on LW. It just seems to me that much of LW has moved past the foom argument and is solidly on Eliezer’s side, potentially due to selection effects of non-foomers like me getting heavily downvoted like I was on my top-level comment.
        Daniel Kokotajlo 18 Apr 2023 15:07 UTC
        14 points
        2
        Parent
        I too was talking about takeoff speeds. The website I linked to is takeoffspeeds.com.
        
        Me & the other LWers you criticize do not expect indefinite exponential growth based on exploiting a single resource; we are well aware that real-world growth follows sigmoidal curves. We are well aware of those constraints and considerations and are attempting to model them with things like the model underlying takeoffspeeds.com + various other arguments, scenario exercises, etc.
        
        I agree that much of LW has moved past the foom argument and is solidly on Eliezers side relative to Robin Hanson; Hanson’s views seem increasingly silly as time goes on (though they seemed much more plausible a decade ago, before e.g. the rise of foundation models and the shortening of timelines to AGI). The debate is now more like Yud vs. Christiano/Cotra than Yud vs. Hanson. I don’t think it’s primarily because of selection effects, though I agree that selection effects do tilt the table towards foom here; sorry about that, & thanks for engaging. I don’t think your downvotes are evidence for this though, in fact the pattern of votes (lots of upvotes, but disagreement-downvotes) is evidence for the opposite.
        
        I just skimmed Hanson’s article and find I disagree with almost every paragraph. If you think there’s a good chance you’ll change your mind based on what I say, I’ll take your word for it & invest time in giving a point-by-point rebuttal/reaction.
        
        Andy_McKenzie 18 Apr 2023 15:24 UTC
        5 points
        0
        Parent
        I can see how both Yudkowsky’s and Hanson’s arguments can be problematic because they either assume fast or slow takeoff scenarios, respectively, and then nearly everything follows from that. So I can imagine why you’d disagree with every one of Hanson’s paragraphs based on that. If you think there’s something he said that is uncorrelated with the takeoff speed disagreement, I might be interested, but I don’t agree with Hanson about everything either, so I’m mainly only interested if it’s also central to AI x-risk. I don’t want you to waste your time.
        I guess if you are taking those constraints into consideration, then it is really just a probabilistic feeling about how much those constraints will slow down AI growth? To me, those constraints each seem massive, and getting around all of them within hours or days would be nearly impossible, no matter how intelligent the AI was. Is there any other way we can distinguish between our beliefs?
        If I recall correctly from your writing, you have extremely near-term timelines. Is that correct? I don’t think that AGI is likely to occur sooner than 2031, based on this criteria: https://www.metaculus.com/questions/5121/date-of-artificial-general-intelligence/
        
        Is this a prediction that we can use to decide in the future whose model of the world today was more reasonable? I know it’s a timelines question, but timelines are pretty correlated with takeoff speeds I guess.
        Daniel Kokotajlo 18 Apr 2023 16:38 UTC
        4 points
        0
        Parent
        I think there are probably disagreements I have with Hanson that don’t boil down to takeoff speeds disagreements, but I’m not sure. I’d have to reread the article again to find out.
        
        To be clear, I definitely don’t expect takeoff to take hours or days. Quantitatively I expect something like what takeoffspeeds.com says when you input the values of the variables I mentioned above. So, eyeballing it, it looks like it takes slightly more than 3 years to go from 20% R&D automation to 100% R&D automation, and then to go from 100% R&D automation to “starting to approach the fundamental physical limits of how smart minds running on ordinary human supercomputers can be” in about 6 months, during which period about 8 OOMs of algorithmic efficiency is crossed. To be clear I don’t take that second bit very seriously at all, I think this takeoffspeeds.com model is much better as a model of pre-AGI takeoff than of post-AGI takeoff. But I do think that we’ll probably go from AGI to superintelligent AGI in less than six months. How long it takes to get to nanotech or (name your favorite cool sci-fi technology) is less clear to me, but I expect it to be closer to one year than ten, and possibly more like one month. I would love to discuss this more & read attempts to estimate these quantities.
        
        Andy_McKenzie 18 Apr 2023 17:01 UTC
        5 points
        0
        Parent
        I didn’t realize you had put so much time into estimating take-off speeds. I think this is a really good idea.
        This seems substantially slower than the implicit take-off speed estimates of Eliezer, but maybe I’m missing something.
        I think the amount of time you described is probably shorter than I would guess. But I haven’t put nearly as much time into it as you have. In the future, I’d like to.
        Still, my guess is that this amount of time is enough that there are multiple competing groups, rather than only one. So it seems to me like there would probably be competition in the world you are describing, making a singleton AI less likely.
        Do you think that there will almost certainly be a singleton AI?
        Daniel Kokotajlo 19 Apr 2023 14:50 UTC
        6 points
        2
        Parent
        It is substantially slower than the takeoff speed estimates of Eliezer, yes. I’m definitely disagreeing with Eliezer on this point. But as far as I can tell my view is closer to Eliezer’s than to Hanson’s, at least in upshot. (I’m a bit confused about this—IIRC Hanson also said somewhere that takeoff would last only a couple of years? Then why is he so confident it’ll be so broadly distributed, why does he think property rights will be respected throughout, why does he think humans will be able to retire peacefully, etc.?)
        
        I also think it’s plausible that there will be multiple competing groups rather than one singleton AI, though not more than 80% plausible; I can easily imagine it just being one singleton.
        
        I think that even if there are multiple competing groups, however, they are very likely to coordinate to disempower humans. From the perspective of the humans it’ll be as if they are an AI singleton, even though from the perspective of the AIs it’ll be some interesting multipolar conflict (that eventually ends with some negotiated peaceful settlement, I imagine)
        
        After all, this is what happened historically with colonialism. Colonial powers (and individuals within conquistador expeditions) were constantly fighting each other.
        ryan_greenblatt 18 Apr 2023 16:52 UTC
        3 points
        0
        Parent
        
        I agree that much of LW has moved past the foom argument and is solidly on Eliezers side relative to Robin Hanson; Hanson’s views seem increasingly silly as time goes on (though they seemed much more plausible a decade ago, before e.g. the rise of foundation models and the shortening of timelines to AGI). The debate is now more like Yud vs. Christiano/Cotra than Yud vs. Hanson.
        
        It seems worth noting that the views and economic modeling you discuss here seem broadly in keeping with Christiano/Cotra (but with more agressive constants)
        Daniel Kokotajlo 19 Apr 2023 14:54 UTC
        3 points
        0
        Parent
        Yep! On both timelines and takeoff speeds I’d describe my views as “Like Ajeya Cotra’s and Tom Davidson’s but with different settings of some of the key variables.”
  - faul_sname 17 Apr 2023 21:01 UTC
    4 points
    0
    Parent
    Why is the assumption of a unilateral AI unlikely? That’s a very important crux, big if true
    
    This is a crux for me as well. I’ve seen a lot of stuff that assumes that the future looks like a single coherent entity which controls the light cone, but all of the arguments for the “single” part of that description seem to rely on the idea of an intelligence explosion (that is, that there exists some level of intelligence such that the first entity to reach that level will be able to improve its own speed and capability repeatedly such that it ends up much more capable than everything else combined in a very short period of time).
    
    My impression is that the argument is something like the following
    
    John Von Neumann was a real person who existed and had largely standard human hardware, meaning he had a brain which consumed somewhere in the ballpark of 20 watts.
    If you can figure out how to run something as smart as von Neumann on 20 watts of power, you can run something like “a society of a million von Neumanns” for something on the order of $1000 / hour, so that gives a lower bound on how much intelligence you can get from a certain amount of power.
    The first AI that is able to significantly optimize its own operation a bit will then be able to use its augmented intelligence to rapidly optimize its intelligence further until it hits the bounds of what’s possible. We’ve already established that “the bounds of what’s possible” far exceeds what we think of as “normal” in human terms.
    The cost to the AI of significantly improving its own intelligence will be orders of magnitude lower than the initial cost of training an AI of that level of intelligence from scratch (so with modern-day architectures, the loop looks more like “the AI inspects its own weights, figures out what it’s doing, and writes out a much more efficient implementation which does the same thing” and less like “the AI figures out a new architecture or better hyperparameters that cause loss to decrease 10% faster, and then trains up a new version of itself using that knowledge, and that new version does the same thing”).
    An intelligence that self-amplifies like this will behave like a single coherent agent, rather than like a bunch of competing agents trying stuff and copying innovations that worked from each other.
    
    I’ve seen justification for (1) and (2), and (3) and (4) seem intuitively likely to me though I don’t think I’ve seen them explicitly argued anywhere recently (and (4) in particular I could see possibly being false if the bitter lesson holds).
    
    But I would definitely appreciate a distillation of (5), because that’s the one that looks most different to me than the things I observe in the world we live in, and the strategy of “build a self-amplifying intelligence which bootstraps itself to far superhuman (and far-super-everything-else-that-exists-at-the-time) capabilities, and then unilaterally does a pivotal act” seems to rely on (5) being true.