A fast takeoff occurs later in time than a slow takeoff, because a slow takeoff involves gradual acceleration over a longer period of time, whereas a fast takeoff involves sudden acceleration over a shorter period of time.
I think you’re confused. If you hold the “singularity date” / “takeoff end time” fixed—e.g., you say that we somehow know for certain that “the singularity” will happen on August 29, 2047—then the later takeoff starts, the faster it is. I think that’s what you have in mind, right? (Yes I know that the singularity is not literally a specific day, but hopefully you get what I’m saying).
But that’s irrelevant. It’s not what we’re talking about. I.e., it is not true that the singularity will happen at a fixed, exogenous date. There are possible interventions that could make that date earlier or later.
Anyway, I understand your comment as suggesting: “As long as the singularity hasn’t happened yet, any intervention that would accelerate AI progress, such as Sam Altman’s trying to make more AI chips in the next 10ish years, pushes us in the direction of slow takeoff, not fast takeoff.” Am I understanding you correctly? If so, then I disagree because it’s possible to accelerate AI progress in a way that brings the “takeoff end time” sooner without affecting the “takeoff start time” (using the terminology of that post you linked), or at least in a way that brings the end time sooner more than it brings the start time sooner. You see what I mean?
I think you’re confused. If you hold the “singularity date” / “takeoff end time” fixed—e.g., you say that we somehow know for certain that “the singularity” will happen on August 29, 2047—then the later takeoff starts, the faster it is. I think that’s what you have in mind, right?
That’s not what I have in mind at all.
I have a picture like this in mind when I think of slow/fast takeoff:
There are 3 types of ways we can accelerate AGI progress:
1. Moore’s law (I consider this one more or less inevitable)
As we make progress in the material sciences, the cost of compute goes down
2. Spending
If we spend more money on compute, we can build AGI sooner
3. Algorithmic progress
The more AI we build, the better we get at building AI
Assuming we don’t plan to ban science altogether (and hence do not change Moore’s law), pausing AI research (by banning large training runs) inevitably leads to a period of catch-up growth when the pause ends.
I think that the line marked “slow takeoff” is safer because:
1. the period of rapid catch-up growth seems the most dangerous
2. we spend more time near the top of the curve, where AI safety research is the most productive
I suppose that if you could pause exactly below the red line marked “dangerous capabilities threshold”, that would be even safer. But since we don’t know where that line is, I don’t really believe that to be possible. The closest approximation is Anthropic’s RSP or OpenAI’s early warning system, which says “if we notice we’ve already crossed the red line, then we should definitely pause”.
I’m confused about how “pausing AI research (by banning large training runs)” is related to this conversation. The OP doesn’t even mention that, nor do any of the other comments, as far as I can see. The topic of discussion here on this page is a move by Sam Altman to raise tons of money and build tons of chip fabs. Right?
If you weren’t talking about “Sam Altman’s chip ambitions” previously, then, well, I propose that we start talking about it now. So: How do you think that particular move—the move by Sam Altman to raise tons of money and build tons of chip fabs—would affect (or not affect) takeoff speeds, and why do you think that? A.k.a. how would you answer the question I asked here?
I think that Sam’s actions increase the likelihood of a slow takeoff.
Consider Paul’s description of a slow takeoff from the original takeoff speed debate:
right now I think hardware R&D is on the order of $100B/year, AI R&D is more like $10B/year, I guess I’m betting on something more like trillions? (limited from going higher because of accounting problems and not that much smart money)
Can you explain why? If that Paul excerpt is supposed to be an explanation, then I don’t follow it.
You previously linked this post by Hadshar & Lintz which is very explicit that the more chips there are in the world, the faster takeoff we should expect. (E.g. “slow, continuous takeoff is more likely given short timelines […partly because…] It’s plausible that compute overhang is low now relative to later, and this tends towards slower, more continuous takeoff.”) Do you think Hadshar & Lintz are incorrect on this point, or do you think that I am mischaracterizing their beliefs, or something else?
My understanding is that the fundamental disagreement is over whether there will be a “sharp discontinuity” at the development of AGI.
In Paul’s model, there is no sharp discontinuity. So, since we expect AGI to have a large economic impact, we expect “almost AGI” to have an “almost large” economic impact (which he describes as being trillions of dollars).
One way to think of this is to ask: will economic growth suddenly jump on the day AGI is invented? Paul thinks ‘no’ and EY thinks ‘yes’.
Since sudden discontinuities are generally dangerous, a slow (continuous) takeoff is generally thought of as safer, even though the rapid economic growth prior to AGI results in AGI happening sooner.
This also affects the “kind of world” that AGI enters. In a world where pre-AGI is not widely deployed, the first AGI has a large “compute advantage” over the surrounding environment. But in a world where pre-AGI is already quite powerful (imagine everyone has a team of AI agents that handle their financial transactions, protect them from cyber threats, research the cutting edge of physics/biology/nanotechnology, etc.), there is less free energy, so to speak, for the first AGI to take advantage of.
Most AI Foom stories involve the first AGI rapidly acquiring power (via nanomachines, or making computers out of DNA, or some other new technology path). But if pre-AGI AIs are already exploring these pathways, there are fewer “exploits” for the AGI to discover and use to rapidly gain power relative to what already exists.
edit:
I feel like I didn’t sufficiently address the question of compute overhang. Just as a “compute overhang” is obviously dangerous, so is an “advanced fab” overhang or a “nanotechnology” overhang, so pushing all of the tech-tree ahead enhances our safety.
You’re saying a lot of things that seem very confused and irrelevant to me, and I’m trying to get to the bottom of where you’re coming from.
Here’s a key question, I think: In this comment, you drew a diagram with a dashed line labeled “capabilities ceiling”. What do you think determines the capabilities ceiling? E.g. what hypothetical real-world interventions would make that dashed line move left or right, or get steeper or shallower?
In other words, I hope you’ll agree that you can’t simultaneously believe that every possible intervention that makes AGI happen sooner will push us towards slow takeoff. That would be internally-inconsistent, right? As a thought experiment, suppose all tech and economic and intellectual progress of the next century magically happened overnight tonight. Then we would have extremely powerful AGI tomorrow, right? And that is a very fast takeoff, i.e. one day.
Conversely, I hope you’ll agree that you can’t simultaneously believe that every possible intervention that makes AGI happen later will push us towards fast takeoff. Again, that would be internally-inconsistent, right? As a thought experiment, suppose every human on Earth simultaneously hibernated for 20 years starting right now. And then the entire human race wakes up in 2044, and we pick right back up where we were. That wouldn’t make a bit of difference to takeoff speed—the takeoff would be exactly as fast or slow if the hibernation happens as if it doesn’t. Right? (Well, that’s assuming that takeoff hasn’t already started, I guess, but if it has, then the hibernation would technically make takeoff slower not faster, right?)
If you agree with those two thought experiments, then you need to have in mind some bottleneck sitting between us and dangerous AGI, i.e. the “capabilities ceiling” dashed line. If there is such a bottleneck, then we can and hopefully would accelerate everything except that bottleneck, and we won’t get all the way to dangerous AGI until that bottleneck goes away, which (we hope) will take a long time, presumably because of the particular nature of that bottleneck. Most people in the prosaic AGI camp (including the slightly-younger Sam Altman I guess) think that the finite number of chips in the world is either the entirety of this bottleneck, or at least a major part of it, and that therefore trying to alleviate this bottleneck ASAP is the last thing you want to do in order to get maximally slow takeoff. If you disagree with that, then you presumably are expecting a different bottleneck besides chips, and I’d like to know what it is, and how you know.
idk. Maybe I got carried away with the whole “everything overhang” idea.
While I do think fast vs slow takeoff is an important variable that determines how safe a singularity is, it’s far from the only thing that matters.
If you were looking at our world today and asking “what obvious inefficiencies will an AGI exploit?” there are probably a lot of lower-hanging fruits (nuclear power, genetic engineering, zoning) that you would point to before getting to “we’re not building chip fabs as fast as physically possible”.
My actual views are probably closest to d/acc, which is that there are a wide variety of directions we can choose when researching new technology, and we ought to focus on the ones that make the world safer.
I do think that creating new obvious inefficiencies is a bad idea. For example, if we were to sustain a cap of 10^26 FLOPs on training runs for a decade or longer, that would make it really easy for a rogue actor/AI to suddenly build a much more powerful AI than anyone else in the world has.
As to the specific case of Sam/$7T, I think that it’s largely aspirational, and to the extent that it happens, it was going to happen anyway. I guess if I were given a specific counterfactual, like: TSMC is going to build 100 new fabs in the next 10 years; is it better that they be built in the USA or Taiwan? I would prefer they be built in the USA. If, on the other hand, the counterfactual was: the USA is going to invest $7T in AI in the next 10 years; would you prefer it be spent entirely on semiconductor fabs, or half on semiconductor fabs and half on researching controllable AI algorithms? I would prefer the latter.
Basically, my views are “don’t be an idiot”, but it’s possible to be an idiot both by arbitrarily banning things and by focusing on a single line of research to the exclusion of all others.
I think you’re confused. A rocket launch is not the same thing as filling the rocket’s fuel tanks with high explosives, even when you give the rocket bigger engines. Launching a roller coaster harder is not the same as dropping someone to the ground. An avalanche isn’t the same thing as a high-speed train moving the ice.
The difference between these cases is that the sudden all-at-once jolt, or impulse, is fatal, while most acceleration events up to a limit are not.
So in the AI case, situations where there are a lot of idle chips powerful enough to host ASI piled up everywhere, plugged into infrastructure, are like the explosives/fall/avalanche. You may note the system has a lot of unused potential energy that is suddenly released.
Whereas if humans start running their fabs faster, say 10-1000 times faster, and every newly made IC goes into an AI cluster giving immediate feedback to the developers, then to an extent this is a continuous process. You might be confident that it is simply not possible to create ICs fast enough to cause a fast takeoff, assuming they all immediately go into AI and humans immediately get genuine feedback on failures and safety.
And I think this is the case, assuming humans isolate each cluster and refrain from giving their early production AIs some mechanism to communicate with each other in an unstructured way, and/or a way to remember past a few hours.
Failures are fine, AI screwing up and killing people is just fine (so long as the risks are less than humans doing the task). What’s not fine is a coordinated failure.
The converse view might be that you can’t change things faster than some rate limit decided by human governments. The legal system isn’t ready for AGI, the economy and medical system aren’t, no precautions whatsoever are legally required, and the economy will naturally put all the power into the hands of a few tech companies if this goes forward. Democracies simply can’t adjust fast enough, because most voters won’t understand any of the issues and keep voting for incumbents; dictatorships have essentially the same problem.
Whether Eliezer or Paul is right about “sudden all at once jolt” is an interesting question but I don’t understand how that question is related to Sam Altman’s chip ambitions. I don’t understand why Logan keeps bringing that up, and now I guess you’re doing it too. Is the idea that “sudden all at once jolt” is less likely in a world with more chips and chip fabs, and more likely in a world with fewer chips and chip fabs? If so, why? I would expect that if the extra chips make any difference at all, it would be to push things in the opposite direction.
In other words, if “situations where there are a lot of chips piled up everywhere plugged into infrastructure” is the bad thing that we’re trying to avoid, then a good way to help avoid that is to NOT manufacture tons and tons of extra chips, right?
Is the idea that “sudden all at once jolt” is less likely in a world with more chips and chip fabs, and more likely in a world with fewer chips and chip fabs? If so, why? I would expect that if the extra chips make any difference at all, it would be to push things in the opposite direction.
We’re changing 2 variables, not 1:
(1) We know how to make useful AI, and the chips being built are AI accelerators specifically meant for specific network architectures
(2) we built a lot of them
Pretend base world:
(1) OAI doesn’t exist or folds like most startups. Deepmind went from a 40% staff cut to 100% and joins the graveyard.
(2) Moore’s law continues, and various kinds of general accelerator keep getting printed
So in the “pretend base world”, 2 things are true:
(1) AI is possible, just human investors were too stupid to pay for it
(2) each 2-3 years, the cost of compute is halving
Suppose for 20 more years the base world continues, with human investors preferring to invest in various pyramid schemes instead of AI. (real estate, crypto...). Then after 20 years, compute is 256 times cheaper. This “7 trillion” investment is now 27 billion, pocket change. Also various inefficient specialized neural networks (“narrow AI”) are used in places, with lots of support hardware plugged into real infrastructure.
That world has a compute overhang, and since compute is so cheap, someone will eventually try to train a now small neural network with 20 trillion weights on some random stuff they downloaded and you know the rest.
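Here is a minimal sketch of the overhang arithmetic in that hypothetical, using only the numbers above (the 2.5-year doubling time is my midpoint of the stated 2-3 year range):

```python
# Hypothetical "pretend base world" overhang arithmetic (numbers from the comment above).
years = 20
doubling_time_years = 2.5                            # cost of compute halves every 2-3 years; take 2.5
cheapening = 2 ** (years / doubling_time_years)      # ~256x cheaper after 20 years

investment_today = 7e12                              # the "$7 trillion" figure
equivalent_cost_later = investment_today / cheapening   # ~$27 billion buys the same compute in 20 years

print(f"compute ~{cheapening:.0f}x cheaper; today's $7T of compute costs ~${equivalent_cost_later / 1e9:.0f}B then")
```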
What’s different in the real world: each accelerator built in practice will be specialized for (probably) a large transformer network, specifically with fp16 or lower precision. Each one printed is then racked into a cluster with a finite-bandwidth backplane intended to support the scale of network that is in commercial use.
Key differences:
(1) the expensive hardware has to be specialized, instead of general purpose
(2) these specializations mean it is likely impossible for any of the newly made accelerators to support a different/much larger architecture. This means it is very unlikely to be able to run an ASI, unless that ASI happens to function with a transformer-like architecture of similar scale on fp16, or can somehow function as a distributed system with low bandwidth between clusters.
(3) Human developers get immediate feedback as they scale up their networks. If there are actual safety concerns of the type that MIRI et al. have speculated exist, they may be found before the hardware can support ASI. This is what makes it a continuous process.
Note the epistemics: I work on an accelerator platform. I can say with confidence that accelerator design does limit what networks it is viable to accelerate; TOPS are not fungible.
Conclusion: if you think AI during your lifetime is bad, you’re probably going to see this as a bad thing. Whether it is actually a bad thing is complicated.
I’m confused about your “pretend base world”. This isn’t a discussion about whether it’s good or bad that OAI exists. It’s a discussion about “Sam Altman’s chip ambitions”. So we should compare the world where OAI seems to be doing quite well and Sam Altman has no chip ambitions at all, to the world where OAI seems to be doing quite well and Sam Altman does have chip ambitions. Right?
I agree that if we’re worried about FOOM-from-a-paradigm-shifting-algorithmic-breakthrough (which, as it turns out, I am indeed worried about), then we would prefer to be in a world where there is a low absolute number of chips that are flexible enough to run a wide variety of algorithms, rather than a world where there are a large number of such chips. But I disagree that this would be the effect of Sam Altman’s chip ambitions; rather, I think Sam Altman’s chip ambitions would clearly move things in the opposite, bad direction, on that metric. Don’t you think?
By analogy, suppose I say “(1) It’s very important to minimize the number of red cars in existence. (2) Hey, there’s a massively hot upcoming specialized market for blue cars, so let’s build 100 massive car factories all around the world.” You would agree that (2) is moving things in the wrong direction for accomplishing (1), right?
This seems obvious to me, but if not, I’ll spell out a couple reasons:
For one thing, who’s to say that the new car factories won’t sell into the red-car market too? Back to the case at hand: we should strongly presume that whatever fabs get built by this Sam Altman initiative will make not exclusively ultra-specialized AI chips, but rather they will make whatever kinds of chips are most profitable to make, and this might include some less-specialized chips. After all, whoever invests in the fab, once they build it, they will try to maximize revenue, to make back the insane amount of money they put in, right? And fabs are flexible enough to make more than one kind of chip, especially over the long term.
For another thing, even if the new car factories don’t directly produce red cars, they will still lower the price of red cars, compared to the factories not existing, because the old car factories will produce extra marginal red cars when they would otherwise be producing blue cars. Back to the case at hand: the non-Sam-Altman fabs will choose to pump out more non-ultra-specialized chips if Sam-Altman fabs are flooding the specialized-chips market. Also, in the longer term, fab suppliers will be able to lower costs across the industry (from both economies-of-scale and having more money for R&D towards process improvements) if they have more fabs to sell to, and this would make it economical for non-Sam-Altman fabs to produce and sell more non-ultra-specialized chips.
I think the possibility of a compute overhang seems plausible given the technological realities, but generalizing from this to a second-order overhang etc. seems to be taking it too far.
If there is an argument that we should push compute due to the danger of another “overhang” down the line, that should be made explicitly, and not by generalisation from one (debatable!) example.
Slow takeoff: humans get pre-AGI in the near future and then AGI. The singularity is slow if humans get AGI in the near future, assuming the price per transistor is still high, and assuming each inference instance of a running AGI is only marginally faster than humans and requires, say, $3.2 million USD in hardware (if GPT-4 needs 128 H100s at inference time, albeit this hosts multiple parallel generation sessions). This is also why the AGI->ASI transition is slow. If you model it as an “intelligent search”, where merely thousands of AGI-scale training runs are done to find an architecture that scales to ASI, then this requires 10^3 times as much hardware as getting to AGI. If that’s $100 billion, then ASI costs $100 trillion.
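A minimal sketch of that arithmetic; the $25k-per-H100 price is my assumption, chosen only to match the $3.2M figure, and the other numbers come from the paragraph above:

```python
# Rough numbers from the slow-takeoff description above. The $25k card price is an assumption
# consistent with the stated $3.2M per inference instance; it is not given in this paragraph.
h100_price = 25_000
cards_per_agi_instance = 128
inference_hardware_cost = cards_per_agi_instance * h100_price    # $3.2M per running AGI instance

agi_cost = 100e9                  # "$100 billion" to reach AGI
search_runs_to_asi = 1_000        # "merely thousands" of AGI-scale runs, taken as 10^3
asi_cost = agi_cost * search_runs_to_asi                          # $100 trillion

print(f"${inference_hardware_cost / 1e6:.1f}M per AGI instance; ~${asi_cost / 1e12:.0f}T to search for ASI")
```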
Fast Takeoff: humans are stuck, or don’t attempt to find algorithms for general AI, but keep making compute cheaper and cheaper and building more of it. You can also think of humans putting insecure GPUs just everywhere (for VR, for analyzing camera footage, for killer drones) as like putting red barrels full of diesel everywhere. Then once AGI exists, there is the compute to iterate thousands of training sessions and immediately find ASI, and the network stacks protecting the GPUs are all human-written. Before humans can react, the ASI exploits implementation bugs and copies itself all over, and recall all those killer drones...
Fast Takeoff is a specific chain reaction, similar to an explosion or an avalanche. It’s dangerous because of the impulse function: you go from a world with just software to one with ASI essentially overnight, with ASI infesting everything. An explosion is bad not because of the air it moves but because the shockwave hits all at once.
Sam Altman’s proposal looks to me like slow takeoff with early investment to speed it up.
So, for example, if one of the AI labs develops AGI in 5 years, immediately there will be a real $7T investment. AGI turns sand into revenue; any rational market will invest all the capital it has into maximizing that revenue.
Assuming it takes 5 years from today to start producing new AI accelerators with capital invested now, Sam is anticipating having AGI in 5 years. So it’s a pre-investment into a slow takeoff. It’s not an impulse, just a faster ramp. (It may still be dangerous; a fast enough ramp is indistinguishable from an impulse...)
Whether or not investors can be convinced is an open question. I would assume there will be lingering skepticism that AGI is 5 years away (2 years according to Metaculus).
I’m confused about whether you agree or disagree with the proposition: “By doing this chip thing, Sam Altman is likely to make takeoff faster (on the margin) than it otherwise would be.”
In other words, compare the current world to the world where Sam Altman did everything else in life the same, including leading OpenAI etc., but where he didn’t pursue this big chip project. Which world do you think has faster takeoff? If you already answered that question, then I didn’t understand it, sorry.
“By doing this chip thing, Sam Altman is likely to make takeoff faster (on the margin) than it otherwise would be.”
It makes takeoff faster, but it is not necessarily a fast takeoff. A fast takeoff is a specific science-fiction scenario that happens many times faster than human beings can respond.
Let’s look at it with numbers:
Suppose you actually need 80 H100s per AGI. Suppose 2 generations of Moore’s law happen before AGI releases, so the number drops to 20.
Suppose they are $25k per card, so $500,000 per “person-equivalent”, though that person works 24/7 instead of at most 996, and can load multiple models.
If annual spending on chip production in 5 years is $1000 billion instead of the $528 billion now, and 90% (!) goes solely to AI chips, that’s 1.8 million “person-equivalents” added to the workforce each year. (I am pretending all the other chips are free.)
If we assume at least 30% is reinvested into training more efficient AI models, and also correct for the duty cycle, that’s 2.94 million people added to the workforce per year. (Exponential gains are also slow; if you need 2-3 years per doubling, that’s not helping much, especially since many of the instances aren’t being reinvested.)
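Here is one way those numbers fit together; treating the duty-cycle correction as 24/7 vs a 996 schedule (168 vs 72 hours per week) is my reading of the comment, not a formula it states explicitly:

```python
# Reconstructing the workforce arithmetic above. The duty-cycle factor (24/7 vs "996")
# is an interpretation; the other figures are the ones given in the comment.
cards_per_agi = 20                         # 80 H100-equivalents, after 2 Moore's-law halvings
card_price = 25_000
cost_per_person_equivalent = cards_per_agi * card_price          # $500k

ai_chip_spend = 1_000e9 * 0.90                                   # 90% of $1000B/year goes to AI chips
person_equivalents = ai_chip_spend / cost_per_person_equivalent  # 1.8 million per year

deployed_fraction = 0.70                   # 30% reinvested into training more efficient models
duty_cycle = (24 * 7) / 72                 # AI works 168 h/week vs a 996 human's 72 h/week
effective_workers = person_equivalents * deployed_fraction * duty_cycle   # ~2.94 million per year

print(f"~{person_equivalents / 1e6:.1f}M person-equivalents, ~{effective_workers / 1e6:.2f}M effective workers/year")
```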
I think this is what Sam means when he said “It will change the world much less than we all think and it will change jobs much less than we all think”.
Update:
Suppose instead the $7T investment happens. How much does this speed up production?
Let’s assume we get 30 percent ROI on the $7T. That $7T has paid for more chip-making equipment, more silicon production plants, and better automation of these (near-future low-hanging fruit only!).
And let’s assume we are paying cost for the chips, and also assume the net cost is double. So it’s $3000 (current estimates) to build a GPU, and $3000 of support ICs to use it.
So 2.1 trillion USD each year in ICs, and every $120k is another AGI instance.
That’s like adding another 572 million people per year.
Hmm. Yeah, that’s kind of a singularity. ICs stop being a limiting factor, and then obviously you are adding another China to the Earth every 2 years, especially if you factor in exponential growth. Or consider how you functionally have 245 million test dummies/automated AI researchers each year...
Update: well it’s kind of a fast takeoff actually.
That’s not a fast takeoff, that’s a slow takeoff, and it’s not an impulse.
A fast takeoff would be a scenario where say, enough GPUs to host 1 billion extra humans (or the cognitive equivalent) was already “out in the world”, and also most of the hardware was already in clusters that can host big AI models, and it was secured poorly and monitored poorly. Some of the original AI timeline calculations were for when $1000 of compute would equal a human brain. Such a world would be at risk for a fast takeoff, where every random computer in a gas station can host an AGI.
I do actually think $7T is enough that it would materially accelerate Moore’s law, since “production gets more efficient over time” style laws tend to be functions of “quantity produced”, not of time.
In a world where we’re currently spending ~$600B/year on semiconductors, spending a few billion (current largest AI training runs) is insignificant, but if Sam really does manage to spend $7T/5 years, that would be basically tripling our semiconductor capacity.
There might also be negative feedback loops, because when you try to spend a large amount of money quickly you tend to do so less efficiently, so I doubt the Moore’s law rate would literally triple. But if you thought (as Kurzweil predicts) AGI will arrive circa 2035 based on Moore’s law alone, an investment of this (frankly ridiculous) scale reducing that time from 10 years down to 5 is conceivable.
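To make the “function of quantity produced, not time” point concrete, here is a minimal experience-curve (Wright’s law) sketch; the 20% learning rate per doubling is purely an illustrative assumption, not a figure anyone in this thread claims:

```python
import math

# Illustrative experience-curve (Wright's law) sketch: unit cost falls by a fixed fraction each
# time *cumulative* production doubles. The 20% learning rate is an assumption for illustration.
learning_rate = 0.20                          # cost falls 20% per doubling of cumulative output
b = -math.log2(1 - learning_rate)             # Wright's-law exponent, ~0.32

def relative_cost(cumulative_multiple: float) -> float:
    """Unit cost relative to today after cumulative production grows by this multiple."""
    return cumulative_multiple ** (-b)

# If the build-out roughly triples cumulative semiconductor output, unit cost drops ~30%,
# regardless of how many calendar years the tripling takes.
print(f"cost after 3x cumulative production: {relative_cost(3):.2f}x today's")
```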
Mr Byrnes is contrasting fast to slow takeoff, keeping the singularity date constant. Mr Zoellner is keeping the past constant, and contrasting fast takeoff (singularity soon) with slow takeoff (singularity later).