Daniel, you model near-future intelligence explosions. The simple reason these probably cannot happen is that when you have a multi-stage process, the slow step always wins. For an explosion there are four ingredients, not one: (algorithms, compute, data, robotics). Robotics is necessary, or the explosion halts once all available human labor is utilized.
Summary: if you make AI smarter with recursion, you will be limited by silicon, data, or robotics, and the process cannot run faster than the slowest step.
(1) You have the maximum throughput and serial speed achievable with your hardware. Suppose a human brain is (86 billion * 1000 * 1000) / 10 ≈ 8.6e15 arbitrarily sparse 8-bit flops.
Please notice the keyword “arbitrarily sparse”. That means this only works on GPUs several years from now, whenever Nvidia gets around to supporting it. Otherwise you need a lot more compute. Notice the division by 10: I am assuming silicon is 10 times better than meatware (less noisy, fewer neurons failing to fire).
But ignoring the sparsity (and VRAM and utilization) issues for the sake of some numbers: 2 million H100s are projected to ship in 2024, so at full GPU utilization that’s 43 cards per “human equivalent” at inference time.
Bottom line: the “ecosystem” of compute can support only a finite number of human equivalents, about 46,511 if you use all of that hardware.
If you run the hardware faster, you lose throughput. If we take your estimate as correct (note that past a certain point you will need custom hardware; GPUs will no longer work), then that’s like adding 74 new humans who think 125 times faster, able to do the work of about 9,302 humans, but serially fast and running 24/7.
You should probably estimate and plot how much new AI silicon production can be added each year after 2024.
Assume a dominant AI lab buys up 15 percent of worldwide production, as Meta says it will do this year. That’s your roofline. Also remember most of the hardware is going to be serving customers, not doing R&D.
So if 50 percent of that hardware is serving other projects or customers, and we have 15 percent of worldwide production, we end up with about 697 human equivalents of throughput, spread over roughly 5 serial threads (though you would obviously assign more than 5 tasks and context-switch between them).
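Here is a minimal back-of-envelope sketch of the arithmetic above. Every constant is an assumption restated from this comment (the throughput penalty of 5 is the one implied by the 9,302 figure), not a measured value.

```python
# Back-of-envelope sketch of the compute-bottleneck numbers above.
# Every constant is an assumption from this comment, not a measured value.

H100_SHIPMENTS_2024 = 2_000_000   # projected H100 shipments in 2024
CARDS_PER_HUMAN_EQUIV = 43        # assumed H100s per "human equivalent" at inference
SERIAL_SPEEDUP = 125              # assumed serial speedup vs. a human
THROUGHPUT_PENALTY = 5            # throughput loss from running serially fast (implied by the 9,302 figure)
LAB_SHARE = 0.15                  # one lab buys 15% of worldwide production
RND_FRACTION = 0.5                # half of that hardware is free for R&D; the rest serves customers

human_equivs = H100_SHIPMENTS_2024 / CARDS_PER_HUMAN_EQUIV
print(f"all hardware, human speed: {int(human_equivs)} human equivalents")          # 46511

fast_throughput = human_equivs / THROUGHPUT_PENALTY
fast_threads = fast_throughput / SERIAL_SPEEDUP
print(f"run serially fast: {int(fast_threads)} threads at {SERIAL_SPEEDUP}x, "
      f"{int(fast_throughput)} human equivalents of throughput")                    # 74 threads, 9302

lab_throughput = fast_throughput * LAB_SHARE * RND_FRACTION
lab_threads = fast_threads * LAB_SHARE * RND_FRACTION
print(f"one lab's R&D share: {int(lab_throughput)} human equivalents over "
      f"about {int(lab_threads)} serial threads")                                   # 697 over 5
```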
Probably 2025 will see 50 percent more AI accelerators built than 2024, and so on each year, so I suggest you add this growth factor to your world modeling. It is extremely meaningful for timelines.
(2) Once serial speed is no longer a bottleneck, any recursive improvement process bottlenecks on the evaluation system. For a simple example, once you achieve the highest average score possible on a suite of tests, no further improvement will happen. Past a certain level of intelligence you would expect the AI system to bottleneck on human-written evals, learning at the fastest rate that doesn’t overfit. Yes, you could go to AI-written evals, but how do humans tell whether the eval is optimizing for a useful metric?
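To illustrate, here is a toy sketch of that ceiling; the scoring function and the “self-improvement” step are purely hypothetical stand-ins, but they show that a hill-climbing improvement loop simply stops producing visible gains once it hits the maximum score the eval suite can award.

```python
import random

# Toy illustration of point (2): a recursive improvement loop can't see past
# the ceiling of its own eval suite. The eval and the "improve" step below are
# hypothetical stand-ins, not anyone's actual training setup.

MAX_SCORE = 100.0  # highest average score the eval suite can possibly award

def evaluate(capability: float) -> float:
    """Score is capped by the eval suite, no matter how capable the model gets."""
    return min(capability, MAX_SCORE)

capability, score = 50.0, 0.0
for step in range(1, 1000):
    candidate = capability + random.uniform(0, 5)   # proposed "self-improvement"
    if evaluate(candidate) > score:                  # accepted only if the eval improves
        capability, score = candidate, evaluate(candidate)
    if score >= MAX_SCORE:
        print(f"eval saturated at step {step}; further gains are invisible to the eval")
        break
```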
(3) Once an AI system is maxed out, bottlenecked on compute or data, it may be many times smarter than humans but still limited by real-world robotics or data. I gave a mock example of an engine design task in the comments here, and the speedup is 50 times, not 1 million times, because the real world has steps limited by physics. This is relevant for modeling an “explosion” and for why, once AGI is achieved in a few years, it probably won’t immediately change everything, as there aren’t enough robots.
I am well aware of Amdahl’s law. By intelligence explosion I mean the period where progress in AI R&D capabilities is at least 10x faster than it is now; it need not be infinitely faster. I agree it’ll be bottlenecked on the slowest component, but according to my best estimates and guesses the overall speed will still be >10x what it is today.
I also agree that there aren’t enough robots; I expect a period of possibly several years between ASI and the surface of the earth being covered in robots.
“By intelligence explosion I mean the period where progress in AI R&D capabilities is at least 10x faster than it is now; it need not be infinitely faster”
Well, a common view is that today’s AI is only possible with the compute levels that became available recently. That would mean it’s already compute-limited.
Say, for the sake of engagement, that GPTs are 10 times less efficient than the above estimate, that “AI progress” means more capabilities on equal compute, and that other progress means more capabilities from more compute.
So in this scenario a robot needs 8 H100s. You want to “cover the earth”. 8 billion people don’t cover the earth, so let’s assume 800 billion robots will.
Well, how fast will progress happen? Assume we start with a baseline of 2 million H100s in 2024, each year we add 50 percent more production rate over the year prior, and every 2.5 years performance per unit doubles; then in 94 years we will have built enough H100s to cover the earth with robots.
How long do you think it takes for a factory to build all of the components used in itself, including the workers? That would be another way to calculate this. If it’s 2 years, and we start with 250k capable robots (because of the H100 shortage), then it would take about 43 years to reach 800 billion, or about 22 years if the doubling time drops to 1 year.
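A quick sketch of that doubling arithmetic, using just the starting fleet and target assumed above:

```python
import math

# Doubling-time arithmetic for the robot fleet, using the assumptions above.
START_ROBOTS = 250_000     # assumed initial fleet (limited by the H100 shortage)
TARGET_ROBOTS = 800e9      # the "cover the earth" target from above

doublings = math.log2(TARGET_ROBOTS / START_ROBOTS)   # ~21.6 doublings needed
for doubling_time_years in (2, 1):
    years = doublings * doubling_time_years
    print(f"{doubling_time_years}-year doubling time: ~{years:.0f} years to 800 billion robots")
# prints ~43 years and ~22 years, matching the figures above
```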
People do bring up biological doubling times but neglect the surrounding ecosystem. When we say “double a robot” we don’t mean there’s a table of robot parts and one robot assembles another; that would take a few hours at most. It means building all the parts, mining the materials, and also doubling all the machinery used to make everything. Double the factories, double the mines, double the power generators.
You can use China’s rate of industrial expansion as a proxy for this. At 15 percent annual growth, that’s a doubling time of about 5 years. So if robots double in 2 years they are more than twice as efficient as China, and remember China got some one-time bonuses from a large existing population and natural resources like large untapped rivers.
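For reference, the growth-rate-to-doubling-time conversion used here (a one-liner; 15 percent annual growth works out to roughly a five-year doubling, and a two-year robot doubling implies roughly 41 percent annual growth):

```python
import math

def doubling_time_years(annual_growth_rate: float) -> float:
    """Exact doubling time for steady compound growth."""
    return math.log(2) / math.log(1 + annual_growth_rate)

print(doubling_time_years(0.15))  # ~4.96 years: China-like 15% industrial growth
print(doubling_time_years(0.41))  # ~2.0 years: the growth rate a 2-year robot doubling implies
```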
Cellular self-replication uses surrounding materials conveniently dissolved in solution; lichen, for instance, grow only a few millimeters per year.
“AI progress” is then time-bound to the production of compute and robots. Once you collect some one-time “low-hanging fruit” bonus, you make AI progress at a rate proportional to log(available compute), where available compute is compute you can spare that isn’t tied up in robots.
What log base do you assume? What’s the log base for compute vs. capabilities measured at now? Is a 1-2 year doubling time reasonable, or do you have data that would support a faster rate of growth? What do you think the “low-hanging fruit” bonus is? I assumed 10x above because smaller models currently seem to have to throw away key capabilities to Goodhart benchmarks.
Numbers and the quality of the justification really matter here. The above says that we shouldn’t be worried about foom, but of course good estimates for the parameters would convince me otherwise.
The above does not say that we shouldn’t be worried about foom. Foom happens inside a lab with a fixed amount of compute; then, the ASIs in the lab direct the creation of more fabs, factories, wetlabs, etc. until the economy is transformed. Indeed the second stage of this process might take several years. I expect it to go at maybe about twice the speed of China initially but to accelerate rather than slow down.
“I expect it to go at maybe about twice the speed of China initially”
OK, this sounds like a fair estimate: 30% annual growth rate, or roughly a 2.4-year doubling time. I estimated 2 years, which would be faster growth. One concrete reason to think this is a fair estimate is that AI-controlled robots will work 24 hours a day. China’s brutal 996 work schedule, at 72 hours a week, is still only a 43% duty cycle; a robot can work at least a 95% duty cycle (the other 5% is swapping parts as parts of the robot fail).
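The arithmetic behind those figures, for what it’s worth (the 2.4-year number is the rule-of-72 approximation for 30 percent growth; the exact value is closer to 2.6 years):

```python
import math

HOURS_PER_WEEK = 7 * 24                       # 168
print(72 / HOURS_PER_WEEK)                    # 996 schedule: ~0.43 duty cycle
print(0.95)                                   # assumed robot duty cycle (5% downtime for repairs)

growth = 0.30                                 # "twice the speed of China" ~ 30% annual growth
print(72 / (growth * 100))                    # rule-of-72 doubling time: 2.4 years
print(math.log(2) / math.log(1 + growth))     # exact doubling time: ~2.64 years
```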
So, trying to engage with the ‘foom’ idea: you develop some ‘ASI’ in the lab. That ASI is ‘directing’ these things? Why? Why do humans trust its outputs?
I was thinking the ‘foom’ idea was: OK, the ASI robs, lies, steals, and cons. It finds some way to scam humans into doing work for it. And OK, every 2.4 years its resources double. (There might be some early ‘low-hanging fruit’; for example, people have suggested an ASI might be able to extract money from financial markets with better trades. This will saturate.)
Meanwhile, other humans use the same technology that ASI unlocks and make controlled ASI: models you prompt without context or memory, weaker or distilled ASI models that are specialized. Then they simultaneously, under human direction, invest hundreds of billions of dollars into robot factories and double the equipment available to humans. Say the humans are less efficient; well, how much less efficient? Is their doubling time 3 years? 5?
There is a set of numbers where the ASI wins, but so long as the humans start with a large enough resource multiplier, and the humans eventually notice the actions of the rebel ASI, it is usually going to be a human victory. For example:
The ASI steals 1 billion dollars and covertly starts doubling. Doubling time is 2 years; 10 years later, it has 32 billion in resources.
Humans do some funding rounds, get 100 billion dollars, and legally start doubling. Doubling time is 4 years; 10 years later, they have roughly 560 billion in resources.
Humans spot the unregistered factories belonging to the ASI via IR or serendipity. The first round of AI wars starts...
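For completeness, the resource totals in that example are just compound growth over 10 years under the stated doubling times (a trivial sketch):

```python
# The resource totals in the example above are just compound growth over 10 years.
ASI_SEED, ASI_DOUBLING_YEARS = 1e9, 2          # stolen seed capital, covert doubling time
HUMAN_SEED, HUMAN_DOUBLING_YEARS = 100e9, 4    # funding-round capital, legal doubling time
YEARS = 10

asi = ASI_SEED * 2 ** (YEARS / ASI_DOUBLING_YEARS)          # 32 billion
humans = HUMAN_SEED * 2 ** (YEARS / HUMAN_DOUBLING_YEARS)   # ~566 billion
print(f"after {YEARS} years: ASI ~${asi / 1e9:.0f}B vs humans ~${humans / 1e9:.0f}B")
```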
The ASI needs a pretty large utility modifier to win with such a resource disparity, especially as the humans can keep feeding the context of each battle to the ASI models the humans control, stripping out the bits of context that would let those models know they’re being Ender’s Gamed*, and getting back solutions to the tactical scenarios.
My understanding was that ‘foom’ meant somewhat more science-fiction takeoff speeds, such as doubling times of a week. That would be an issue if humans did not also have the ability to order their own resources doubled on a weekly basis.
*c’mon give me credit for this turn of phrase