The kind of efficiency that matters most is something like “Performance on various world-takeover and R&D tasks, as a function of total $, compute, etc. initially controlled.” Here are the kinds of efficiency you talk about in that post.
Efficiency in terms of intelligence/$ is obviously downstream of, and dependent on, the various lower-level metrics I cited.
Funnily enough, I think these assumptions are approximately correct*, and yet I think that once we get human-level AGI, we’ll be weeks rather than years from superintelligence.
I may somewhat agree, depending on how we define SI. However, the current transformer-on-GPUs paradigm seems destined for a slowish takeoff. GPT-4 used perhaps 1e25 FLOPs and produced only a proto-AGI (which ironically is far more general than any one human, but still missing critical action/planning skills/experience), and it isn’t really feasible to continue that scaling to 1e27 FLOPs and beyond any time soon.
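To put a rough number on that infeasibility, here is a quick back-of-the-envelope sketch; the per-GPU peak throughput, utilization, and run length are my own assumed values, not anyone’s reported figures.

```python
# Rough Fermi estimate (assumed numbers, not reported figures):
# GPUs needed to deliver 1e27 training FLOPs.

peak_flops = 1e15        # assumed bf16 peak per GPU (H100-class)
utilization = 0.4        # assumed sustained training utilization
run_days = 90            # assumed run length

sustained = peak_flops * utilization             # effective FLOP/s per GPU
target_flops = 1e27                              # 100x the ~1e25 GPT-4 estimate above
gpu_seconds = target_flops / sustained
gpus_needed = gpu_seconds / (run_days * 24 * 3600)

print(f"~{gpus_needed:,.0f} GPUs for a {run_days}-day run")
# => roughly 320,000 GPUs under these assumptions (or ~1M GPUs for a one-month run)
```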
If you agree with me on this, then it seems a bit unfair to dunk on EY so much, even if he was wrong about various kinds of brain efficiency.
I don’t think it’s unfair at all. EY’s unjustified claims are accepted at face value by too many people here, but in reality his sloppy analysis results in a poor predictive track record. The AI/ML folks who dismiss the LW doom worldview as crankish are justified in doing so if this is the best argument for doom.
But quantitatively, if it still takes only a few weeks to reach superintelligence (by which I mean AGI that is significantly more competent than the best-ever humans at task X, for every relevant intellectual task X), then the bottom-line conclusions Yudkowsky drew appear to be correct, no?
I’m not sure what the “only a few weeks” measures, but I’ll assume you are referring to the duration of a training run. For various reasons I believe this will tend to be a few months or more for the most competitive models at least for the foreseeable future, not a few weeks.
We already have proto-AGI in the form of GPT-4, which is already more competent than the average human at most white-collar, non-robotic tasks. Further increases in generality probably won’t add much value; most of the further value will come from improving agentic performance to increasingly out-compete the most productive/skilled humans in valuable skill niches, i.e. going for more skill depth rather than breadth. This may require increasing specialization and larger parameter counts: for example, if creating the world’s best Lisp programmer requires 1T params just by itself, that will result in a pretty slow takeoff from here (as the rough sketch below illustrates).
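A minimal sketch of that arithmetic, assuming Chinchilla-style compute-optimal scaling (roughly 20 tokens per parameter and the standard 6·N·D training-FLOP approximation); the 1T-parameter specialist is of course a hypothetical:

```python
# Rough sketch: training cost of a hypothetical 1T-parameter specialist model,
# assuming Chinchilla-style compute-optimal scaling (D ~ 20*N tokens, C ~ 6*N*D FLOPs).

params = 1e12                       # hypothetical 1T-parameter specialist
tokens = 20 * params                # ~20 tokens per parameter (Chinchilla heuristic)
train_flops = 6 * params * tokens   # standard 6*N*D training-compute approximation

gpt4_flops = 1e25                   # the ~1e25 GPT-4 figure used above
print(f"{train_flops:.1e} FLOPs, ~{train_flops / gpt4_flops:.0f}x the GPT-4 estimate")
# => ~1.2e26 FLOPs, i.e. roughly 12x the GPT-4 estimate, for a single skill niche
```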
I also suspect it may soon be possible to have a (very expensive) speed-intelligence that is roughly human-level in ability but thinks 100x or 1000x faster, but that isn’t the kind of FOOM EY predicted. That’s a scenario I predicted, as did Hanson and others to varying degrees: human-shaped minds running at high speeds. Those will necessarily be brain-like AGI, as the brain is simply what intelligence optimized for efficiency, and especially for low latency, looks like. Digital minds can use their much higher clock rate to run minimal-depth brain-like circuits at high speeds or much deeper circuits at low speeds, but the evidence now pretty overwhelmingly favors the former over the latter, as I predicted well in advance.
I’d be curious, though, to hear your thoughts on Gwern’s fictional takeover story, which I think is unrealistic in a bunch of ways; in particular, I’d like to know whether it violates any of the efficiency limits.
The central premise of the story is that an evolutionary-search auto-ML process, running on future TPUs and using less than 5e24 FLOPs (around the training cost of GPT-4), suddenly produces an SI, in a future world that seems to completely lack even human-level AGI. No, I don’t think that’s realistic at all, because the brain is efficient and human-level AGI alone requires around that much compute; SI requires far more. The main caveat of course, as I already mentioned, is that once you train a human-level AGI you could throw a bunch of compute at it to run it faster, but doing that doesn’t actually increase the net human-power of the resulting mind (vs. spending the same compute on N agent instances in parallel).
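For rough calibration on the “requires around that much” point, here is a crude lifetime-brain-compute ballpark; the synapse count, average event rate, and one-op-per-synaptic-event equivalence are all loose, contested assumptions, so treat this as order-of-magnitude only.

```python
# Crude order-of-magnitude ballpark of lifetime "training compute" for a human brain.
# Synapse count, average event rate, and the one-op-per-synaptic-event equivalence
# are all rough, contested assumptions.

synapses = 1e14        # assumed number of synapses
avg_rate_hz = 1.0      # assumed average synaptic event rate
seconds = 1e9          # roughly 30 years of experience

lifetime_ops = synapses * avg_rate_hz * seconds   # ~1e23 synaptic events
print(f"~{lifetime_ops:.0e} synaptic events vs ~5e24 FLOPs in the story")
# => ~1e+23: within an OOM or two of the GPT-4-scale budget the story assumes
```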
If we interpret it broadly as no significant further room for capabilities-per-dollar, or capabilities-per-flop, then I don’t accept that assumption, and claim that you haven’t done nearly enough to establish it,
Essentially all recent progress comes from hardware, not software. Some people here like to cite a few works trying to measure software progress, but those analyses are mostly wrong (a tangent for another thread). The difference between GPT-1 and GPT-4 is almost entirely due to throwing more money at hardware, combined with hardware advances from Nvidia/TSMC. OpenAI’s open secret of success is simply that they were the first to test the scaling hypothesis, which, more than anything else, is a validation of my predictive model[1].
The hardware advances are about to peter out, and scaling up the spend on supercomputer training by another 100x from ~$1B ~~is not really an option~~ seems unlikely anytime soon due to poor scaling of supercomputers of that size and various attendant risks[2].
and instead seem to be making a similar mistake to a hypothetical bird enthusiast in 1900 who declared that planes would never outcompete pigeons because it wasn’t possible to be substantially more efficient than birds.
I never said AGI wouldn’t outcompete humans; on the contrary, my model has very much been AGI or early SI by the end of this decade and a strong singularity before 2050. But the brain is actually efficient, it just takes a lot of compute to reverse-engineer the brain, and Moore’s law is ending. Moravec’s model was mostly correct, but so was Hanson’s (because Hanson’s model is very much an “AGI requires virtual brains” model, and he has carefully thought out much of the resulting economics).
I should point out that for anthropic reasons we should obviously never expect to witness the endpoint of EY’s doom model, but that model still makes some tentatively different intermediate predictions, which have mostly been falsified.
A $100B one-month training run would require about 50 million high-end GPUs (which cost about $2,000 per month each) and tens of gigawatts of power; Nvidia ships well under a million flagship GPUs per year. A quick sanity check on those figures is sketched below.
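The arithmetic behind those numbers, with the per-GPU cost, board power, and datacenter overhead treated as rough assumptions of mine:

```python
# Sanity check on the $100B one-month run; per-GPU cost, board power, and
# datacenter overhead are rough assumptions.

budget_usd = 100e9           # $100B budget
cost_per_gpu_month = 2000    # assumed all-in cost per high-end GPU per month ($)
watts_per_gpu = 700          # assumed board power for a flagship GPU
overhead = 1.4               # assumed datacenter overhead (cooling, networking, etc.)

gpus = budget_usd / cost_per_gpu_month
power_gw = gpus * watts_per_gpu * overhead / 1e9

print(f"{gpus/1e6:.0f} million GPUs, ~{power_gw:.0f} GW")
# => 50 million GPUs and ~49 GW under these assumptions ("tens of gigawatts"),
#    vs. well under 1 million flagship GPUs shipped by Nvidia per year.
```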
Not commenting on this whole thread, about which I have a lot of takes that I am still processing, but a quick comment on this line:
The hardware advances are about to peter out, and scaling up the spend on supercomputer training by another 100x from ~$1B is not really an option.
I don’t see any reason why we wouldn’t see a $100B training run within the next few years. $100B is not that much (it’s roughly a third of Google’s annual revenue, so if they really see competition in this domain as an existential threat, they alone might be able to fund a training run like this).
It might have to involve some collaboration between multiple tech companies, or some government involvement, but I currently expect that if scaling continues to work, we are going to see a $100B training run (though this stuff is super hard to forecast, so I am more like 60% on this, and also wouldn’t be surprised if it didn’t happen).
In retrospect I actually somewhat agree with you, so I edited that line and marked the change with a strike-through. Yes, a $100B training run is an option in theory, but it is unlikely to translate into a 100x increase in training compute due to datacenter scaling difficulties, and it is also greater than OpenAI’s estimated market cap. (I also added a note with a quick Fermi estimate showing that a training run of that size would require massively increasing Nvidia’s GPU output, by at least an OOM.) For various reasons I expect even those with pockets that deep to instead invest in a number of GPT-4-size runs exploring alternate training paths.