Thanks a ton for the thoughtful and detailed engagement! Below follows my reply to your reply to my spiel about timelines in my review:
I am sad that “Fun with +12 OOMs of compute” came across as pumping the intuition “wow a trillion is a lot,” because I don’t have that intuition at all. I wrote the post to illustrate what +12 OOMs of compute might look like in practice, precisely because a trillion doesn’t feel like a lot to me; it doesn’t feel like anything, it’s just a number.
I didn’t devote much (any?) time in the post to actually arguing that OmegaStar et al. would be transformative; I just left it as an intuition pump. But the actual arguments do exist, and they are roughly of this form: “Look at the empirical scaling trends we see today; now imagine scaling these systems up to OmegaStar, Amp(GPT-7), etc. We can say various things about what these scaled-up systems would be capable of simply by drawing straight lines on graphs, and already it’s looking pretty impressive/scary/‘fun.’ Moreover, qualitatively novel phenomena like transfer learning, generalization, and few-shot learning have been kicking in as we scale stuff up (see: the GPT series), so we have reason to think that these phenomena will intensify, and be joined by additional novel phenomena, as we scale up further… all of which should lead to a broader and more impressive suite of capabilities than we’d get from trend extrapolation on existing metrics alone.”
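(To make the “straight lines on graphs” part concrete, here’s a toy extrapolation. The power-law form and the 0.05 exponent are illustrative assumptions in the rough spirit of published language-model scaling-law results, not figures from any particular paper:)

```python
# Toy "straight line on a log-log plot": how much a power-law loss curve improves
# when we add extra OOMs of training compute. The 0.05 exponent is an assumed,
# illustrative value, not a published one.
def relative_loss(extra_ooms: int, exponent: float = 0.05) -> float:
    """Loss after adding extra_ooms of compute, as a fraction of today's loss."""
    return (10 ** extra_ooms) ** (-exponent)

for extra_ooms in [0, 4, 8, 12]:
    print(extra_ooms, round(relative_loss(extra_ooms), 3))
# 0 1.0, 4 0.631, 8 0.398, 12 0.251 -- smooth, predictable gains on the extrapolated
# metric, before counting any of the qualitatively novel phenomena.
```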
I do think I haven’t spelled out these arguments anywhere (even to my own satisfaction) and it’s important for me to do so; maybe I am overrating them. I’ll put it on my todo list.
I think my methodology would have performed fine if we had applied it in the past, because we wouldn’t have had scaling trends to extrapolate, nor would we have known about transfer learning, generalization, and a whole host of other things. If I sent a description of Amp(GPT-7) back to 2010 and asked leading AI experts to comment on what it would be capable of, they would say it probably wouldn’t even be able to speak grammatically correct English. They might even be confused about why I was going on and on about how big it was, since they wouldn’t know about the scaling properties of language models in particular, or of neural nets in general.
Moreover, in the past there wasn’t a huge scale-up of compute expenditure happening (well, not at any time in the past except the last 10 years). That’s a separate reason why this methodology would not have given short timelines in the past.
I can imagine this methodology starting to give short timelines in 2015, because by then it was clear that people were starting to make bigger and bigger models, and (if one was super savvy) one might have figured out that making them bigger was making them better. But prior to 2010, the situation would definitely have been: compute for AI training is growing in line with Moore’s Law, i.e. very slowly. Even if (somehow) we had been convinced that there was an 80% chance that +12 OOMs over 2010 levels would be enough for TAI (and again, I think we wouldn’t have been, without the scaling laws and whatnot), we would have expected to cross maybe 1 or 2 of those OOMs per decade, not 5 or 6!
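(Here’s the arithmetic behind “1 or 2 of those OOMs per decade, not 5 or 6.” A roughly two-year doubling time stands in for Moore’s Law, and a roughly six-month doubling time stands in for the recent AI-training-compute scale-up; both are round, illustrative assumptions:)

```python
import math

def ooms_per_decade(doubling_time_years: float) -> float:
    """Orders of magnitude of compute growth per decade, given a doubling time."""
    doublings_per_decade = 10 / doubling_time_years
    return doublings_per_decade * math.log10(2)   # each doubling adds ~0.30 OOMs

print(round(ooms_per_decade(2.0), 1))   # Moore's-Law-ish pace: 1.5 OOMs per decade
print(round(ooms_per_decade(0.5), 1))   # recent AI-training scale-up (assumed): 6.0 OOMs per decade
```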
Oh yeah, and then there’s the human-brain-size thing. That’s another reason to think this methodology would not have led us astray in the past: it’s a way in which the 2020s are a priori unusually likely to contain exciting AI developments. (Technically it’s not the 2020s we’re talking about here; it’s the range between around 10^23 and 10^26 FLOPs, and/or the range between around 10^14 and 10^16 artificial synapses.)
You say 25% on “scaling up and fine-tuning GPT-ish systems works and brain-based anchors give a decent ballpark for model sizes.” What do you mean by this? Do you not have upwards of 75% credence that the GPT scaling trends will continue for the next four OOMs at least? If you don’t, that is indeed a big double crux.
I agree Neuromorph and Skunkworks are less plausible and do poorly on the “why couldn’t you have said this in previous eras” test. I think they contribute some extra probability mass to the total but not much.
On re-running evolution: I wonder how many chimpanzees we’d need to kill in a breeding program to artificially evolve another intelligent species like us. If the answer is “Less than a trillion” then that suggests that CrystalNights would work, provided we start from something about as smart as a chimp. And arguably OmegaStar would be about as smart as a chimp—it would very likely appear much smarter to people talking with it, at least. I’m not sure how big a deal this is, because intelligence isn’t a single dimension, but it seems interesting and worth mentioning.
Re: “I deny the implication of your premise 2: namely, that if you have 80%+ on +12 OOMs, you should have 40%+ on +6 – this seems to imply that your distribution should be log-uniform from 1-12, which is a lot stronger of a claim than just ‘don’t be super spikey around 12.’” Here, let me draw some graphs. I don’t know much math, but I know a too-spikey distribution when I see one! Behold the least spikey distribution I could make, subject to the constraint that 80% of the mass be by +12. Notice how steep the reverse slope is, and reflect on what that slope entails about the credence-holder’s confidence in their ability to distinguish between two ultimately very similar and outlandish hypothetical scenarios.
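(If you’d rather see the shape in numbers than in a picture, here’s a toy version of that least-spikey distribution. The support out to +35 OOMs and the 5% lump on “no amount of compute with 2020s ideas suffices” are illustrative choices, not my considered numbers:)

```python
import numpy as np

ooms = np.arange(1, 36)                        # +1 through +35 OOMs of compute
density = np.where(ooms <= 12, 0.80 / 12,      # flat ~6.7% per OOM up to +12...
                   0.15 / 23)                  # ...then ~0.65% per OOM beyond it
p_never = 0.05                                 # lump on "never enough" (assumed)

print(round(density.sum() + p_never, 2))       # 1.0 -- a proper distribution
print(round(density[11] / density[12], 1))     # ~10x drop in density right after +12
```

Even the flattest curve consistent with 80%-by-+12 has to fall off a cliff somewhere around +12; that cliff is the steep reverse slope I’m pointing at.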
Oh wait, now I see your final credences… you say 25% in 1e29 or less? And thus 20% by 2030? OK, then we have relatively little to argue about actually! 20% is not much different from my 50% for decision-making purposes. :)
Thanks for these comments.
“that suggests that CrystalNights would work, provided we start from something about as smart as a chimp. And arguably OmegaStar would be about as smart as a chimp—it would very likely appear much smarter to people talking with it, at least.”
“starting with something as smart as a chimp” seems to me like where a huge amount of the work is being done, and if OmegaStar gets you chimp-level intelligence, it seems a lot less likely that we’d need to resort to re-running-evolution-type stuff. I also don’t think “likely to appear smarter than a chimp to people talking with it” is a good test, given that e.g. GPT-3 (2?) would plausibly pass it, and chimps can’t talk.
“Do you not have upwards of 75% credence that the GPT scaling trends will continue for the next four OOMs at least? If you don’t, that is indeed a big double crux.”—Would want to talk about the trends in question (and the OOMs—I assume you mean training FLOP OOMs, rather than params?). I do think various benchmarks are looking good, but consider e.g. the recent Gopher paper:
On the other hand, we find that scale has a reduced benefit for tasks in the Maths, Logical Reasoning, and Common Sense categories. Smaller models often perform better across these categories than larger models. In the cases that they don’t, larger models often don’t result in a performance increase. Our results suggest that for certain flavours of mathematical or logical reasoning tasks, it is unlikely that scale alone will lead to performance breakthroughs. In some cases Gopher has a lower performance than smaller models – examples of which include Abstract Algebra and Temporal Sequences from BIG-bench, and High School Mathematics from MMLU.
(Though in this particular case, re: math and logical reasoning, there are also other relevant results to consider, e.g. this and this.)
It seems like “how likely is it that continuation of GPT scaling trends on X-benchmarks would result in APS-systems” is probably a more important crux, though?
Re: your premise 2, I had (wrongly, and too quickly) read this as claiming “if you have X% on +12 OOMs, you should have at least 1/2*X% on +6 OOMs,” and log-uniformity was what jumped to mind as what might justify that claim. I have a clearer sense of what you were getting at now, and I accept something in the vicinity if you say 80% on +12 OOMs (will edit accordingly). My +12 number is lower, though, which makes it easier to have a flatter distribution that puts more than half of the +12 OOM credence above +6.
The difference between 20% and 50% on APS-AI by 2030 seems like it could well be decision-relevant to me (and important, too, if you think that risk is a lot higher in short-timelines worlds).
Nice! This has been a productive exchange; it seems we agree on the following things:
--We both agree that probably the GPT scaling trends will continue, at least for the next few OOMs; the main disagreement is about what the practical implications of this will be—sure, we’ll have human-level text prediction and superhuman multiple-choice-test-takers, but will we have APS-AI? Etc.
--I agree with what you said about chimps and GPT-3 etc. GPT-3 is more impressive than a chimp in some ways, and less in others, and just because we could easily get from chimp to AGI doesn’t mean we can easily get from GPT-3 to AGI. (And OmegaStar may be relevantly similar to GPT-3 in this regard, for all we know.) My point was a weak one which I think you’d agree with: Generally speaking, the more ways in which system X seems smarter than a chimp, the more plausible it should seem that we can easily get from X to AGI, since we believe we could easily get from a chimp to AGI.
--Now we are on the same page about Premise 2 and the graphs. Sorry it was so confusing. I totally agree, if instead of 80% you only have 55% by +12 OOMs, then you are free to have relatively little probability mass by +6. And you do.
(Note that my numbers re: short-horizon systems + 12 OOMs being enough, and for +12 OOMs in general, changed since an earlier version you read, to 35% and 65% respectively.)
Ok, cool! Here, is this what your distribution looks like basically?
Joe’s Distribution?? - Grid Paint (grid-paint.com)
I built it by taking Ajeya’s distribution from her report and modifying it so that:
--25% is in the red zone (the next 6 OOMs)
--65% is in the red+blue zone (the next 12)
--It looks as smooth and reasonable as I could make it subject to those constraints, and generally departs only a little from Ajeya’s.
Note that it still has 10% in the purple zone, representing “Not even +50 OOMs would be enough with 2020s ideas.”
I encourage you (and everyone else!) to play around with drawing distributions, I found it helpful. You should be able to make a copy of my drawing in Grid Paint and then modify it.
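(If Grid Paint isn’t your thing, the same exercise works numerically. The per-OOM masses below are made up purely to show the constraint-checking; the real numbers live in the drawing:)

```python
import numpy as np

# Per-OOM probability masses for +1 through +50 OOMs (illustrative values only).
per_oom = np.array(
    [0.02, 0.03, 0.04, 0.05, 0.05, 0.06]      # +1..+6 (red zone): sums to 0.25
    + [0.08, 0.08, 0.07, 0.07, 0.05, 0.05]    # +7..+12: brings the total to 0.65
    + [0.25 / 38] * 38                        # +13..+50: remaining mass spread thinly
)
p_purple = 0.10                               # "not even +50 OOMs with 2020s ideas"

cdf = np.cumsum(per_oom)
print(round(cdf[5], 2), round(cdf[11], 2), round(cdf[-1] + p_purple, 2))  # 0.25 0.65 1.0
```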