That suggests that CrystalNights would work, provided we start from something about as smart as a chimp. And arguably OmegaStar would be about as smart as a chimp—it would very likely appear much smarter to people talking with it, at least.
Thanks for these comments.

“Starting with something as smart as a chimp” seems to me to be where a huge amount of the work is being done, and if OmegaStar does get us to chimp-level intelligence, it seems a lot less likely that we’d need to resort to re-running-evolution-type approaches. I also don’t think “likely to appear smarter than a chimp to people talking with it” is a good test, given that e.g. GPT-3 (or even GPT-2?) would plausibly pass it, and chimps can’t talk.
“Do you not have upwards of 75% credence that the GPT scaling trends will continue for the next four OOMs at least? If you don’t, that is indeed a big double crux.”—Would want to talk about the trends in question (and the OOMs—I assume you mean training FLOP OOMs, rather than params?). I do think various benchmarks are looking good, but consider e.g. the recent Gopher paper:
“On the other hand, we find that scale has a reduced benefit for tasks in the Maths, Logical Reasoning, and Common Sense categories. Smaller models often perform better across these categories than larger models. In the cases that they don’t, larger models often don’t result in a performance increase. Our results suggest that for certain flavours of mathematical or logical reasoning tasks, it is unlikely that scale alone will lead to performance breakthroughs. In some cases Gopher has a lower performance than smaller models, examples of which include Abstract Algebra and Temporal Sequences from BIG-bench, and High School Mathematics from MMLU.”
(Though in this particular case, re: math and logical reasoning, there are also other relevant results to consider, e.g. this and this.)
It seems like “how likely is it that continuation of GPT scaling trends on X-benchmarks would result in APS-systems” is probably a more important crux, though?
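Since the unit matters for claims like “75% that the trends continue for four more OOMs,” here is a back-of-the-envelope sketch of the FLOP-vs-params distinction, using the common ≈6·N·D approximation for training compute (the model sizes and token counts below are illustrative assumptions, not figures from this exchange):

```python
# Back-of-the-envelope: training-FLOP OOMs vs. parameter OOMs.
# Assumes the common approximation: training FLOP ~= 6 * params * tokens.
# All specific numbers are illustrative, not claims from the discussion.

import math

def training_flop(params: float, tokens: float) -> float:
    """Approximate training compute via the 6*N*D rule of thumb."""
    return 6 * params * tokens

# Illustrative GPT-3-scale baseline: 175e9 params, 300e9 training tokens.
base = training_flop(175e9, 300e9)   # ~3.15e23 FLOP

# Scaling params by 4 OOMs while also scaling tokens proportionally
# adds far more than 4 OOMs of training FLOP:
big = training_flop(175e9 * 1e4, 300e9 * 1e4)

print(f"baseline: ~{base:.2e} FLOP")
print(f"scaled:   ~{big:.2e} FLOP")
print(f"FLOP OOMs added: {math.log10(big / base):.1f}")  # 8.0, not 4.0
```

The point of the sketch: “+4 OOMs” of parameters is not “+4 OOMs” of training FLOP once data scales too, so the two readings of the 75% claim can come apart substantially.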
Re: your premise 2, I had (wrongly, and too quickly) read this as claiming “if you have X% on +12 OOMs, you should have at least 1/2*X% on +6 OOMs,” and log-uniformity was what jumped to mind as what might justify that claim. I have a clearer sense of what you were getting at now, and I accept something in the vicinity if you say 80% on +12 OOMs (will edit accordingly). My +12 number is lower, though, which makes it easier to have a flatter distribution that puts more than half of the +12 OOM credence above +6.
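To spell out the log-uniformity point, here is a minimal numerical sketch (the horizon B and the sampling-based check are assumptions for illustration):

```python
import numpy as np

# If the compute requirement is log-uniform in raw compute, i.e. uniform
# in "extra OOMs needed" over [0, B] for any horizon B >= 12, then the
# credence on +6 OOMs is always exactly half the credence on +12 OOMs.
rng = np.random.default_rng(0)
B = 30.0  # illustrative upper bound, in OOMs beyond today
requirement = rng.uniform(0.0, B, size=1_000_000)

p6 = (requirement <= 6).mean()    # ~ 6/B
p12 = (requirement <= 12).mean()  # ~ 12/B
print(p6 / p12)                   # ~ 0.5, independent of B

# Nothing forces that ratio for other shapes: e.g. zone totals of
# 25% by +6 and 65% by +12 give a ratio of ~0.38, i.e. more than half
# of the +12 mass sits above +6.
print(0.25 / 0.65)  # ~ 0.385
```

So the “at least 1/2*X%” rule is a feature of the log-uniform shape specifically, not of probability distributions in general.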
The difference between 20% and 50% on APS-AI by 2030 seems like it could well be decision-relevant to me (and important, too, if you think that risk is a lot higher in short-timelines worlds).
Nice! This has been a productive exchange; it seems we agree on the following things:
--We both agree that probably the GPT scaling trends will continue, at least for the next few OOMs; the main disagreement is about what the practical implications of this will be—sure, we’ll have human-level text prediction and superhuman multiple-choice-test-takers, but will we have APS-AI? Etc.
--I agree with what you said about chimps and GPT-3 etc. GPT-3 is more impressive than a chimp in some ways, and less in others, and just because we could easily get from chimp to AGI doesn’t mean we can easily get from GPT-3 to AGI. (And OmegaStar may be relevantly similar to GPT-3 in this regard, for all we know.) My point was a weak one which I think you’d agree with: Generally speaking, the more ways in which system X seems smarter than a chimp, the more plausible it should seem that we can easily get from X to AGI, since we believe we could easily get from a chimp to AGI.
--Now we are on the same page about Premise 2 and the graphs. Sorry it was so confusing. I totally agree: if instead of 80% you only have 55% by +12 OOMs, then you are free to have relatively little probability mass by +6. And you do.
(Note that my numbers re: short-horizon systems at +12 OOMs being enough, and for +12 OOMs in general, have changed since the earlier version you read, to 35% and 65% respectively.)
Ok, cool! Here, is this what your distribution looks like basically?

Joe’s Distribution?? - Grid Paint (grid-paint.com)

I built it by taking Ajeya’s distribution from her report and modifying it so that:

--25% is in the red zone (the next 6 OOMs)

--65% is in the red+blue zone (the next 12)

--It looks as smooth and reasonable as I could make it subject to those constraints, and generally departs only a little from Ajeya’s.

Note that it still has 10% in the purple zone, representing “Not even +50 OOMs would be enough with 2020’s ideas”.
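As a quick sanity check on those zone totals, here is a toy discrete sketch (the per-bin masses other than the stated 25% / 65% / 10% are made-up placeholders):

```python
# Toy discrete version of the distribution described above: probability
# mass over "extra OOMs of 2020-compute needed". Only the zone totals
# (25% within +6 OOMs, 65% within +12, 10% on "not even +50 is enough")
# come from the discussion; the split of the remainder is a placeholder.

bins = {
    "+0 to +6 OOMs (red)":      0.25,
    "+6 to +12 OOMs (blue)":    0.40,   # red + blue = 0.65
    "+12 to +50 OOMs":          0.25,
    "beyond +50 OOMs (purple)": 0.10,   # "2020's ideas never suffice"
}

assert abs(sum(bins.values()) - 1.0) < 1e-9

cumulative = 0.0
for zone, mass in bins.items():
    cumulative += mass
    print(f"{zone:28s} mass={mass:.2f} cumulative={cumulative:.2f}")
```

Any smooth curve satisfying the same totals tells the same qualitative story; the Grid Paint drawing is one such curve.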
I encourage you (and everyone else!) to play around with drawing distributions, I found it helpful. You should be able to make a copy of my drawing in Grid Paint and then modify it.