Rub the stars out of your eyes for a second. GPT-3 is a huge leap forward, but it still has some massive structural deficiencies. From most to least important:
It doesn’t care whether it says correct things, only whether it completes its prompts in a realistic way
It can’t choose to spend extra computation on more difficult prompts
It has no memory outside of its current prompt
It can’t take advantage of external resources (like using a text file to organize its thoughts, or using a calculator for arithmetic)
It can’t think unless it’s processing a prompt
It doesn’t know that it’s a machine learning model
“But these can be solved with a layer of prompt engineering!” Give me a break. That’s obviously a brittle solution that does not address the underlying issues.
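For concreteness, here is a minimal sketch of the kind of thing "a layer of prompt engineering" refers to: a wrapper loop that lets a completion model use a scratchpad file as external memory and a calculator for arithmetic. The `complete()` function is a hypothetical stand-in for whatever text-completion API is being wrapped, and the CALC:/NOTE:/ANSWER: protocol is invented for this example.

```python
# Minimal sketch (not a real system) of a "prompt engineering layer":
# a loop that gives a completion model a scratchpad file for memory and a
# calculator for arithmetic. `complete()` is a hypothetical placeholder.
import re


def complete(prompt: str) -> str:
    """Placeholder for a single call to a text-completion model."""
    raise NotImplementedError("wire this up to an actual completion API")


def run_with_tools(task: str, scratchpad_path: str = "scratchpad.txt", max_steps: int = 10) -> str:
    prompt = (
        "Reply with one line at a time:\n"
        "  CALC: <arithmetic expression>   (a calculator will answer)\n"
        "  NOTE: <text>                    (saved to a scratchpad file)\n"
        "  ANSWER: <text>                  (finish the task)\n"
        f"Task: {task}\n"
    )
    for _ in range(max_steps):
        lines = complete(prompt).strip().splitlines()
        step = lines[0] if lines else ""
        prompt += step + "\n"

        if step.startswith("ANSWER:"):
            return step[len("ANSWER:"):].strip()
        if step.startswith("CALC:"):
            expr = step[len("CALC:"):].strip()
            if re.fullmatch(r"[0-9+\-*/(). ]+", expr):   # crude guard before eval
                try:
                    prompt += f"RESULT: {eval(expr)}\n"  # the "calculator" tool
                except (SyntaxError, ZeroDivisionError):
                    prompt += "RESULT: error\n"
        elif step.startswith("NOTE:"):
            with open(scratchpad_path, "a") as f:        # external working memory
                f.write(step[len("NOTE:"):].strip() + "\n")

    return "(no answer within the step budget)"
```

Every failure mode of this loop (malformed tool calls, ignoring the scratchpad, never emitting an ANSWER line) has to be anticipated and patched by hand in the wrapper, which is exactly the brittleness objection above.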
You can give your pet theories as to how these limitations can be repaired, but it’s not worth much until someone actually writes the code. Before then, we can’t know how difficult it will be or how many years it will take.
1. I predict this passage in particular will age very poorly. Let’s come back to it in 2025.
2. You say “We can’t know how difficult it will be or how many years it will take.” Well, why do you seem so confident that it’ll take multiple decades? Shouldn’t you be more epistemically humble / cautious? ;)
You say “We can’t know how difficult it will be or how many years it will take.” Well, why do you seem so confident that it’ll take multiple decades? Shouldn’t you be more epistemically humble / cautious? ;)
Epistemic humility means having a wide probability distribution, which I do. The center of the distribution (hundreds of years out in my case) is unrelated to its humility.
Also, the way I phrased that is a little misleading because I don’t think years will be the most appropriate unit of time. I should have said “years/decades/centuries.”
Insofar as your distribution has a faraway median, that means you have close to certainty that it isn’t happening soon. And that, I submit, is ridiculously overconfident and epistemically unhumble.
Your argument seems to prove too much. Couldn’t you say the same thing about pretty much any not-yet-here technology, not just AGI? Like, idk, self-driving cars or more efficient solar panels or photorealistic image generation or DALL-E for 5-minute videos. Yet it would be supremely stupid to have hundred-year medians for each of these things.
[Edited to delete some distracting sentences I accidentally left in]
Insofar as your distribution has a faraway median, that means you have close to certainty that it isn’t happening soon.
And insofar as your distribution has a close median, you have high confidence that it’s not coming later. Any point about humility cuts both ways.
Your argument seems to prove too much. Couldn’t you say the same thing about pretty much any not-yet-here technology, not just AGI? Like, idk, self-driving cars or more efficient solar panels or photorealistic image generation or DALL-E for 5-minute videos. Yet it would be supremely stupid to have hundred-year medians for each of these things.
The difference between those technologies and AGI is that AGI is not remotely well-captured by any existing computer program. With image generation and self-driving, we already have decent results, and there are obvious steps for improvement (e.g. scaling, tweaking architectures). 5-minute videos are similar enough to images that the techniques can be reasonably expected to carry over. Where is the toddler-level, cat-level, or even bee-level proto-AGI?
[Replying to this whole thread, not just your particular comment]
“Epistemic humility” over distributions of times is pretty weird to think about, and imo generally confusing or unhelpful. There’s an infinite amount of time, so there is no uniform measure. Nor, afaik, is there any convergent scale-free prior. You must use your knowledge of the world to get any distribution at all.
You can still claim that higher-entropy distributions are more “humble” w.r.t. some improper prior. Which raises the question “Higher entropy w.r.t. what measure? Uniform? Log-uniform?” There’s an infinite class of scale-free measures you can use here. The natural way to pick one is using knowledge about the world.
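To make the “which measure?” question concrete, here is a toy calculation. The lognormal timeline distributions and their medians below are made up purely for illustration (they are not anyone’s actual forecast): two distributions with the same spread in log-space but medians of 30 vs. 300 years differ by about 2.3 nats of entropy under the uniform-in-years measure, yet have identical entropy under the log-uniform measure, so which one counts as “higher entropy,” and hence more “humble,” depends entirely on the measure you picked.

```python
# Toy illustration (hypothetical numbers): entropy of a timeline distribution
# depends on the reference measure. Two lognormals over "years until X" with
# the same log-space spread but different medians have different entropy
# w.r.t. dt (uniform in years) and identical entropy w.r.t. d(log t).
import numpy as np
from scipy import stats

sigma = 1.0  # same spread in log-space for both distributions

for median_years in (30, 300):
    t_dist = stats.lognorm(s=sigma, scale=median_years)            # T in years
    logt_dist = stats.norm(loc=np.log(median_years), scale=sigma)  # log T

    h_t = float(t_dist.entropy())        # differential entropy w.r.t. dt
    h_logt = float(logt_dist.entropy())  # differential entropy w.r.t. d(log t)
    print(f"median {median_years:>3}y:  H[T] = {h_t:.2f} nats,  H[log T] = {h_logt:.2f} nats")

# Prints roughly:
#   median  30y:  H[T] = 4.82 nats,  H[log T] = 1.42 nats
#   median 300y:  H[T] = 7.12 nats,  H[log T] = 1.42 nats
# The longer-median distribution looks ~2.3 nats "more humble" under one
# measure and exactly as humble under the other.
```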
Even in this (possibly absurd) framework, it seems like “high-entropy” doesn’t deserve the word “humble”—since having any reasonable distribution means you already deviated by infinite bits from any scale-free prior, am I significantly less humble for deviating by infinity+1 bits? It’s not like either of us actually started from an improper prior, then collected infinite bits one-by-one, and you can say “hey, where’d you get one extra bit from?”
You can salvage some kind of humility idea here by first establishing, with only the simplest object-level arguments, some finite prior, then being suspicious of longer arguments which drag you far from that prior. Although this mostly looks like regular-old object-level argument. The term “humility” often seems counterproductive, unless people already understand which exact form is being invoked.
There’s a different kind of “humility”, which is deferring to other people’s opinions. This has the associated problem of picking who to defer to. I’m often in favor, whereas Yudkowsky seems generally against, especially when he’s the person being asked to defer (see for example his takedown of “Humbali” here).
I’m often in favor, whereas Yudkowsky seems generally against, especially when he’s the person being asked to defer (see for example his takedown of “Humbali” here).
This is well explained by the hypothesis that he is epistemically superior to all of us (or at least thinks he is).
“Insofar as your distribution has a faraway median, that means you have close to certainty that it isn’t happening soon. And that, I submit, is ridiculously overconfident and epistemically unhumble.”
Why? You can say a similar thing about any median anyone ever has. Why is this median in particular overconfident?
Because it’s pretty obvious that there’s at least some chance of AGI etc. happening soon. Many important lines of evidence support this:
--Many renowned world experts in AI and AGI forecasting say so, possibly even most of them
--Just look at ChatGPT4
--Read the Bio Anchors report
--Learn more about AI and deep learning, in particular about scaling laws and the lottery ticket hypothesis; get up to speed with what OpenAI and other labs are doing; and then imagine what sorts of things could be built in the next few years using bigger models with more compute and data...
--Note the scarcity of any decent object-level argument that it won’t happen soon. Bio Anchors has the best arguments that it won’t happen this decade, IMO. If you know of a better one, I’d be interested in a link to it or an explanation!
Ah, so your complaint is that the author is ignoring evidence pointing to shorter timelines. I understand your position better now :)
Strong +1.
See also Eliezer’s comments on his Biological Anchors post for an expanded version of Daniel’s point (search “entropy”).