Why does it lengthen your timelines?
To simplify Daniel’s point: the pretraining paradigm claims that language draws heavily on important domains like logic, causal reasoning, world knowledge, etc., so that to reach human absolute performance (as measured in prediction: perplexity/cross-entropy/bpc), a language model must learn all of those domains roughly as well as humans do; GPT-3 obviously has not learned those important domains to a human level; therefore, if GPT-3 had reached the same absolute performance as humans without mastering those important domains, the pretraining paradigm would have to be false, because we would have created a language model which succeeds at one but not the other. There might still be a way to do pretraining right, but the one would turn out not to follow from the other, so you couldn’t just optimize for absolute performance and expect the rest to fall into place.
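(For concreteness, the three prediction metrics mentioned above are interconvertible once a tokenization is fixed. A minimal Python sketch of the relationships, using made-up illustrative numbers rather than GPT-3’s actual scores:)

```python
import math

# "Absolute performance" in prediction is usually reported as one of:
#   cross-entropy: average negative log-likelihood, in nats per token
#   perplexity:    exp of the per-token cross-entropy
#   bpc:           bits per character (bits per token / characters per token)

def perplexity(ce_nats_per_token: float) -> float:
    """Perplexity is the exponentiated per-token cross-entropy."""
    return math.exp(ce_nats_per_token)

def bits_per_character(ce_nats_per_token: float, chars_per_token: float) -> float:
    """Convert nats/token to bits/character for a given tokenization."""
    return (ce_nats_per_token / math.log(2)) / chars_per_token

ce = 2.0             # illustrative loss in nats per token (not a real GPT-3 number)
chars_per_tok = 4.0  # assumed average characters per BPE token
print(perplexity(ce))                        # ~7.39
print(bits_per_character(ce, chars_per_tok)) # ~0.72
```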
(It would have turned out that language models can model the easier or inessential parts of human corpora well enough to make up for skipping the important domains; if you memorize enough quotes or tropes or sayings, for example, maybe you can predict really well while still failing completely at commonsense reasoning, and this would hold true no matter how much more data was added to the pile.)
As it happens, GPT-3 has not reached the same absolute performance; the comparison is apples & oranges. I was only talking about WebText in my comment there, while Omohundro is talking about the Penn Treebank & 1BW benchmarks. As far as I can tell, GPT-3 is still substantially short of human performance.
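(To spell out why it’s apples & oranges: perplexity per word on the Penn Treebank or 1BW is not the same unit as perplexity per BPE token on WebText, so the raw numbers can’t be compared directly. A rough sketch of the usual way to put them on a common scale, bits per character; all the figures below are hypothetical:)

```python
import math

def word_ppl_to_bpc(word_ppl: float, chars_per_word: float) -> float:
    """Word-level perplexity -> bits per character."""
    return math.log2(word_ppl) / chars_per_word

def token_ppl_to_bpc(token_ppl: float, chars_per_token: float) -> float:
    """BPE-token-level perplexity -> bits per character."""
    return math.log2(token_ppl) / chars_per_token

# Hypothetical figures chosen only to show the unit mismatch:
print(word_ppl_to_bpc(20.0, 5.5))   # ~0.79 bpc from a word-level benchmark
print(token_ppl_to_bpc(10.0, 4.0))  # ~0.83 bpc from a token-level benchmark
# The benchmark with the "lower" perplexity (10 vs. 20) is actually the
# worse predictor per character here, because the units differ.
```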
If I thought about it more, it might shorten them, I don’t know. But my idea was: I’m worried that the GPTs are on a path towards human-level AGI. I’m worried that predicting internet text is an “HLAGI-complete” problem, in the sense that in order to do it as well as a human, you have to be a human or a human-level AGI. This is worrying because, if the scaling trends continue, GPT-4 or 5 or 6 will probably be able to do it as well as a human, and thus be HLAGI.
If GPT-3 is already superhuman at text prediction, well, that pretty much falsifies the hypothesis that predicting internet text is HLAGI-complete. It makes it more likely that the GPTs are not actually fully general after all, and that even GPT-6 and GPT-7 will have massive blind spots, be incompetent at various important things, etc.
Agreed. Superhuman levels are unlikely to be achieved simultaneously in different domains, even for a universal system. For example, a model could be universal and superhuman in math, but not superhuman in, say, reading emotions. That would be bad for alignment.