Eliezer… points out that in order to predict the next word in all the text on the internet and all similar text, you need to be able to model the processes that are generating that text.
I wanted to add this comment to the original post, but there were already dozens of other comments by the time I got to it and I figured the effort would have been wasted.
EY’s original post is correct in its narrow claim, but wildly misleading in its implications. He’s correct that reliably predicting the next word in a previously unseen text is a superhuman task, one that requires simulation and modeling that would be staggering in its implications. But insofar as that is the goal, how close is GPT to actually doing it? How well does GPT predict the next token in an unknown string in contexts where English syntax gives you many degrees of freedom?
Answer: it’s terrible! Its failure rate approaches 100%! (Again, excluding contexts where syntactic or semantic constraints give you very few degrees of freedom.) It is not even starting to approximate the kinds of simulation and modeling that success would imply. What it can do is produce text that matches the statistical distribution of human text, including non-local correlations (i.e. semantics), and to a certain degree the statistical idiosyncrasies of specific writers (i.e. style), and it turns out that getting even that far is pretty impressive. It’s also pretty impressive that you can treat “predict the next token” as the goal and get this much good out of it while still being bad at actually predicting the next token. The training data that GPT has is enough to teach it something about syntax and semantics, but is not remotely close to the amount or kind of data that would be necessary to teach it to simulate the universe.
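To make the distinction between "matching the distribution" and "predicting the next token" concrete, here is a toy sketch. It assumes a predictor that matches the true next-token distribution perfectly, and it assumes (purely for illustration) that an unconstrained context has k equally likely continuations. The numbers are made up and are not measurements of any GPT model; the function names are hypothetical.

```python
import math

def top1_failure_rate(probs):
    """Probability that guessing the single most likely token is wrong."""
    return 1.0 - max(probs)

def entropy_bits(probs):
    """Shannon entropy of the next-token distribution, in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def uniform(k):
    """Assumed toy context: k equally plausible continuations."""
    return [1.0 / k] * k

# A tightly constrained context (assumed numbers): one continuation dominates.
constrained = [0.97, 0.02, 0.01]
print("constrained:", top1_failure_rate(constrained), entropy_bits(constrained))

# Contexts with more and more degrees of freedom.
for k in (2, 10, 1000, 50000):
    p = uniform(k)
    print(k, top1_failure_rate(p), entropy_bits(p))
```

Even a predictor that matches the distribution exactly is wrong almost every time once k is large, because the best available guess is right only 1/k of the time. Doing better than that requires information that is not in the distribution itself, which is where the simulation and modeling would have to come in.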
The EY article boils down to “if GPT-Omega were an omniscient god that knew everything you were going to say before you said it, would that be freaky or what”. Yeah, bro, it would be freaky. But that has nothing to do with what GPT can actually do.
This seems like an unusual misreading of Eliezer’s post, which is quite explicitly about the potential bounds of future systems’ performance, and not about the performance of the current system. There is no implication that the current system is superhuman (or even average-human) in the dimensions that you specified.
They sound more like fantasy bounds than ‘potential’ ones, simply because there isn’t 1000x or 10000x more training data in existence for such a future system to train on. (Nor are there any likely pathways for that much data to come into existence, other than training on the outputs of prior models.)
I understood that. I guess I should have been more explicit about my belief that the amount of training data needed to train a viable universal simulator would be “all of the text ever created”, and then several orders of magnitude more.