It seems to me that imitation requires some form of prediction in order to work. First make some prediction of the behavioral trajectory of another agent; then try to minimize the deviation of your own behavior from an equivalent trajectory. In this scheme, prediction constitutes a strict subset of the computational complexity necessary to enable imitation. How would GPT’s task flip this around?
And if prediction is what’s going on, in the much-more-powerful-than-imitation sense, what sort of training scheme would be necessary to produce pure imitation without also training the more powerful predictor as a prerequisite?
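To make the scheme in the question concrete, here is a minimal toy sketch (predict the other agent's trajectory, then minimize your own deviation from it). All names and the toy predictor/deviation functions are hypothetical illustrations, not anyone's actual training setup:

```python
def predict_trajectory(history):
    # Toy "predictor": assume the other agent keeps repeating its last move.
    return [history[-1]] * 3

def deviation(own, predicted):
    # Toy deviation measure: mean squared difference over the trajectory.
    return sum((a - b) ** 2 for a, b in zip(own, predicted)) / len(predicted)

def imitation_loss(own_trajectory, history):
    # Imitation = prediction (the strict prerequisite) + matching your own behavior to it.
    return deviation(own_trajectory, predict_trajectory(history))

print(imitation_loss([2.0, 2.0, 2.0], [1.0, 1.5, 2.0]))  # 0.0: behavior exactly matches the prediction
```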
In a sense, “imitation of itself” just is prediction, though the phrase is awkward. A human’s “linguistic center” or skill predicts their own generated text with some fidelity, but humans are usually bad (unless the person is a professional linguist, a translator, or a very talented writer) at predicting text generated by others, i.e., other people’s styles.
So, one vector of superhumanness is that GPT is trained to predict an extremely wide range of styles: those of the tens of thousands of notable writers and speakers represented in the training corpus.
Another vector of superhumanness is that GPTs are trained to produce these predictions autoregressively, “on the first try”, whereas people may need many iterations to craft good writing, speech, or, perhaps most importantly, code. And since GPTs can match this skill “intuitively”, in a single rollout, applying them iteratively, e.g. having them repeatedly critique and improve their own generations, could yield superhuman quality in code, strategic planning, rhetoric, etc.
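A minimal sketch of that “apply the single-rollout skill iteratively” idea: draft once, then repeatedly critique and revise. Here `llm` is a hypothetical text-completion callable (prompt in, completion out) standing in for any GPT-style model; this is not a specific API or anyone's actual pipeline.

```python
def iterative_refine(llm, task, rounds=3):
    # One "intuitive" rollout to get a first draft.
    draft = llm(f"Write a first attempt at the following task:\n{task}")
    for _ in range(rounds):
        # Use the same model to critique its own output...
        critique = llm(f"Task:\n{task}\n\nDraft:\n{draft}\n\nList concrete flaws in this draft.")
        # ...and then to rewrite the draft addressing that critique.
        draft = llm(
            f"Task:\n{task}\n\nDraft:\n{draft}\n\nCritique:\n{critique}\n\n"
            "Rewrite the draft, fixing every listed flaw."
        )
    return draft
```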