Intuitively, I assume that LLMs trained on human data are unlikely to become much smarter than humans, right? At least not without some additional huge breakthrough beyond just being a language model?
For the sake of intuition, it’s useful to separate the capabilities visibly present in generated sequences from the capabilities of the model itself.
Suppose you’ve got an untuned language model trained on a bunch of human conversations, and you generate a billion rollouts of conversations from scratch (that is, with no initial prompt or other conditioning on the input). This process won’t tend to output conversations between humans that have IQs of 400, because the training distribution does not contain those. The average simulated conversation will be, in many ways, close to the average conversation in the training set.
But it would be incorrect to say that the language model has an “IQ” of 100 (even assuming the humans in the training distribution averaged 100). The capability elicited from the language model depends on the conditions of its predictions. When prompted to produce a conversation between two mathematicians trying to puzzle something out, the result is going to be very different from the random sampling case.
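To make the distinction concrete, here is a minimal sketch of unconditional versus conditioned sampling. It assumes the Hugging Face transformers library and the small open gpt2 base model as a stand-in for an untuned model trained on human text; the weights are identical in both calls, and only the conditioning changes.

```python
# Minimal sketch: same model, two very different output distributions.
# Assumes the Hugging Face transformers library and "gpt2" as a stand-in
# for a base (untuned) language model.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Unconditional rollout: start from just the beginning-of-sequence token,
# so samples land near the "average" of the training distribution.
bos = tokenizer(tokenizer.bos_token, return_tensors="pt").input_ids
unconditional = model.generate(bos, do_sample=True, max_new_tokens=60)

# Conditioned rollout: the prompt selects which region of the learned
# distribution the continuation is drawn from.
prompt = "Transcript of two mathematicians working through a proof together:\n"
conditioned_ids = tokenizer(prompt, return_tensors="pt").input_ids
conditioned = model.generate(conditioned_ids, do_sample=True, max_new_tokens=60)

print(tokenizer.decode(unconditional[0]))
print(tokenizer.decode(conditioned[0], skip_special_tokens=True))
```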
You can come up with a decent guess about how smart a character the model plays is, because strong language models tend to be pretty consistent. In contrast, it’s very hard to know how smart a language model is, because its externally visible behavior is only ever a lower bound on its capability. The language model is not its characters; it is the thing that can play any of its characters.
Next, keep in mind that even simple autoregressive token prediction can be arbitrarily hard. A common example is reversing a hash. Consider prompting a language model with:
“0xDF810AF8 is the truncated SHA256 hash of the string”
It does not take superhuman intelligence to write that prompt, but if a language model were able to complete that prompt correctly, it would imply really weird things.
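The forward direction, by contrast, is trivial. Here is a rough sketch; note that “truncated SHA256” is taken to mean keeping the first four bytes (eight hex digits) of the digest, which is an assumption, and the brute-force helper is purely illustrative. Hashing any candidate string is one line, while going the other way has no better generic strategy than search, and because only 32 bits of the digest survive, countless unrelated strings share the same value.

```python
# Minimal sketch of why the hash prompt is so hard to "complete".
# Assumption: "truncated SHA256" means keeping the first 4 bytes
# (8 hex digits) of the digest; the exact convention isn't stated above.
import hashlib
import itertools
import string

def truncated_sha256(s: str) -> str:
    return "0x" + hashlib.sha256(s.encode()).hexdigest()[:8].upper()

# Forward direction: easy.
print(truncated_sha256("hello world"))

# Inverse direction: nothing better than brute-force search over candidates,
# and a 32-bit truncation means many unrelated strings collide anyway.
def brute_force(target: str, max_len: int = 3) -> str | None:
    for length in range(1, max_len + 1):
        for chars in itertools.product(string.ascii_lowercase, repeat=length):
            candidate = "".join(chars)
            if truncated_sha256(candidate) == target:
                return candidate
    return None  # nothing short matched; the search space explodes with length

# Finds "abc" (or, in principle, an earlier colliding string) after ~700 tries.
print(brute_force(truncated_sha256("abc")))
```

A correct completion would mean the model somehow knew which of those many colliding preimages the prompt’s author had in mind, and that information is simply not present in the prompt.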
That’s an extreme case, but it’s not unique. For a closer example, try an experiment:
Try writing a program, at least 25 lines of nontrivial code, starting with a blank file, without using any pen and paper or other supporting tools, without editing anything you write, without using backspace. Just start typing characters in sequence and stop when the program’s done. And of course, the program should compile and work correctly.
Then try asking GPT-4 to do it. See who gets it done faster, and how many tries it takes!
The choice of which token to output next for this kind of programming task depends on a deep mental model of what comes next, and every character typed constrains your options in the future. Some mistakes are instantly fatal and stop your program from compiling. Others may be fixable by modifying future predictions to compensate, but every deviation adds complexity and constraints. GPT-4 frequently sprints straight through all of it.
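For a sense of what “no backspace” looks like from the model’s side, here is a minimal sketch of plain greedy decoding, again assuming the Hugging Face transformers library with gpt2 as a (weak) stand-in; stronger assistants decode the same way, just with far better predictions. Each chosen token is appended to the context and never revised.

```python
# Minimal sketch of autoregressive decoding with no "backspace":
# every chosen token is appended to the context and never revised.
# Assumes the Hugging Face transformers library and "gpt2" as a stand-in.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

tokens = tokenizer("def fizzbuzz(n):", return_tensors="pt").input_ids
for _ in range(80):
    with torch.no_grad():
        logits = model(tokens).logits       # predictions for every position
    next_token = logits[0, -1].argmax()     # greedy choice for the next token
    # The commitment point: once appended, this token constrains everything after it.
    tokens = torch.cat([tokens, next_token.view(1, 1)], dim=1)

print(tokenizer.decode(tokens[0]))
```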
The key is that even GPT-3 is already superhuman at the thing it’s actually doing. It’s the thing that’s shaping output distributions for input conditions, not the thing most directly “having a conversation” or whatever else.
The assumption goes that after ingesting human data, it can remix it (like humans do for art, for example) and create its own synthetic data it can then train on. The go-to example is AlphaGo, which, after playing a ton of simulated games against itself, became superhuman at Go. I am not qualified enough to give my informed opinion or predictions, but that’s what I know.
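To make the shape of that loop concrete, here is a toy sketch. It is emphatically not AlphaGo (no neural network, no tree search), just a tiny self-play setup for the game of Nim in which the policy starts out random, generates its own games, and trains only on those outcomes.

```python
# Toy illustration of a self-play loop (not AlphaGo): Nim with 21 stones,
# take 1-3 per turn, whoever takes the last stone wins. The policy starts
# random, plays against itself, and learns only from its own game outcomes.
import random
from collections import defaultdict

def play_game(q, epsilon):
    """Self-play one game; return the move history and the winning player."""
    stones, player, history = 21, 0, []
    while stones > 0:
        moves = [m for m in (1, 2, 3) if m <= stones]
        if random.random() < epsilon:
            move = random.choice(moves)                       # explore
        else:
            move = max(moves, key=lambda m: q[(stones, m)])   # exploit current policy
        history.append((stones, move, player))
        stones -= move
        player = 1 - player
    return history, 1 - player  # the player who took the last stone wins

q = defaultdict(float)  # value estimate for (stones_remaining, move)
for _ in range(20000):
    history, winner = play_game(q, epsilon=0.2)
    for stones, move, player in history:      # "train" on the self-generated games
        reward = 1.0 if player == winner else -1.0
        q[(stones, move)] += 0.01 * (reward - q[(stones, move)])

# Trained only on its own games, the policy tends to rediscover the classic
# strategy (leave the opponent a multiple of 4) in positions where it can win.
print({s: max((1, 2, 3), key=lambda m: q[(s, m)]) for s in range(4, 12)})
```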