Would a human, asked to predict the next token of any of the sequences above, be likely to come up with similar probability distributions for similar reasons? Probably not, though depending on the human, how much they know about Python, and how much effort they put into making their prediction, the output of sampling from the human’s predicted probability distribution might match the output of sampling from text-davinci’s distribution in some cases. But the LLM and the human probably arrive at their probability distributions through vastly different mechanisms.
I don’t think a human would come up with a similar probability distribution. But I think that’s because asking a human for a probability distribution forces them to switch from the “pattern-match similar stuff they’ve seen in the past” strategy to the “build an explicit model (or several)” strategy.
I think the equivalent step is not “ask a single human for a probability distribution over the next token”, but, instead, “ask a large number of humans who have lots of experience with Python and the Python REPL to make a snap judgement of what the next token is”.
BTW rereading my old comment, I see that there are two different ways you can interpret it:
1. “GPT-n makes similar mistakes to humans that are not paying attention[, and this is because it was trained on human outputs and will thus make similar mistakes to the ones it was trained on. If it were trained on something other than human outputs, like sensor readings, it would not make these sorts of mistakes.]”
2. “GPT-n makes similar mistakes to humans that are not paying attention[, and this is because GPT-n and human brains making snap judgements are both doing the same sort of thing. If you took a human and an untrained transformer, and some process which deterministically produced a complex (but not pure noise) data stream, and converted it to an audio stream for the human and a token stream for the transformer, and trained them both on the first bit of it, they would both be surprised by similar bits of the part that they had not been trained on.]”
I meant something more like the second interpretation. Also, “human who is not paying attention” is an important part of my model here. GPT-4 can play mostly-legal chess, but I think that process should be thought of as more like “a blindfolded, slightly inebriated chess grandmaster plays bullet chess” than like “a human novice plays the best chess that they can”.
I could very easily be wrong about that! But it does suggest some testable hypotheses, in the form of “find some process which generates a somewhat predictable sequence, train both a human and a transformer to predict that sequence, and see if they make the same types of errors or completely different types of errors”.
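The sequence-generating half of that experiment is easy to set up. Here is a minimal sketch: the choice of process (a quantized logistic map), the train/held-out split, and the trivial bigram baseline standing in for a "predictor" are all illustrative assumptions, not a claim about how the experiment would actually have to be run:

```python
from collections import Counter, defaultdict

def make_stream(n):
    """A deterministic, complex-but-not-pure-noise token stream:
    iterate the logistic map (r = 3.9, chaotic regime) and quantize
    each value into one of 8 tokens."""
    x = 0.123456
    out = []
    for _ in range(n):
        x = 3.9 * x * (1 - x)   # fully deterministic, but chaotic
        out.append(int(x * 8))  # token in 0..7
    return out

stream = make_stream(5000)
train, held_out = stream[:4000], stream[4000:]

# Stand-in predictor: for each token, remember its most common
# successor in the training prefix.
succ = defaultdict(Counter)
for a, b in zip(train, train[1:]):
    succ[a][b] += 1

errors = []
prev = train[-1]
for i, tok in enumerate(held_out):
    guess = succ[prev].most_common(1)[0][0] if succ[prev] else 0
    if guess != tok:
        errors.append(i)  # record *where* it errs, since the
                          # hypothesis is about error types/locations
    prev = tok

print(f"held-out error rate: {len(errors) / len(held_out):.2f}")
```

The interesting comparison would of course replace the bigram baseline with a human (fed the stream as audio) and a transformer (fed it as tokens), and then compare the two `errors` lists.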
Edit: making it more clear that I appreciate the effort that went into this post and think it was a good post
...and this is because GPT-n and human brains making snap judgements are both doing the same sort of thing.
I could very easily be wrong about that! But it does suggest some testable hypotheses, in the form of “find some process which generates a somewhat predictable sequence, train both a human and a transformer to predict that sequence, and see if they make the same types of errors or completely different types of errors”.
Suppose for concreteness, on a specific problem (e.g. Python interpreter transcript prediction), GPT-3 makes mistakes that look like humans-making-snap-judgement mistakes, and then GPT-4 gets the answer right all the time. Or, suppose GPT-5 starts playing chess like a non-drunk grandmaster.
Would that result imply that the kind of cognition performed by GPT-3 is fundamentally, qualitatively different from that performed by GPT-4? Similarly for GPT-4 → GPT-5.
It seems more likely to me that each model performs some kind of non-human-like cognition, just at a higher level of performance than its predecessor (though possibly each iteration of the model is qualitatively different from previous versions). And I’m not sure any experiment that only interprets and compares output errors, without investigating the underlying mechanisms that produced them (e.g. through mechanistic interpretability), would convince me otherwise. But it’s an interesting idea, and I think experiments like this could definitely tell us something.
(Also, thanks for clarifying and expanding on your original comment!)
Suppose for concreteness, on a specific problem (e.g. Python interpreter transcript prediction), GPT-3 makes mistakes that look like humans-making-snap-judgement mistakes, and then GPT-4 gets the answer right all the time. Or, suppose GPT-5 starts playing chess like a non-drunk grandmaster.
Would that result imply that the kind of cognition performed by GPT-3 is fundamentally, qualitatively different from that performed by GPT-4? Similarly for GPT-4 → GPT-5.
In the case of the Python interpreter transcript prediction task, I think if GPT-4 gets the answer right all the time that would indeed imply that GPT-4 is doing something qualitatively different from GPT-3. I don’t think it’s actually possible to get anywhere near 100% accuracy on that task without either having access to, or being, a Python interpreter.
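To make that concrete, here is a toy transcript-prediction item (the particular snippet is just an illustrative example); hitting the exact next token reliably means actually doing the arithmetic, i.e. being (or calling) a Python interpreter:

```python
# A toy item from the transcript-prediction task: given the REPL
# input, predict what gets printed next.
snippet = "sum(i * i for i in range(10))"

# A snap judgement ("a few hundred, probably?") can land close, but
# only evaluating the expression pins down the exact token.
answer = eval(snippet)
print(answer)  # 285
```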
Likewise, in the chess example, I expect that if GPT-5 is better at chess than GPT-4, that will look like “an inattentive and drunk super-grandmaster, with absolutely incredible intuition about the relative strength of board-states, but difficulty with stuff like combinations (but possibly with the ability to steer the game-state away from the board states it has trouble with, if it knows it has trouble in those sorts of situations)”. If it makes the sorts of moves that human grandmasters play when they are playing deliberately, and the resulting play is about as strong as those grandmasters, I think that would show a qualitatively new capability.
Also, my model isn’t “GPT’s cognition is human-like”. It is “GPT is doing the same sort of thing humans do when they make intuitive snap judgements”. In many cases it is doing that thing far far better than any human can. If GPT-5 comes out, and it can natively do tasks like debugging a new complex system by developing and using a gears-level model of that system, I think that would falsify my model.
Also also, it’s important to remember that “GPT-5 won’t be able to do that sort of thing natively” does not mean “and therefore there is no way for it to do that sort of thing, given that it has access to tools”. One obvious way for GPT-4 to succeed at the “predict the output of running Python code” task is to give it the ability to execute Python code and read the output. The system of “GPT-4 + Python interpreter” does indeed perform a fundamentally, qualitatively different type of cognition than “GPT-4 alone”. But “it requires a fundamentally different type of cognition” does not actually mean “the task is not achievable by known means”.
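A minimal sketch of the tool half of that “GPT-4 + Python interpreter” system might look like the following; the model call is deliberately left as a stub, since the exact API is beside the point:

```python
import subprocess
import sys

def run_python(code: str) -> str:
    """Tool: execute code in a fresh Python interpreter
    and return whatever it printed."""
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True, text=True, timeout=10,
    )
    return (proc.stdout + proc.stderr).strip()

def ask_model(prompt: str) -> str:
    """Hypothetical LLM call; a real system would send the
    prompt to the model and return its completion."""
    raise NotImplementedError("stub for illustration")

# The combined system: instead of the model *predicting* the output
# of the code, the tool actually runs it and the model reads the
# real output back.
print(run_python("print(sum(i * i for i in range(10)))"))
```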
Also also also, I mostly care about this model because it suggests interesting things to do on the mechanistic interpretability front, which I am currently in the process of learning how to do. My personal suspicion is that the bags of tensors are not actually inscrutable, and that looking at these kinds of mistakes would make some of the failure modes of transformers no longer mysterious.