Suppose, for concreteness, that on a specific problem (e.g. Python interpreter transcript prediction), GPT-3 makes mistakes that look like humans-making-snap-judgement mistakes, and then GPT-4 gets the answer right all the time. Or, suppose GPT-5 starts playing chess like a non-drunk grandmaster.
Would that result imply that the kind of cognition performed by GPT-3 is fundamentally, qualitatively different from that performed by GPT-4? Similarly for GPT-4 → GPT-5.
In the case of the Python interpreter transcript prediction task, I think if GPT-4 gets the answer right all the time that would indeed imply that GPT-4 is doing something qualitatively different than GPT-3. I don’t think it’s actually possible to get anywhere near 100% accuracy on that task without either having access to, or being, a Python interpreter.
Likewise, in the chess example, I expect that if GPT-5 is better at chess than GPT-4, that will look like “an inattentive and drunk super-grandmaster, with absolutely incredible intuition about the relative strength of board-states, but difficulty with stuff like combinations (but possibly with the ability to steer the game-state away from the board states it has trouble with, if it knows it has trouble in those sorts of situations)”. If it makes the sorts of moves that human grandmasters play when they are playing deliberately, and the resulting play is about as strong as those grandmasters, I think that would show a qualitatively new capability.
Also, my model isn’t “GPT’s cognition is human-like”. It is “GPT is doing the same sort of thing humans do when they make intuitive snap judgements”. In many cases it is doing that thing far far better than any human can. If GPT-5 comes out, and it can natively do tasks like debugging a new complex system by developing and using a gears-level model of that system, I think that would falsify my model.
Also also it’s important to remember that “GPT-5 won’t be able to do that sort of thing natively” does not mean “and therefore there is no way for it to do that sort of thing, given that it has access to tools”. One obvious way for GPT-4 to succeed at the “predict the output of running Python code” task is to give it the ability to execute Python code and read the output. The system of “GPT-4 + Python interpreter” does indeed perform a fundamentally, qualitatively different type of cognition than “GPT-4 alone”. But “it requires a fundamentally different type of cognition” does not actually mean “the task is not achievable by known means”.
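As a concrete sketch of what that “model + interpreter” setup might look like (the model-query step is left out as hypothetical; only the interpreter part is real):

```python
import subprocess
import sys

def run_python(code: str, timeout: int = 5) -> str:
    """Execute a Python snippet in a fresh subprocess and return its output.

    This is the tool half of the hypothetical "GPT-4 + Python interpreter"
    system: instead of asking the model to *predict* the output, the
    scaffolding runs the code and feeds the ground-truth result back in.
    """
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True, text=True, timeout=timeout,
    )
    return result.stdout if result.returncode == 0 else result.stderr

# The model's answer can then condition on the actual output rather than
# on an intuitive snap judgement about what the code would print.
code = "print(sum(i * i for i in range(10)))"
observed = run_python(code)
prompt_fragment = f">>> {code}\n{observed}"
```

The point is that the qualitatively-different cognition lives in the interpreter, not in the transformer; the scaffolding just routes the question to the component that can answer it exactly.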
Also also also, I mostly care about this model because it suggests interesting things to do on the mechanistic interpretability front, which I am currently in the process of learning how to do. My personal suspicion is that the bags of tensors are not actually inscrutable, and that looking at these kinds of mistakes would make some of the failure modes of transformers no-longer-mysterious.