I think there's a lot of accumulated evidence pointing against the view that LLMs are (very) alien, and pointing towards their semantics being quite similar to those of humans (though of course not identical). E.g. have a look at papers (comparing brains to LLMs) from the labs of Ev Fedorenko, Uri Hasson, Jean-Remi King, and Alex Huth (or Twitter thread summaries).
Can you link to some specific papers here? I’ve looked into 1-2 papers of this genre in the last few months, and they seemed very weak to me, but you might have links to better papers, and I would be interested in checking them out.
Thanks for engaging. Can you say more about which papers you’ve looked at / in which ways they seemed very weak? This will help me adjust what papers I’ll send; otherwise, I’m happy to send a long list.
Also, to be clear, I don't think any specific paper is definitive evidence; I'm mostly swayed by the cumulative evidence from all the work I've seen (dozens of papers), with varying methodologies, neuroimaging modalities, etc.
Alas, I can’t find the one or two that I looked at quickly. It came up in a recent Twitter conversation, I think with Quintin?
Can’t speak for Habryka, but I would be interested in just seeing the long list.
Here goes (I’ve probably still missed some papers, but the most important ones are probably all here):
Brains and algorithms partially converge in natural language processing
Shared computational principles for language processing in humans and deep language models
Deep language algorithms predict semantic comprehension from brain activity
The neural architecture of language: Integrative modeling converges on predictive processing (video summary); though maybe also see Predictive Coding or Just Feature Discovery? An Alternative Account of Why Language Models Fit Brain Data
Brain embeddings with shared geometry to artificial contextual embeddings, as a code for representing language in the human brain
Artificial neural network language models align neurally and behaviorally with humans even after a developmentally realistic amount of training
Correspondence between the layered structure of deep language models and temporal structure of natural language processing in the human brain
Reconstructing the cascade of language processing in the brain using the internal computations of a transformer-based language model
Linguistic brain-to-brain coupling in naturalistic conversation
Semantic reconstruction of continuous language from non-invasive brain recordings
Driving and suppressing the human language network using large language models
Lexical semantic content, not syntactic structure, is the main contributor to ANN-brain similarity of fMRI responses in the language network
Training language models for deeper understanding improves brain alignment
Natural language processing models reveal neural dynamics of human conversation
Semantic Representations during Language Comprehension Are Affected by Context
Unpublished—scaling laws for predicting brain data (larger LMs are better), potentially close to noise ceiling (90%) for some brain regions with largest models
Twitter accounts of some of the major labs and researchers involved (especially useful for summaries):
https://twitter.com/HassonLab
https://twitter.com/JeanRemiKing
https://twitter.com/ev_fedorenko
https://twitter.com/alex_ander
https://twitter.com/martin_schrimpf
https://twitter.com/samnastase
https://twitter.com/mtoneva1
These papers are interesting, thanks for compiling them!
Skimming through some of them, the sense I get is that they provide evidence for the claim that the structure and function of LLMs is similar to (and inspired by) the structure of particular components of human brains, namely, the components which do language processing.
This is slightly different from the claim I am making, which is about how the cognition of LLMs compares to the cognition of human brains as a whole. My comparison is slightly unfair, since I’m comparing a single forward pass through an LLM to get a prediction of the next token, to a human tasked with writing down an explicit probability distribution on the next token, given time to think, research, etc. [1]
Also, LLM capability at language processing / text generation is already far superhuman (by some metrics). The architecture of LLMs may be simpler than the comparable parts of the brain's architecture in some ways, but the LLM version can run with far more precision / scale / speed than a human brain. Whether or not LLMs already exceed human brains on specific metrics is debatable, but they are not bottlenecked on further scaling by biology.
And this is to say nothing of all the other kinds of cognition that happens in the brain. I see these brain components as analogous to LangChain or AutoGPT, if LangChain or AutoGPT themselves were written as ANNs that interfaced “natively” with the transformers of an LLM, instead of as Python code.
Finally, similarity of structure doesn’t imply similarity of function. I elaborated a bit on this in a comment thread here.
You might be able to get better predictions from an LLM by giving it more “time to think”, using chain-of-thought prompting or other methods. But these are methods humans use when using LLMs as a tool, rather than ideas which originate from within the LLM itself, so I don’t think it’s exactly fair to call them “LLM cognition” on their own.
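For concreteness, here is a minimal sketch of what "a single forward pass to get a prediction of the next token" looks like in practice, using GPT-2 via the Hugging Face transformers library (the model, prompt, and library are my illustrative choices here, not anything the discussion above depends on):

```python
# Minimal sketch: the next-token distribution from one forward pass, with no
# "time to think". Model and prompt are illustrative choices only.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

prompt = ">>> sorted([3, 1, 2])\n"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

with torch.no_grad():
    logits = model(input_ids).logits              # a single forward pass
next_token_probs = torch.softmax(logits[0, -1], dim=-1)

# The model's top five guesses for the next token, with probabilities.
top = torch.topk(next_token_probs, k=5)
for prob, tok_id in zip(top.values, top.indices):
    print(repr(tokenizer.decode(int(tok_id))), round(float(prob), 4))
```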
Re the superhuman next-token prediction ability: there's an issue where the evaluations are fairly distorted, in ways that make humans look artificially worse at next-token prediction than they actually are; see here:
https://www.lesswrong.com/posts/htrZrxduciZ5QaCjw/language-models-seem-to-be-much-better-than-humans-at-next#wPwSND5mfQ7ncruWs
Thanks!
they’re somewhat alien, not highly alien, agreed
Great post!
I don’t think a human would come up with a similar probability distribution. But I think that’s because asking a human for a probability distribution forces them to switch from the “pattern-match similar stuff they’ve seen in the past” strategy to the “build an explicit model (or several)” strategy.
I think the equivalent step is not “ask a single human for a probability distribution over the next token”, but, instead, “ask a large number of humans who have lots of experience with Python and the Python REPL to make a snap judgement of what the next token is”.
BTW rereading my old comment, I see that there are two different ways you can interpret it:
“GPT-n makes similar mistakes to humans that are not paying attention[, and this is because it was trained on human outputs and will thus make similar mistakes to the ones it was trained on. If it were trained on something other than human outputs, like sensor readings, it would not make these sorts of mistakes.]”.
“GPT-n makes similar mistakes to humans that are not paying attention[, and this is because GPT-n and human brains making snap judgements are both doing the same sort of thing. If you took a human and an untrained transformer, and some process which deterministically produced a complex (but not pure noise) data stream, and converted it to an audio stream for the human and a token stream for the transformer, and trained them both on the first bit of it, they would both be surprised by similar bits of the part that they had not been trained on. ].”
I meant something more like the second interpretation. Also, "human who is not paying attention" is an important part of my model here. GPT-4 can play mostly-legal chess, but I think that process should be thought of as more like "a blindfolded, slightly inebriated chess grandmaster plays bullet chess", not "a human novice plays the best chess that they can".
I could very easily be wrong about that! But it does suggest some testable hypotheses, in the form of "find some process which generates a somewhat predictable sequence, train both a human and a transformer to predict that sequence, and see if they make the same types of errors or completely different types of errors".
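Here's a rough sketch of how that experiment might be set up, under my own assumptions about the details (a quantized logistic map as the deterministic-but-nontrivial sequence, and the two trained predictors left as stubs):

```python
# Rough sketch of the proposed experiment: generate a deterministic but
# nontrivial token sequence, hold out the tail, and compare *where* two
# predictors make mistakes. The predictors (a human trained on an audio
# rendering, a small transformer trained on the token stream) are stubs;
# the choice of sequence is an illustrative assumption.

def logistic_map_tokens(n, x0=0.123, r=3.9, bins=8):
    """Deterministic chaotic sequence, quantized into `bins` discrete tokens."""
    x, out = x0, []
    for _ in range(n):
        x = r * x * (1 - x)
        out.append(min(int(x * bins), bins - 1))
    return out

def error_positions(predict, context, held_out):
    """Indices in `held_out` where the predictor's next-token guess is wrong."""
    errors, history = [], list(context)
    for i, true_tok in enumerate(held_out):
        if predict(history) != true_tok:
            errors.append(i)
        history.append(true_tok)
    return errors

seq = logistic_map_tokens(10_000)
train, test = seq[:9_000], seq[9_000:]

# The interesting question: do the two error sets overlap far more than chance?
# overlap = set(error_positions(human_predict, train, test)) & \
#           set(error_positions(transformer_predict, train, test))
```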
Edit: to be more clear, I appreciate the effort that went into this post and think it was a good post.
Suppose, for concreteness, that on a specific problem (e.g. Python interpreter transcript prediction), GPT-3 makes mistakes that look like humans-making-snap-judgement mistakes, and then GPT-4 gets the answer right all the time. Or, suppose GPT-5 starts playing chess like a non-drunk grandmaster.
Would that result imply that the kind of cognition performed by GPT-3 is fundamentally, qualitatively different from that performed by GPT-4? Similarly for GPT-4 → GPT-5.
It seems more likely to me that each model performs some kind of non-human-like cognition at a higher level of performance (though possibly each iteration of the model is qualitatively different from previous versions). And I’m not sure there’s any experiment which involves only interpreting and comparing output errors without investigating the underlying mechanisms which produced them (e.g. through mechanistic interpretability) which would convince me otherwise. But it’s an interesting idea, and I think experiments like this could definitely tell us something.
(Also, thanks for clarifying and expanding on your original comment!)
In the case of the Python interpreter transcript prediction task, I think if GPT-4 gets the answer right all the time that would indeed imply that GPT-4 is doing something qualitatively different than GPT-3. I don’t think it’s actually possible to get anywhere near 100% accuracy on that task without either having access to, or being, a Python interpreter.
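For what it's worth, here's a sketch of how one might score that transcript-prediction task: get the ground truth by actually running the code, then check whether a model's guess matches it exactly. The `ask_model` function is a hypothetical stub standing in for however you query the model.

```python
# Sketch of scoring "predict the output of running Python code". Ground truth
# comes from actually executing the snippet; `ask_model` is a hypothetical
# stub for whatever model you want to evaluate.
import subprocess
import sys

def true_output(code: str) -> str:
    """What the interpreter actually prints for this snippet."""
    result = subprocess.run(
        [sys.executable, "-c", code], capture_output=True, text=True, timeout=10
    )
    return result.stdout

def score_item(code: str, ask_model) -> bool:
    prompt = f"Predict the exact stdout of running this Python code:\n{code}\n"
    return ask_model(prompt).strip() == true_output(code).strip()

# Example item; getting near-100% accuracy across many such items is hard
# without having access to (or being) a Python interpreter.
# score_item("print(sum(i * i for i in range(17)))", ask_model=my_model_fn)
```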
Likewise, in the chess example, I expect that if GPT-5 is better at chess than GPT-4, that will look like “an inattentive and drunk super-grandmaster, with absolutely incredible intuition about the relative strength of board-states, but difficulty with stuff like combinations (but possibly with the ability to steer the game-state away from the board states it has trouble with, if it knows it has trouble in those sorts of situations)”. If it makes the sorts of moves that human grandmasters play when they are playing deliberately, and the resulting play is about as strong as those grandmasters, I think that would show a qualitatively new capability.
Also, my model isn’t “GPT’s cognition is human-like”. It is “GPT is doing the same sort of thing humans do when they make intuitive snap judgements”. In many cases it is doing that thing far far better than any human can. If GPT-5 comes out, and it can natively do tasks like debugging a new complex system by developing and using a gears-level model of that system, I think that would falsify my model.
Also also it's important to remember that "GPT-5 won't be able to do that sort of thing natively" does not mean "and therefore there is no way for it to do that sort of thing, given that it has access to tools". One obvious way for GPT-4 to succeed at the "predict the output of running Python code" task is to give it the ability to execute Python code and read the output. The system of "GPT-4 + Python interpreter" does indeed perform a fundamentally, qualitatively different type of cognition than "GPT-4 alone". But "it requires a fundamentally different type of cognition" does not actually mean "the task is not achievable by known means".
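A minimal sketch of that "GPT-4 + Python interpreter" pattern, with the model calls left as hypothetical stubs (`generate_code` and `summarize` are placeholders, not any particular API): the model proposes code, the code actually runs, and the real output is handed back to the model rather than asking it to predict that output itself.

```python
# Sketch of the tool-use loop: the interpreter does the part of the cognition
# the model can't do natively. `generate_code` and `summarize` are hypothetical
# stand-ins for model calls, not a real API.
import subprocess
import sys

def run_python(code: str) -> str:
    """Actually execute the model's code and capture what it printed."""
    result = subprocess.run(
        [sys.executable, "-c", code], capture_output=True, text=True, timeout=10
    )
    return result.stdout + result.stderr

def answer_with_tool(question: str, generate_code, summarize) -> str:
    code = generate_code(question)            # model writes code for the question
    observation = run_python(code)            # the interpreter produces ground truth
    return summarize(question, observation)   # model answers using the real output
```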
Also also also, I mostly care about this model because it suggests interesting things to do on the mechanistic interpretability front, which I am currently in the process of learning how to do. My personal suspicion is that the bags of tensors are not actually inscrutable, and that looking at these kinds of mistakes would make some of the failure modes of transformers no-longer-mysterious.
One of the things I think about a lot, and ask my biologist/anthropologist/philosopher friends, is: what does it take for something to be actually recognised as human-like by humans? For instance, I see human-like cognition and behaviour in most mammals, but this seems to be resisted almost by instinct by my human friends who insist that humans are superior and vastly different. Why don’t we have a large appreciation for anthill architecture, or whale songs, or flamingo mating dances? These things all seem human-like to me, but are not accepted as forms of “fine art” by humans. I hypothesize that we may be collectively underestimating our own species-centrism, and are in grave danger of doing the same with AI, by either under-valuing superior AI as not human enough, or by over-valuing inferior AI with human-like traits that are shortcomings more than assets.
How do we prove that an AI is "human-like"? And should that be our goal, given that we have fairly limited knowledge of the mechanics of human/mammalian cognition, and that humans seem to have a widespread assumption that it's the most superior form of cognition/intelligence/behaviour we (as a species) have seen?
Yes, GPTs would have alien-like cognition.
Whether they can translate is unclear, because the limits of translation of human languages are still unknown.
Yes, they are trained on logs of human thoughts. Each log entry corresponds to a human thought, i.e. there is a bijection. There is thus no formal difference.
Re: predicting encodings of human thought, I'm not sure what is supposed to be compelling about this. GPTs currently would only learn a subset of human cognition, namely the subset that generates human text. So sure, training them on more types of human cognition might make them follow more types of human cognition more accurately. Therefore...?
Yes, a brain and a Python interpreter do not have a similar internal structure in evaluating Python semantics. So what? This is as interesting as the fact that a mechanical computer is internally different from an electronic computer. What matters is that they both implement basically the same externally observable semantics in interpreting Python.
Suffice it to say that I didn’t find anything here particularly compelling.