LLMs showed how far one can get with a good next-token prediction engine and a huge amount of training data. This ought to tell us a fair bit about how the human mind works. Specifically, since next-token prediction appears to be so pervasive in human communication, it is bound to obscure whatever other vital parts of cognition there are, if any. I wonder what would be left if we managed to factor the prediction engine + training data out of our models of human cognition? Presumably something related to “interpretability” or “qualia” or… something else. What might it be?
[Question] If you factor out next token prediction, what are the remaining salient features of human cognition?
To clarify a bit, here is one of many examples showing token prediction resulting in self-contradictory replies, apparently because the meaning of its own response is not something ChatGPT can model well:
Q: What country is south of Rwanda?
A: Rwanda is a landlocked country located in central Africa. It is bordered by Uganda to the north, Tanzania to the east, Burundi to the south, and the Democratic Republic of the Congo to the west. There are no countries south of Rwanda.
(The question is from the paper https://arxiv.org/pdf/2212.03551.pdf, mentioned by Sarah Constantin in https://twitter.com/s_r_constantin/status/1603809422278418433.)
I dunno how to respond to this question. It reads like it wants to make a lot of weird-to-me assumptions about the relationship between human cognition and currently popular ML methods. Like, I could give an object-level answer but that feels inadequate considering.
hmm, which part is weird?
I think what struck me most was the assumption that the success of the next-token prediction objective in today’s ML implies something specific about how human cognition works (especially to the point where we might hypothesize that most of it is just generic prediction + training data).
If you look at the kinds of cognitive architectures that shoot for a brain-like structure, like ACT-R or Leabra or Yann LeCun’s thing or SPAUN, most of the components are doing things that are not very similar to GPT-style next token prediction.
Hmm, interesting. I wonder if this is an example of Carcinisation, where you can get some ways toward imitating/implementing cognition from multiple directions.
Here’s my take:
Like the reward signal in reinforcement learning, next-token prediction is a simple feedback signal that masks a lot of complexity behind the scenes. Predicting the next token requires the model first of all to estimate what sort of persona should be speaking, what they know, how they speak, what the context is, and what they are trying to communicate. Self-attention with multiple attention heads at every layer of the Transformer allows the LLM to keep track of all these things. It’s probably not the best way to do it, but it works.
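In code, the whole training signal boils down to something like this (a toy sketch, assuming PyTorch and a placeholder `model` that maps token ids to logits over the vocabulary):

```python
import torch.nn.functional as F

def next_token_loss(model, tokens):
    # tokens: LongTensor of shape (batch, seq_len)
    inputs = tokens[:, :-1]    # everything except the last token
    targets = tokens[:, 1:]    # the same sequence shifted left by one
    logits = model(inputs)     # (batch, seq_len - 1, vocab_size)
    # The entire feedback signal: how well did you guess each next token?
    return F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                           targets.reshape(-1))
```

Everything the model “knows” about personas, context, and intent has to be squeezed through that one scalar loss.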
Human brains, and cortex in particular, give us a powerful way to map all of this sort of information. We can map out our current mental model and predict a map of our interlocutor’s, looking for gaps in each and planning words either to fill in our own gaps (e.g., by asking questions) or to fill in theirs (with what we actually think or with what we want them to think, depending on our goals). I would also say that natural language is actually a sort of programming language, allowing humans to share cognitive programs between minds: programs of behavior or world modelling.
I also asked your question to ChatGPT, and here is what it had to say:
Not sure if I am even looking in the right direction, but in addition to token predictors, humans also have:
instincts;
feedback from reality.
Who knows, maybe without these two we wouldn’t really be better than GPT-3.
To implement feedback from reality, we would need some virtual environment, and special commands to “do” things in that environment. Doing things could have consequences; they could help or hurt the chatbot.
The chatbot could also “see” the environment, which could be implemented by inserting special tokens into the prompt? Chatbots could also observe each other’s actions (maybe only when they are close to each other).
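Something like this toy loop, where `generate`, `Environment`, and the “do:”/“observe:” conventions are all made-up placeholders:

```python
# Toy sketch of "feedback from reality": observations enter the prompt as
# special tokens, and replies starting with "do:" are treated as actions
# that have consequences. `generate` stands in for any language-model call.
class Environment:
    def __init__(self):
        self.health = 10

    def observe(self):
        return "observe:fire"      # what the chatbot currently "sees"

    def act(self, action):
        if action == "do:touch fire":
            self.health -= 1       # doing things can hurt the chatbot
        return self.health

def step(generate, env, history):
    prompt = history + "\n" + env.observe()
    reply = generate(prompt)       # the model's next utterance
    if reply.startswith("do:"):
        env.act(reply)             # actions feed back into the environment
    return prompt + "\n" + reply
```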
On top of that, implement instincts: something like very simple personalized extra training data. For example, to simulate getting burned by fire, add 1,000,000 copies of the string “observe:fire OUCH” to the training data, thus increasing the probability that the chatbot will output OUCH after observing “fire”.
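In its crudest form, that might be nothing more than (with `instinct_data` and `shared_corpus` being purely hypothetical names):

```python
# An "instinct" as massively duplicated training strings: the personalized
# corpus is flooded with one pattern so the trained chatbot is strongly
# biased toward the reflex-like completion.
instinct_data = ["observe:fire OUCH"] * 1_000_000

# Hypothetically, this gets mixed into the chatbot's own fine-tuning data:
# training_corpus = shared_corpus + instinct_data
```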
Now put some genetic algorithm on top of this, so chatbots can live in their environment and have their instincts somehow modified (randomly? intentionally?); the successful chatbots reproduce, the unsuccessful ones die.
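A toy version of that selection loop, assuming some hypothetical `fitness()` function that runs a chatbot in the environment and scores how well it did (all names are illustrative):

```python
import random

def mutate(genome):
    # genome: list of instinct strings, e.g. ["observe:fire OUCH", ...]
    g = list(genome)
    if g and random.random() < 0.5:
        g[random.randrange(len(g))] += "!"   # a random tweak, just for illustration
    return g

def evolve(population, fitness, generations=10):
    for _ in range(generations):
        scored = sorted(population, key=fitness, reverse=True)
        survivors = scored[: len(scored) // 2]     # the unsuccessful ones "die"
        children = [mutate(g) for g in survivors]  # instincts get randomly modified
        population = survivors + children          # the successful ones reproduce
    return population
```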
Problems with this approach:
If it works, we could create tons of suffering.
If it works, we might not understand the chatbots anyway. Like, they might evolve their own special language. Or the instincts may be just as illegible as looking at values in a neural network.
EDIT: Also, memory. If I understand it correctly, chatbots only remember a certain length of previous discussion. That is fair; human memory is not unlimited either. But perhaps the chatbot could have multiple communication channels: at least one to talk to humans, and one to talk to itself. The chatbot could write things to its private channel to remember them after they have scrolled off the remembered region of its public channel.
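A toy sketch of what that could look like (all names hypothetical), with the public channel as a sliding window and the private channel as a persistent scratchpad:

```python
from collections import deque

class TwoChannelMemory:
    def __init__(self, window=20):
        self.public = deque(maxlen=window)   # old messages fall off the front
        self.private = []                    # notes-to-self are kept indefinitely

    def hear(self, message):
        self.public.append(message)

    def note_to_self(self, note):
        self.private.append(note)

    def visible_context(self):
        # What the chatbot would actually "see" on its next turn.
        return "\n".join(self.private + list(self.public))
```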