A learned agent is not the same as a learning agent

[Edit: after reading the comments and thinking more about in-context learning, I don’t endorse most of what’s written here. I explain why at the end of the post.]

I notice a common confusion when people talk about deep learning. Before I try to describe it in general, let’s start with an example.

Like everyone, I have lately had many conversations with friends about ChatGPT. A friend of mine said that while ChatGPT is indeed impressive, it highlights how amazing the ability of human children to learn language from much less data is. While I strongly share the sentiment, I think that the comparison is wrong. As Chomsky hypothesized, children’s ability to learn language from experience is so amazing that we should doubt that this is really what happens. When a child hears language, that is probably mere fine-tuning to the shallow differences between human languages – building on very strong prior knowledge about the general shape of language. That prior knowledge is mostly based on evolution’s “experience” with your great-great aunt who misunderstood instructions and ate poisonous mushrooms as a result – and many similar circumstances. That experience was in turn aggregated with that of many humanoid great-great uncles who died in fights for dominance after mistaking their opponent’s threats for a bluff. And with the birth and death of earlier creatures, with brains that are something between genius controllers and very stupid agents, exploiting simple patterns in their environments[1].

What was my friend’s source of confusion? Probably that he did not properly distinguish the algorithm that generated ChatGPT as its output from the algorithm that is ChatGPT itself. If you want to compare ChatGPT itself to a human, you should compare its training loop to evolution, not to a human. Did training ChatGPT require more data than human evolution? I’m honestly not sure. But if I want to evaluate ChatGPT’s own learning abilities rather than those of the training loop, I should focus on how it uses information from earlier in a long conversation, as “earlier in the conversation” is its equivalent of “earlier in life”. Not information from its training data – which is equivalent to the unfortunate death of my great-great aunt. Even with this interpretation of ChatGPT “learning”, it probably doesn’t compare that well with humans – but for different reasons that I’ll mention below.

Instead of using the confusing analogy that “artificial neural networks are somewhat similar to neural networks in the brain,” it would be more accurate to use the following analogy:

Human (brain) | Trained model
DNA | Weights of the network
Developmental processes translating DNA to a grown human | Network architecture (?)
Evolution | Training loop
Evolutionary pressure | (The gradient of?) the loss function
Human knowledge | Vector representation of past interactions since the model’s deployment in the environment. In the case of ChatGPT – since the beginning of the conversation.
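
To make the mapping concrete, here is a minimal toy sketch of the two nested processes. Everything in it is an illustrative assumption rather than a description of how ChatGPT works: the one-parameter model `y = w * x`, the names `train` and `DeployedAgent`, and the rule that the deployed agent infers a constant offset from its context are all made up for the example. The only point is that the outer loop changes the weights, while the deployed agent changes nothing but its context.

```python
from dataclasses import dataclass, field


def train(data, steps=200, lr=0.05):
    """Outer loop (the "evolution" side of the analogy).

    Plain gradient descent on mean squared error for the toy model
    y = w * x. Its output is the weight w (the "DNA").
    """
    w = 0.0
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


@dataclass
class DeployedAgent:
    """Inner loop (the "lifetime" side of the analogy).

    The weight is never modified after deployment. Whatever the agent
    "learns" lives only in `context` (hypothetical stand-in for the
    conversation so far).
    """
    w: float
    context: list = field(default_factory=list)

    def observe(self, x, y):
        # Learning in deployment: append to context, leave w untouched.
        self.context.append((x, y))

    def predict(self, x):
        # Assumed toy rule: estimate a constant offset from the context.
        if self.context:
            residuals = [y - self.w * cx for cx, y in self.context]
            offset = sum(residuals) / len(residuals)
        else:
            offset = 0.0
        return self.w * x + offset


# "Evolution" sees data generated by y = 2x.
w = train([(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)])

# In deployment the world is y = 2x + 3; one observation is enough
# for the agent to adapt, and its weight stays exactly the same.
agent = DeployedAgent(w)
agent.observe(1.0, 5.0)
print(agent.predict(2.0))
```

Under this reading, asking how much data ChatGPT needs is a question about `train`, while asking how well ChatGPT itself learns is a question about `observe` and `predict`.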

To make sure that the analogy is fruitful, let me finish with some shorter points that derive from it:

  • ChatGPT probably didn’t have that much direct “evolutionary pressure” to be a good learner – so it’s not that surprising if it isn’t.

  • It doesn’t really know that Lincoln was a US president in the same way that we do. It knows about Lincoln in the same sense that a child knows to pull their hand away from a hot oven or to avoid vegetables unless they have overwhelming evidence that they’re safe. In the sense that we know about Lincoln, ChatGPT knows only about what you tell it personally.

  • The architectures that we use to translate weight values into a prediction function are much simpler than the mechanisms that biology uses to decode DNA. This may contribute to the fact that biology is arguably able to do more with just 3 billion DNA bases than GPT can with 175 billion weights. It’s worth exploring further.

  • Many ideas about training loops – like Q-learning with SGD – are inspired by human cognition rather than evolution. This is fine and currently seems more effective than the evolution-inspired alternatives. But if we want to create agents that can learn in deployment, we should consider how to make the agents themselves update their internal representations accordingly – not just the training loop.
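
As a rough back-of-the-envelope check on the 3 billion bases versus 175 billion weights comparison above, one can compare raw storage sizes. The counts come from the text; the encodings are my assumptions: 2 bits per base (four possible nucleotides) and 16 bits per weight (half-precision storage, which may not match how any particular model is stored). Raw size is only an upper bound on information content, so this is a sketch of scale, not a measurement.

```python
def raw_gigabytes(n_symbols, bits_per_symbol):
    """Raw storage in gigabytes (10**9 bytes) for n_symbols symbols."""
    return n_symbols * bits_per_symbol / 8 / 1e9


# Assumption: 2 bits per DNA base, since there are four nucleotides.
genome_gb = raw_gigabytes(3e9, 2)

# Assumption: 16 bits per weight (half precision).
weights_gb = raw_gigabytes(175e9, 16)

print(f"genome:  {genome_gb:.2f} GB")
print(f"weights: {weights_gb:.2f} GB")
print(f"ratio:   {weights_gb / genome_gb:.0f}x")
```

Under these assumptions the weights take a few hundred times more raw space than the genome, which is larger than the roughly 58x gap in symbol counts alone.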

Edit: what did I miss that may change the conclusion?

  • The idea that we don’t have enough data in our environment and need much more from evolution was a cached thought. It still feels true, but I don’t have compelling reasons to trust that intuition.

  • When I wrote, as an afterthought, about the huge amount of non-linguistic information that is useful for humans’ language acquisition, I should have noticed that it is a very good objection to everything else I wrote – and I would have noticed, if not for being motivated to finish writing.

  • I vastly oversimplified the human learning mechanisms and how they come to be. We have something like reinforcement learning / local optimization, different sorts of implicit and explicit memory, and maybe some more complicated built-in inference mechanisms. And over the course of our lives we learn other learning skills, both explicit and implicit. Those skills may be added on top of the inherent learning mechanisms, or may interact with them in many complicated ways that, to the best of my knowledge, are not analogous to anything in modern ML.

  • The strength of the evolutionary prior and the complexity of that prior are two distinct axes that would manifest differently in both genome length and human learning abilities.

  • I update in the direction of taking less seriously the distinction between in-context vs out-of-context vs meta learning, and thinking about them as features of our solutions rather than of the problems.

  • I still think that comparing ML stuff to our only examples of general intelligence is useful, but mostly in order to understand the dissimilarity and the parameters that may vary. Finding analogies and comparing performance is less useful.


[1] And on much non-linguistic information about the world that the language intends to describe, and that ChatGPT has no access to – but this is a story for another day.