Predictive Processing, Heterosexuality and Delusions of Grandeur
Predictive processing is the theory that biological neurons minimize free energy. In this context, free energy isn’t physical energy like the energy in your laptop battery. Instead, free energy is an information-theoretic quantity: a single variable that trades prediction error off against model complexity. Minimizing free energy means keeping prediction error low without letting the model get needlessly complicated. You can also think of it as minimizing surprise.
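If you want something concrete to hold onto, one standard formalization from the variational inference literature (not spelled out in this post, so treat the notation as imported) splits free energy into exactly those two terms:

$$
F \;=\; \underbrace{D_{\mathrm{KL}}\big(q(z)\,\|\,p(z)\big)}_{\text{model complexity}} \;-\; \underbrace{\mathbb{E}_{q(z)}\big[\log p(x \mid z)\big]}_{\text{prediction accuracy}}
$$

Here $q(z)$ is the brain’s current model of the hidden causes $z$ behind its sensory data $x$. Driving $F$ down means predicting the data well while keeping the model close to its priors.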
Biological neurons have inputs and outputs. Each neuron receives inputs from one set of neurons and sends outputs to another set of neurons. One way to minimize prediction error is to fire right after receiving inputs, but an even better way is for a neuron to anticipate when its input neurons will fire and then fire along with them. Firing in sync with its inputs produces zero prediction error instead of just a small prediction error.
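Here is a minimal sketch of that idea: a single linear unit with a hypothetical three-step memory that learns to fire with a periodic input rather than after it. The delta rule and all the constants are my own toy choices, not a claim about real neurons.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=3) * 0.1      # weights over a short input history
history = np.zeros(3)             # the last three observed input values
lr = 0.05                         # learning rate

for t in range(2000):
    x = np.sin(0.3 * t)           # the input neuron's activity
    prediction = w @ history      # guess the input before it arrives
    error = x - prediction        # the prediction error to be minimized
    w += lr * error * history     # delta-rule weight update
    history = np.roll(history, 1) # shift the new observation into memory
    history[0] = x

print(f"final |error| = {abs(error):.4f}")  # small once the unit is in sync
```

At first the unit’s guesses lag the sine wave; once the weights settle, its output rises and falls in step with the input and the error stays near zero.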
It has been shown that predictive processing is asymptotically equivalent to backpropagation. Everything that can be computed with backpropagation can be computed via predictive processing and vice versa.
A Simple Clock
Suppose you take a 3-dimensional blob of neurons programmed to minimize local prediction error and you attach them to a sinusoidal wave generator. The neural net has no outputs—just this single input. At first the neurons close to the wave generator will lag behind the sinusoidal wave generator. But eventually they’ll sync up with it. Our neural net will have produced an internal representation of the world.
Now suppose you plug a sinusoidal wave generator into the left end of the neural net and a square wave generator into the right end of the neural net. The left end of the neural net will wave in time with the sinusoidal wave generator and the right end of the neural net will beat in time with the square wave generator. The middle of the net will find some smooth balance between the two. Plug more complicated inputs into our neural net and it will generate a more complicated representation of the world.
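Here is a toy version of that second experiment: a 1-D chain instead of a 3-D blob, where each unit simply moves toward the average of its two neighbours, the state that minimizes its local prediction error. The wave frequencies, gain and chain length are made-up numbers.

```python
import numpy as np

n, steps = 20, 5000
state = np.zeros(n)    # the "blob", flattened to a chain of units

for t in range(steps):
    left = np.sin(0.2 * t)                     # sinusoidal wave generator
    right = np.sign(np.sin(0.05 * t))          # square wave generator
    padded = np.concatenate(([left], state, [right]))
    target = 0.5 * (padded[:-2] + padded[2:])  # mean of each unit's neighbours
    state += 0.3 * (target - state)            # step toward zero local error

# Units near index 0 track the sine, units near index n-1 track the square
# wave, and the middle settles into a smooth blend of the two.
print(np.round(state, 2))
```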
Simple Behavior
Consider a neural net with one input and one output. The output of the brain is connected to the input of a static noise generator and the static noise generator’s output is connected to the brain’s input. The brain is a free energy minimizer. Static noise has high free energy. The brain will thus tend to produce outputs that minimize the noise. In other words, the brain will figure out (if it can) how to turn off the static noise generator.
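A minimal sketch of that loop, using a two-armed bandit as a stand-in for the full predictive machinery (the “variance as free energy” shortcut, the exploration rate and the action encoding are all my own simplifications):

```python
import numpy as np

rng = np.random.default_rng(1)
estimate = np.zeros(2)  # estimated free energy of action 0 ("off") and 1 ("on")

for trial in range(200):
    # mostly pick the action believed to minimize free energy; sometimes explore
    action = rng.integers(2) if rng.random() < 0.1 else int(np.argmin(estimate))
    sensory_input = rng.normal(size=50) if action == 1 else np.zeros(50)
    free_energy = sensory_input.var()  # unpredictable static = high free energy
    estimate[action] += 0.1 * (free_energy - estimate[action])

print(int(np.argmin(estimate)))  # 0: the brain has learned to switch the noise off
```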
Scientists used this principle to train a blob of neurons to play pong. They connected a biological neural net to a game of pong and then blasted the net with static whenever the net failed to hit the ball. Result: the paddle intercepted the ball more often than random chance.
We now have all the ingredients we need to build a simple nervous system. Start with a net of neurons. Attach sensory inputs to the net. The net will create an internal model of the world. Whenever something bad happens (such as the organism suffering damage), blast the net with a short, intense burst of random noise. The organism will learn to avoid whatever causes the short, intense bursts of random noise. We have invented pain.
Cultivating Abstract Concepts
Some features of the environment can be hard-coded. You don’t need to learn how to feel pain or feel warmth. All evolution has to do is invent the relevant sensors and then emit a burst of static (in the case of pain) or a less-noisy signal (in the case of warmth).
Other features cannot be hard-coded at all. Your brain contains no priors about whether Obanazawa is in Yamagata Prefecture. Facts like that are purely learned.
But some features of the environment are partially hard-coded and partially learned.
I’m heterosexual (straight). Being straight is normative in the sense that most people are straight. Heterosexuality makes sense in evolutionary terms too. But being straight is computationally bizarre. If given the chance to kiss a healthy woman my age in a dark room then I would say yes. If given the chance to kiss a healthy man my age in a dark room then I would say no. But I can’t actually tell whether I’m making out with a man or a woman (assuming the man has shaved) just from the feel of the lips. Evolution programmed me to prefer the abstract concept of a woman to the abstract concept of a man.
How can evolution condition a predictive processing biological net to learn an abstract concept like “woman”? Culture plays a role, but heterosexuality is older than culture. I think the answer has something to do with checksums.
Suppose you want a certain region W of a biological neural net to activate in response to seeing a woman. The concept of a “woman” is too abstract to define exactly. All you have are a bunch of heuristics that are all correlated with observing a woman. For example, you could have one heuristic which fires in response to high-pitched voices, another which fires in response to more complicated inflection, another which fires in response to seeing breasts and so on. Feed the outputs of all these heuristics into the inputs of region W. Loosely couple region W to the rest of your world model. Region W will eventually learn to trigger in response to the abstract concept of a woman. Region W will even draw on other information in the broader world model when deciding whether to fire.
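To see why pooling correlated heuristics helps, here is a toy calculation. The five heuristics, their 75% individual hit rates and the plain majority vote are all hypothetical simplifications; a real region W would learn weighted connections rather than vote.

```python
import numpy as np

rng = np.random.default_rng(2)
trials = 100_000
woman = rng.random(trials) < 0.5            # ground truth per encounter
correct = rng.random((trials, 5)) < 0.75    # each heuristic right 75% of the time
signals = np.where(correct, woman[:, None], ~woman[:, None])
region_w = signals.mean(axis=1) > 0.5       # region W's pooled verdict

print(f"single heuristic: {(signals[:, 0] == woman).mean():.3f}")  # ≈ 0.750
print(f"region W:         {(region_w == woman).mean():.3f}")       # ≈ 0.896
```

No individual heuristic is reliable, but region W’s pooled output is, which is what lets it stand in for the abstract concept.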
Reinforcement
So far we have discussed how to punish our neural network: just blast it with pain. But we have not yet discussed how to reward our neural network.
Dopamine has something to do with reward, motivation and reinforcement, but not pleasure. My guess is that when dopamine is released it temporarily increases the brain’s learning rate—the amount that connections are updated—by a lot. If this is true then we ought to expect dopamine releases to produce short-term local overfitting, and to cause behavior that is globally maladaptive (increases free energy) in the long run.
There is one potential problem with this theory: it is a long-established fact in behavioral psychology that intermittent reinforcement is more motivating than predictable reinforcement. But predictive processing seeks to minimize surprise. Why would a system that seeks to minimize surprise be more motivated by surprising rewards than by predictable rewards? I don’t know, but here’s a guess: dopamine is released in response to surprising positive outcomes.
Why “surprising positive outcomes” and not just “positive outcomes”? Because there is no reason to adjust the weights of a neural network in response to an unsurprising outcome.
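Here is that guess in code form. It is my own formalization, not established neuroscience: dopamine tracks positive reward surprise and multiplies the learning rate, so once a reward becomes predictable the extra learning dries up. That is at least consistent with intermittent reinforcement being the stronger motivator.

```python
expected_reward = 0.0  # the brain's running prediction of reward
BASE_LR = 0.01         # baseline learning rate
GAIN = 10.0            # how strongly dopamine amplifies learning (made up)

def effective_learning_rate(reward: float) -> float:
    global expected_reward
    surprise = reward - expected_reward       # reward prediction error
    expected_reward += 0.1 * surprise         # the prediction slowly catches up
    dopamine = max(surprise, 0.0)             # released only for positive surprise
    return BASE_LR * (1.0 + GAIN * dopamine)  # dopamine cranks up the update size

print(effective_learning_rate(1.0))  # surprising first reward: big update (≈ 0.11)
print(effective_learning_rate(1.0))  # same reward, now partly expected (≈ 0.10)
print(effective_learning_rate(0.0))  # disappointing outcome: no dopamine (0.01)
```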
Motivating the Pursuit of Abstract Concepts
Suppose we have used checksums to create a high-level concept like “my personal status”. One way to motivate status-seeking behavior would be to squirt the brain with dopamine whenever “my personal status” goes up. But that’s not a stable solution because it doesn’t actually seek to maintain consistently high status. Instead, it just seeks surprising increases in status. Dopamine is useful, but if you rely purely on dopamine hits to motivate behavior then your neural network will become a compulsive gambler. We can’t rely just on dopamine hacks. We have to go back to free energy.
Let’s go back to the “my personal status” cortex of the predictive brain, which I will call “region S”. Region S was produced by feeding several heuristics—perhaps “how much people like me”, “how rich I am” and “whether I have sex with attractive people”—into it. Suppose these heuristics generate signals of “1” when you’re high-status and “-1” when you’re low-status. Region S learns to generate a signal of “1” when you’re high-status and “-1” when you’re low-status.
What happens when we feed a signal of “1” into region S? If the brain is already in a high-status organism then nothing will happen. But if the brain is in a low-status organism then our signal of “1” will cause a conflict, thus increasing the free energy of region S. Predictive processors produce signals that minimize free energy. If our brain is operating properly then it will attempt to resolve the conflict by outputting behavior that it anticipates will cause its status to rise. We have programmed status-seeking behavior.
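A toy closed loop of that mechanism (the gains and the “effort raises status” dynamics are invented for illustration):

```python
status = -1.0     # the organism starts out low-status
SET_POINT = 1.0   # the hard-wired "1" fed into region S
effort = 0.0      # the status-seeking behaviour region S can drive

for step in range(1000):
    region_s = status                        # region S tracks the status heuristics
    conflict = SET_POINT - region_s          # prediction error, i.e. free energy in S
    effort += 0.1 * conflict                 # emit behaviour to resolve the conflict
    status += 0.05 * effort - 0.02 * status  # effort slowly raises real status

print(round(status, 2))  # ≈ 1.0: behaviour has pulled status up to the set-point
```

The failure mode is visible in the same loop: if it is cheaper to corrupt region S’s reading of the heuristics than to raise actual status, the conflict vanishes without the status ever moving.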
That’s if things go right. If things go wrong[1] then our neural net will conclude that it has high status despite all evidence to the contrary. We have programmed schizophrenia.
Edit 2022-02-16. There’s a reason this doesn’t actually work in a real biological brain. It’s that real neurons will learn to ignore an input that’s hard-coded to a value of 1. I plan to address this problem in a future post.