That’s a lot to absorb and I’ve only skimmed it, so please forgive me if responses to the following are already implicit in what you’ve said.
I thought the point of the modularity hypothesis is that the brain only approximates a universal learning machine and has to be gerrymandered and trained to do so?
If the brain were naturally a universal learner, then surely we wouldn’t have to learn universal learning (e.g. we wouldn’t have to learn to overcome cognitive biases, Bayesian reasoning wouldn’t be a recent discovery, etc.)? The system seems too gappy and glitchy, too full of quick judgement and prejudice, to have been designed as a universal learner from the ground up.
You are conflating the ideas of universal learning and rational thinking. They are not the same thing.
I’m a strong believer in the idea that human intelligence emerges from a strong general-purpose reinforcement learning algorithm. If that’s true, then it’s very consistent with our problems of cognitive bias.
If the RL idea is correct, then thinking is best understood as a learned behavior, just as the words we speak with our lips are a learned behavior, and just as how we move our arms and legs is a learned behavior. Under the principle that we are an RL learning machine, what we learn is ANY behavior which helps us to maximize our reward signal.
We don’t learn rational behavior; we learn whatever behavior the learning system has rationally computed is needed to produce the most rewards. And in this case, our prime rewards are just those things which give us pleasure and which reduce pain.
If we live in an environment that gives us rewards when we say “I believe God is real, the Bible is the book of God, and the Earth is 10,000 years old,” then we will say those words. We will do ANYTHING that works to maximize rewards in our environment. We will not only say them, we will believe them in our core. If we are conditioned by our environment to believe these things, that is what we will believe.
If we live in an environment that trains us to look at the data and draw conclusions based on what the data tells us (to follow the behavior of a rational scientist), then we will act that way instead.
A universal learner can learn to act in any way it needs to in order to maximize rewards.
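To make the point concrete, here is a minimal sketch of my own (nothing here is from the article): the same simple epsilon-greedy value learner, dropped into two hypothetical environments with opposite reward rules, settles on opposite behaviors. The action labels and payoff values are invented for illustration.

```python
# A toy reward-maximizing learner: it acquires whatever behavior its
# environment pays for. Action names and reward values are purely illustrative.
import random

ACTIONS = ["assert the local dogma", "check the data first"]

def train(reward_for, episodes=5000, lr=0.1, epsilon=0.1):
    """Tiny epsilon-greedy action-value learner (a two-armed bandit)."""
    value = {a: 0.0 for a in ACTIONS}
    for _ in range(episodes):
        if random.random() < epsilon:
            action = random.choice(ACTIONS)             # occasionally explore
        else:
            action = max(value, key=value.get)          # otherwise exploit current estimates
        reward = reward_for(action)
        value[action] += lr * (reward - value[action])  # nudge estimate toward observed reward
    return max(value, key=value.get)

# Environment A rewards professing the dogma; environment B rewards checking the data.
env_a = lambda action: 1.0 if action == "assert the local dogma" else 0.0
env_b = lambda action: 1.0 if action == "check the data first" else 0.0

random.seed(0)
print("Trained in environment A, the learner settles on:", train(env_a))
print("Trained in environment B, the learner settles on:", train(env_b))
```

The learning rule is identical in both runs; only the environment’s reward function differs, which is the whole point.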
That’s what our cognitive bias is: our brain’s desire to act as our past experience has trained us, not to act rationally.
To learn to act rationally, we must be carefully trained to act rationally, which is why the ideas of Less Wrong are needed to overcome our bias.
Also keep in mind that the purpose of the human brain is to control our actions, and for controlling actions, speed is critical. Our brain is best understood not as a “thinking machine” but rather as a reaction machine: a machine that must choose a course of action in a very short time frame (like 0.1 seconds), so that when needed we can quickly react to an external danger that is trying to kill us, from a bear attacking us to a gust of wind that almost pushed us over the edge of a cliff.
So what the brain needs to learn, as a universal learner, is an internal “program” of quick heuristics: how to respond instantly to any environmental stimulus. We learn (universally) how to react, not how to “think rationally”.
A process like thinking rationally is a large set of learned micro-reactions, one that takes a long time to assemble and perfect. To be a good rational thinker, we have to overcome all the learned reactions that have helped us gain rewards in the past but which have been shown not to be the actions of a rational thinker. We have to help train each other to spot false behaviors, and to train each other to have only rational behaviors (when we are trying to engage in rational behavior, that is).
Most of our life, we don’t need rational behavior; we need accurate reward-maximizing behavior. But when we choose to engage in a rational thought and analysis process, we want to do our best to be rational, and not let our learned cognitive biases trick us into believing we are being rational when in fact we are just reward seeking.
So, our universal learning could be a reward-maximizing process, but if it is, then that explains why we have strong cognitive biases; it’s not an argument against having them. This is because our reward function is not wired to make us maximize rationality; it’s wired to make us act in whatever way is needed to maximize pleasure and minimize pain. Only if we immerse ourselves in an environment that rewards us for rational thinking behaviors do those behaviors emerge in us.
We don’t learn rational behavior; we learn whatever behavior the learning system has rationally computed is needed to produce the most rewards.
Yes, this. But it is so easy to make mistakes when interpreting this statement that I feel it requires a dozen warnings to prevent readers from oversimplifying it.
For example, the behavior we learn is the behavior that produced the most rewards in the past, when we were trained. If the environment changes, what we do may no longer give rewards in the new environment, until we learn what produces rewards there.
Unless we already had an experience with a changing environment, in which case we might adapt much more quickly, because we already have a meta-behavior for “changing the behavior to adapt to a new environment”.
Unless we already had an experience where the environment changed, we adapted our behavior, then the environment suddenly changed back and we were horribly punished for the adapted behavior; in which case the learned meta-behavior would be “do not change your behavior to adapt to the new environment (because it will change back and you will be rewarded for persistence)”.
It is these learned meta-behaviors that make human reactions so difficult to predict and influence.
Also, even in an unchanging environment, our behavior is not necessarily the best one (in terms of getting maximum rewards). It is merely the best one that our learning algorithm could find. For example, we will slowly move towards a local maximum, but if there is a completely different behavior that would give us higher rewards, we may simply never look in that direction, so we will never find it.
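A toy version of these warnings, with made-up payoff numbers (my own illustration, not anything from the comment above): the learner only updates its estimate of an option when it actually tries it, so after the payoffs flip it keeps its old habit for a while before switching.

```python
# Toy illustration of habit lag: a mostly-greedy learner trained in one
# environment adapts only slowly after the payoffs change. All numbers invented.
import random

def run(phases, steps_per_phase=2000, lr=0.05, epsilon=0.05):
    """Greedy value learner choosing between two actions whose payoffs change between phases."""
    value = [0.0, 0.0]              # estimated reward of action 0 and action 1
    choices = []
    for payoffs in phases:          # each phase is a different "environment"
        for _ in range(steps_per_phase):
            if random.random() < epsilon:
                a = random.randrange(2)                # rare exploration
            else:
                a = 0 if value[0] >= value[1] else 1   # habit: pick the better-looking action
            reward = payoffs[a] + random.gauss(0, 0.1)
            value[a] += lr * (reward - value[a])
            choices.append(a)
    return choices

random.seed(0)
# Phase 1: action 0 pays more. Phase 2: the payoffs flip.
history = run([(1.0, 0.2), (0.2, 1.0)])
print("share choosing action 0 late in phase 1:",
      sum(a == 0 for a in history[1900:2000]) / 100)
print("share still choosing action 0 right after the flip:",
      sum(a == 0 for a in history[2000:2100]) / 100)
```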
We learn to model our environment (because we have the innate ability to model things, and we learn that having some models increases the probability of a reward). But our models can be wrong while still being better than the maximum-entropy hypothesis (which is why we keep them), and they can be a local maximum that is actually not a good choice globally.
Human psychology has so many layers. Asking which psychological school better describes the human mind seems like asking whether the functionality of a human body is better described by biology, chemistry, or physics. The bottom layer of the human mind is reflexes and learning (which is what behaviorists got right), but trying to see everything only in terms of reflexes is like trying to describe the human body only as an interaction of elementary particles: yes, this is the territory, but it is computationally intractable for us. People are influenced by the models they created in the past, some of which may be deeply dysfunctional (which is what psychoanalysts got right); and intelligent people are able to go more meta and start making models of human happiness, or start building precise models of reality, and often change their behavior based on these models (insert other psychological schools here).
At the bottom, the learning algorithm is based on maximizing rewards, but it is not necessarily maximizing rewards in the current situation, for many different possible reasons.
Only if we immerse ourselves in an environment that rewards us for rational thinking behaviors do those behaviors emerge in us.
What are the typical examples of such an environment (not necessarily perfect, just significantly better than average)? I think it is (a) keeping company with rational people who care about your rationality, for example your parents, teachers, or friends, if they happen to be rational; and (b) doing something where you interact with reality and get precise feedback, often something like math or programming, if you happen to generalize this approach to other domains of life.
Hmm, but isn’t this conflating “learning” in the sense of “learning about the world/nature” with “learning” in the sense of “learning behaviours”? We know the brain can do the latter; it’s whether it can do the former that we’re interested in, surely?
IOW, it looks like you’re saying precisely that the brain is not a ULM (in the sense of a machine that learns about nature), it is rather a machine that approximates a ULM by cobbling together a bunch of evolved and learned behaviours.
It’s adept at learning (in the sense of learning reactive behaviours that satisfice conditions) but only proximally adept at learning about the world.
I thought the point of the modularity hypothesis is that the brain only approximates a universal learning machine and has to be gerrymandered and trained to do so?
I’m not sure what you mean by gerrymandered. I summarized the modularity hypothesis at the beginning to differentiate it from the ULM hypothesis. There is a huge range of views in this space, so I reduced them to exemplars of two important viewpoint clusters.
The specific key difference is the extent to which complex mental algorithms are learned vs innate.
If the brain were naturally a universal learner, then surely we wouldn’t have to learn universal learning (e.g. we wouldn’t have to learn to overcome cognitive biases, Bayesian reasoning wouldn’t be a recent discovery, etc.)?
You certainly don’t need to learn how to overcome cognitive biases in order to learn (this should be obvious). Knowledge of the brain’s limitations could be useful, but probably only in the context of having a high-level understanding of how the brain works.
In regards to Bayesian reasoning, the brain has a huge number of parallel systems and computations going on at once, many of which are implementing efficient approximate Bayesian inference.
Verbal Bayesian reasoning is just a subset of verbal mathematical reasoning: mapping sentences to equations, solving, and mapping back to sentences. It’s a specific complex ability that uses a number of brain regions. It’s something you need to learn for the same reasons you need to learn multiplication. The brain does tons of analog multiplications every second, but that doesn’t mean you have an automatic innate ability to do verbal math, just as you don’t have an automatic innate ability to do much of anything.
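As a concrete picture of that “sentences to equations and back” step, here is a short worked sketch using the standard base-rate example; the base rate and test accuracies below are invented for illustration and are not from the article.

```python
# Worked sketch of "sentences -> equation -> sentence" for Bayes' rule.
# The numbers (base rate, hit rate, false-positive rate) are invented.

# Sentences: "1% of people have the condition. The test catches 90% of real
# cases and falsely flags 5% of healthy people. You tested positive."
prior = 0.01          # P(condition)
hit_rate = 0.90       # P(positive | condition)
false_alarm = 0.05    # P(positive | no condition)

# Equation: P(condition | positive) = P(positive | condition) * P(condition) / P(positive)
evidence = hit_rate * prior + false_alarm * (1 - prior)
posterior = hit_rate * prior / evidence

# Back to a sentence:
print(f"Given a positive result, the chance you actually have the condition "
      f"is about {posterior:.0%}.")   # roughly 15%, not 90%
```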
The system seems too gappy and glitchy, too full of quick judgement and prejudice, to have been designed as a universal learner from the ground up.
One of the main points I make in the article is that universal learning machines are a very general thing that, in their simplest form, can be specified in a small number of bits, just like a Turing machine. So it’s a sort of obvious design for evolution to find.
What I meant is that you have sub-systems dedicated to (and originally evolved to perform) specific concrete tasks, and shifting coalitions of them (or rather shifting coalitions of their abstract core algorithms) are leveraged to work together to approximate a universal learning machine.
IOW any given specific subsystem (e.g. “recognizing a red spot in a patch of green”) has some abstract algorithm at its core which is then drawn upon at need by an organizing principle which utilizes it (plus other algorithms drawn from other task-specific brain gadgets) for more universal learning tasks.
That was my sketchy understanding of how it works from evol psych and things like Dennett’s books, Pinker, etc.
Furthermore, I thought the rationale of this explanation was that it’s hard to see how a universal learning machine can get off the ground evolutionarily (it’s going to be energetically expensive, not fast enough, etc.) whereas task-specific gadgets are easier to evolve (“need to know” principle), and it’s easier to later get an approximation of a universal machine off the ground on the back of shifting coalitions of them.
Ah, OK, your gerrymandering analogy now makes sense.
That was my sketchy understanding of how it works from evol psych and things like Dennett’s books, Pinker, etc.
I think that’s a good summary of the evolved modularity hypothesis. It turns out that we can actually look into the brain and test that hypothesis. Those tests were done, and lo and behold, the brain doesn’t work that way. The universal learning hypothesis emerged as the new theory to explain the new neuroscience data from the last decade or so.
So basically this is what the article is all about. You said earlier you skimmed it, so perhaps I need a better abstract or summary at the top, as oge suggested.
Furthermore, I thought the rationale of this explanation was that it’s hard to see how a universal learning machine can get off the ground evolutionarily (it’s going to be energetically expensive, not fast enough, etc.) whereas task-specific gadgets are easier to evolve (“need to know” principle),
This is a pretty good-sounding rationale. It’s also probably wrong. It turns out a small ULM is relatively easy to specify and is also completely compatible with innate task-specific gadgetry. In other words, the universal learning machinery has very few drawbacks. All vertebrates have a similar core architecture based on the basal ganglia. In large-brained mammals, the general-purpose coprocessors (neocortex, cerebellum) are just expanded more than other structures.
In particular, it looks like the brainstem has a bunch of old innate circuitry that the cortex and BG learn how to control (the BG does not just control the cortex), but I didn’t have time to get into the brainstem within the scope of this article.
Great stuff, thanks! I’ll dig into the article more.