if we want, we can always represent values in terms of minimizing prediction error (at least to a close approximation), so long as we choose the right predictions;
this might turn out to be the right thing to do, in order to represent the hierarchy thing elegantly (although I don’t currently see why, and am somewhat skeptical).
However, I don’t agree that we should think of values as being predictable from the concept of minimizing prediction error.
The tone of the following is a bit more adversarial than I’d like; sorry for that. My attitude toward predictive processing comes from repeated attempts to see why people like it, and all the reasons seeming to fall flat to me. If you respond, I’m curious about your reaction to these points, but it may be more useful for you to give the positive reasons why you think your position is true (or even just why it would be appealing), particularly if they’re unrelated to what I’m about to say.
If we look at the field of reinforcement learning, it appears to be generally useful to add intrinsic motivation for exploration to an agent. This is the exact opposite of predictability: in one case we add reward for entering unpredictable states, whereas in the other case we add reward for entering predictable states. I’ve seen people try to defend minimizing prediction error by showing that the agent is still motivated to learn (in order to figure out how to avoid unpredictability). However, the fact remains: it is still motivated to learn strictly less than an unpredictability-loving agent. RL has, in practice, found it useful to add reward for unpredictability; this suggests that evolution might have done the same, and suggests that it would not have done the exact opposite. Agents operating under a prediction-error penalty would likely under-explore.
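To make the sign flip concrete, here is a toy sketch, not any particular RL algorithm. One common family of exploration bonuses pays the agent in proportion to its model's prediction error. The sketch assumes a one-step greedy agent, and the state names, rewards, error values, and `beta` are all made up for illustration. The only difference between the two agents is the sign of `beta`.

```python
from typing import Dict, Tuple


def shaped_reward(extrinsic: float, prediction_error: float, beta: float) -> float:
    """Extrinsic reward plus an intrinsic term proportional to prediction error.

    beta > 0 gives a curiosity-style exploration bonus (unpredictable states pay more);
    beta < 0 gives a prediction-error penalty (predictable states pay more).
    """
    return extrinsic + beta * prediction_error


def greedy_choice(states: Dict[str, Tuple[float, float]], beta: float) -> str:
    """Return the state with the highest shaped reward (a one-step greedy agent)."""
    return max(states, key=lambda s: shaped_reward(states[s][0], states[s][1], beta))


# Hypothetical states: (extrinsic reward, agent's current prediction error there)
states = {
    "familiar_room": (1.0, 0.0),
    "unexplored_corridor": (0.8, 0.5),
}

print(greedy_choice(states, beta=1.0))   # unexplored_corridor
print(greedy_choice(states, beta=-1.0))  # familiar_room
```

With `beta = -1.0` the agent keeps choosing the state its model already predicts well, so it never collects the data that would shrink its error elsewhere; that is the under-exploration worry in miniature.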
It’s Easy to Overestimate The Degree to which Agents Minimize Prediction Error
I often enjoy variety—in food, television, etc—and observe other humans doing so. Naively, it seems like humans sometimes prefer predictability and sometimes prefer variety.
However: any learning agent, almost no matter its values, will tend to look like it is seeking predictability once it has learned its environment well. It is taking actions it has taken before, and steering toward the environmental states similar to what it always steers for. So, one could understandably reach the conclusion that it is reliability itself which the agent likes.
In other words: if I seem to eat the same foods quite often (despite claiming to like variety), you might conclude that I like familiarity when it’s actually just that I like what I like. I’ve found a set of foods which I particularly enjoy (which I can rotate between for the sake of variety). That doesn’t mean it is familiarity itself which I enjoy.
I’m not denying that mere familiarity has some positive valence for humans; I’m just saying that for arbitrary agents, it seems easy to over-estimate the importance of familiarity in their values, so we should be a bit suspicious about it for humans too. And I’m saying that it seems like humans enjoy surprises sometimes, and there’s evolutionary/machine-learning reasoning to explain why this might be the case.
We Need To Explain Why Humans Differentiate Goals and Beliefs, Not Just Why We Conflate Them
You mention that good/bad seem like natural categories. I agree that people often seem to mix up “should” and “probably is”, “good” and “normal”, “bad” and “weird”, etc. These observations in themselves speak in favor of the minimize-prediction-error theory of values.
However, we also differentiate these concepts at other times. Why is that? Is it some kind of mistake? Or is the conflation of the two the mistake?
I think the mix-up between the two is partly explained by the effect I mentioned earlier: common practice is optimized to be good, so there will be a tendency for commonality and goodness to correlate. So, it’s sensible to cluster them together mentally, which can result in them getting confused. There’s likely another aspect as well, which has something to do with social enforcement (ie, people are strategically conflating the two some of the time?) -- but I’m not sure exactly how that works.
The tone of the following is a bit more adversarial than I’d like; sorry for that. My attitude toward predictive processing comes from repeated attempts to see why people like it, and all the reasons seeming to fall flat to me. If you respond, I’m curious about your reaction to these points, but it may be more useful for you to give the positive reasons why you think your position is true (or even just why it would be appealing), particularly if they’re unrelated to what I’m about to say.
I’ll reply to your points soon, since I think doing so is a helpful way for me and others to explore this idea, although it may take me a little while, as this is not the only thing I have to do. First, though, I’ll respond to this request, which I seem to have left unaddressed.
I have two main lines of evidence that come together to make me like this theory.
One is that it’s elegant, simple, and parsimonious. Control systems are simple, they look to me to be the simplest thing we might reasonably call “alive” or “conscious” if we try to redefine those terms in ways that are not anchored on our experience here on Earth. I think the reason it’s so hard to answer questions about what is alive and what is conscious is that the naive categories we form and give those names are ultimately rooted in simple phenomena involving information “pumping” that locally reduces entropy. But many things that do this lie outside our historical experience; although they generate information, it has historically made more sense to think of them as “dead” than “alive”. In a certain sense this leads me to a position you might call “cybernetic panpsychism”, but that’s just fancy words for saying there’s nothing special going on in the universe that makes us different from rocks and stars other than (increasingly complex) control systems creating information.
Another is that it fits with a lot of my understanding of human psychology. Western psychology doesn’t really get down to a level where it has a solid theory of what’s going on at the lowest levels of the mind, but the Buddhist psychology of the Abhidharma does, and it says that right after “contact” (stuff interacting with neurons) comes “feeling/sensing”, and this is claimed to always contain a signal of positive, negative, or neutral judgement. My own experience with meditation showed me something similar, so that when I learned about this theory it seemed like an obviously correct way of explaining what I was experiencing. This makes me strongly believe that any theory of value we want to develop should account for this experience of valence showing up and being attached to every experience.
In light of this second reason, I’ll add to my first reason that it seems maximally parsimonious that if we were looking for an origin of valence it would have to be about something simple that could be done by a control system, and the simplest thing it could do that doesn’t simply ignore the input is test how far off an observed input is from a set point. If something more complex is going on, I think we’d need an explanation for why sending a signal indicating distance from a set point is not enough.
I briefly referenced these above, but left it all behind links.
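For readers who want the set-point idea pinned down, here is a minimal thermostat-style sketch. Reading valence as the negative distance from the set point, and the gain of 0.5, are illustrative assumptions rather than anything specified above; `SetPointController` is a hypothetical name.

```python
from dataclasses import dataclass


@dataclass
class SetPointController:
    set_point: float
    gain: float = 0.5  # fraction of the error corrected per step (assumed value)

    def error(self, observed: float) -> float:
        """Signed distance from the set point (positive: input is too low)."""
        return self.set_point - observed

    def valence(self, observed: float) -> float:
        """Assumed valence signal: 0 at the set point, more negative the further away."""
        return -abs(self.error(observed))

    def act(self, observed: float) -> float:
        """Corrective output that pushes the input back toward the set point."""
        return self.gain * self.error(observed)


thermostat = SetPointController(set_point=20.0)
temperature = 14.0
for step in range(4):
    print(f"step {step}: temp={temperature:.2f} valence={thermostat.valence(temperature):.2f}")
    temperature += thermostat.act(temperature)
```

Each step the controller corrects half of the remaining error, so the valence signal climbs toward 0 as the input approaches the set point.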
I think there are also some other lines of evidence that are less compelling to me but seem worth mentioning:
People have managed to build AI out of control systems minimizing prediction error, albeit doing so, as I propose is necessary, by having some fixed set points that prevent dark room problems.
Neurons do seem to function like simple control systems, though I think we have yet to determine with sufficient certainty that this is all that is going on.
Predictive coding admits explanations for many phenomena, but this risks just-so stories of the sort we see when evolutionary psychology tries to claim more than it can.
One is that it’s elegant, simple, and parsimonious.
I certainly agree here. Furthermore I think it makes sense to try and unify prediction with other aspects of cognition, so I can get that part of the motivation (although I don’t expect that humans have simple values). I just think this makes bad predictions.
Control systems are simple, they look to me to be the simplest thing we might reasonably call “alive” or “conscious” if we try to redefine those terms in ways that are not anchored on our experience here on Earth.
No disagreement here.
and this is claimed to always contain a signal of positive, negative, or neutral judgement.
Yeah, this seems like an interesting claim. I basically agree with the phenomenological claim. This seems to me like evidence in favor of a hierarchy-of-thermostats model (with one major reservation which I’ll describe later). However, it isn’t particularly evidence for the prediction-error-minimization perspective. We can have a network of controllers which express wishes to each other separately from predictions. Yes, that’s less parsimonious, but I don’t see a way to make the prediction-error version work without dubious compromises.
Here’s the reservation which I promised—if we have a big pile of controllers, how would we know (based on phenomenal experience) that controllers attach positive/negative valence “locally” to every percept?
Forget controllers for a moment, and just suppose that there’s any hierarchy at all. It could be made of controller-like pieces, or neural networks learning via backprop, etc. As a proxy for conscious awareness, let’s ask: what kind of thing can we verbally report? There isn’t any direct access to things inside the hierarchy; there’s only the summary of information which gets passed up the hierarchy.
In other words: it makes sense that low-level features like edge detectors and colors get combined into increasingly high-level features until we recognize an object. However, it’s notable that our high-level cognition can also purposefully attend to low-level features such as lines. This isn’t really predicted by the basic hierarchy picture—more needs to be said about how this works.
So, similarly, we can’t predict that you or I verbally report positive/negative/neutral attaching to percepts from the claim that the sensory hierarchy is composed of units which are controllers. A controller has valence in that it has goals and how-it’s-doing on those goals, but why should we expect that humans verbally report the direct experience of that? Humans don’t have direct conscious experience of everything going on in neural circuitry.
This is not at all a problem with minimization of prediction error; it’s more a question about hierarchies of controllers.
So, similarly, we can’t predict that you or I verbally report positive/negative/neutral attaching to percepts from the claim that the sensory hierarchy is composed of units which are controllers. A controller has valence in that it has goals and how-it’s-doing on those goals, but why should we expect that humans verbally report the direct experience of that? Humans don’t have direct conscious experience of everything going on in neural circuitry.
Yeah, this is a good point, and I agree it’s one of the things that I am looking for others to verify with better brain imaging technology. I find myself working ahead of what we can completely verify now because I’m willing to bet that it’s right, or at least right enough that the ways it’s wrong won’t throw out the work I do.
In light of this second reason, I’ll add to my first reason that it seems maximally parsimonious that if we were looking for an origin of valence it would have to be about something simple that could be done by a control system, and the simplest thing it could do that doesn’t simply ignore the input is test how far off an observed input is from a set point. If something more complex is going on, I think we’d need an explanation for why sending a signal indicating distance from a set point is not enough.
I more or less said this in my other comment, but to reply to this directly—it makes sense to me that you could have a hierarchy of controllers which communicate via set points and distances from set points, but this doesn’t particularly make me think set points are predictions.
Artificial neural networks basically work this way—signals go one way, “degree of satisfaction” goes the other way (the gradient). If the ANN is being trained to make predictions, then yeah, “predictions go one way, distance from set point goes the other” (well, distance + direction). However, ANNs can be trained to do other things as well; so the signals/corrections need not be about prediction.
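The analogy can be sketched with a single-weight linear unit trained by gradient descent (our simplification; real networks are deep and nonlinear). The forward pass and the update rule are identical in both runs; only the function producing the backward signal changes, once as a prediction loss and once as a hypothetical "get the output above a threshold" objective that forecasts nothing.

```python
from typing import Callable, List, Tuple

GradFn = Callable[[float, float], float]


def train_unit(data: List[Tuple[float, float]], loss_grad: GradFn,
               lr: float = 0.1, epochs: int = 200) -> float:
    """Train y = w * x by gradient descent; loss_grad(y, target) is the backward signal."""
    w = 0.0
    for _ in range(epochs):
        for x, target in data:
            y = w * x                   # forward signal
            dy = loss_grad(y, target)   # backward "how far off, which way" signal
            w -= lr * dy * x            # chain rule: dLoss/dw = dLoss/dy * x
    return w


def prediction_grad(y: float, target: float) -> float:
    """Gradient of 0.5 * (y - target)**2, where the target is the next observation."""
    return y - target


def satisficing_grad(y: float, threshold: float) -> float:
    """Gradient of max(0, threshold - y): push y up until it clears the threshold."""
    return -1.0 if y < threshold else 0.0


# Prediction task: pairs of (current observation, next observation), with next = 2 * current.
w_pred = train_unit([(1.0, 2.0), (0.5, 1.0)], prediction_grad)

# Non-prediction task: same unit and update rule, but the backward signal only
# demands "output at least 1.5 for input 1.0", which is a wish rather than a forecast.
w_goal = train_unit([(1.0, 1.5)], satisficing_grad, lr=0.25)

print(round(w_pred, 6), w_goal)  # 2.0 1.5
```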
People have managed to build AI out of control systems minimizing prediction error, albeit doing so, as I propose is necessary, by having some fixed set points that prevent dark room problems.
I’ve seen some results like this. I’m guessing there are a lot of different ways you could do it, but iirc what I saw seemed reasonable if what you want to do is build something like an imitation learner but also bias toward specific desired results. However, I think in that case “minimizing prediction error” meant a different thing than what you mean. So, what are you imagining?
If I take my ANN analogy, then fixing signals doesn’t seem to help me do anything much. A ‘set-point’ is like a forward signal in the analogy, so fixing set points means fixing inputs to the ANN. But a fixed input is more or less a dead input as far as learning goes; the ANN will still just learn to produce whatever output behavior the gradient incentivises, such as prediction of the data. Fixing some of the outputs doesn’t seem very helpful either.
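Here is a minimal linear-regression stand-in for that point, assuming ordinary least squares in place of a trained ANN and made-up data. Clamping an input to a constant `c` just hands the learner an intercept term: whatever value the "set point" is fixed at, the fitted predictions come out the same. `fit_with_fixed_input` is a hypothetical helper.

```python
from typing import List, Tuple


def fit_with_fixed_input(xs: List[float], ys: List[float], c: float) -> Tuple[float, float]:
    """Least-squares fit of y ≈ w * x + v * c, where c is a clamped ("fixed set point") input.

    Solves the 2x2 normal equations with Cramer's rule.
    """
    if c == 0:
        raise ValueError("a clamped input of 0 carries no signal; the system is singular")
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    # Normal equations: [[sxx, c*sx], [c*sx, c*c*n]] @ [w, v] = [sxy, c*sy]
    a11, a12, a22 = sxx, c * sx, c * c * n
    b1, b2 = sxy, c * sy
    det = a11 * a22 - a12 * a12
    w = (b1 * a22 - a12 * b2) / det
    v = (a11 * b2 - a12 * b1) / det
    return w, v


xs, ys = [0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0]  # made-up data following y = 2x + 1
for clamp in (1.0, 5.0):
    w, v = fit_with_fixed_input(xs, ys, clamp)
    # v * clamp acts as a learned intercept, so the prediction at x = 10 is the same for both clamps
    print(clamp, round(w, 6), round(v * clamp, 6), round(w * 10 + v * clamp, 6))
```

For `c = 1.0` the fit gives `w = 2, v = 1`; for `c = 5.0` it gives `w = 2, v = 0.2`. The product `v * c` is identical, so the clamped value never shows up in behavior.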
I find speaking in terms of minimization of prediction error useful to my own intuitions, but it does increasingly look like what I’m really thinking of are just generic homeostatic control systems. I like talking in terms of prediction error because I think it makes the translation to other similar theories easier (I’m thinking of other Bayesian brain theories and Friston’s free energy theory), but I think it’s fair to say I’m really just thinking about control systems sending signals to hit set points, even if some of those control systems learn in a way that looks like Bayesian updating or minimization of prediction error and others don’t.
The sense in which I think of this theory as parsimonious is that I don’t believe there is a simpler mechanism that can explain what we see. If we could talk about these phenomena in terms of control systems without using signals about distance from set points I’d prefer that, and I think the complexity we get from having to build things out of such simple components is the right move in terms of parsimony rather than having to postulate additional mechanisms. As long as I can explain things adequately without having to introduce more moving parts I’ll consider it maximally parsimonious as far as my current knowledge and needs go.
I’m still interested if you can say more about how you view it as minimizing a warped prediction. I mentioned that if you fix some parts of the network, they seem to end up getting ignored rather than producing goal-directed behaviour. Do you have an alternate picture in which this doesn’t happen? (I’m not asking you to justify yourself rigorously; I’m curious for whatever thoughts or vague images you have here, though of course all the better if it really works.)
Ah, I guess I don’t expect it to end up ignoring the parts of the network that can’t learn, because I don’t think error minimization, learning, or anything else is a top-level goal of the network. That is, there are only low-level control systems interacting, and parts of the network avoid being ignored by being more powerful in various ways, probably by being positioned in the network so that they have more influence on behavior than the parts that perform Bayesian learning. This does mean I expect those parts of the network not to learn, or to learn inefficiently, but they behave that way because it’s adaptive.
For example, I would guess something in humans like the neocortex is capable of Bayesian learning, but it only influences the rest of the system through narrow channels that prevent it from “taking over” and making humans true prediction error minimizers, instead forcing them to do things that satisfy other set points. In buzz words you might say human minds are “complex, adaptive, emergent systems” built out of neurons with most of the function coming bottom up from the neurons or “from the middle”, if you will, in terms of network topology.
We Need To Explain Why Humans Differentiate Goals and Beliefs, Not Just Why We Conflate Them
You mention that good/bad seem like natural categories. I agree that people often seem to mix up “should” and “probably is”, “good” and “normal”, “bad” and “weird”, etc. These observations in themselves speak in favor of the minimize-prediction-error theory of values.
However, we also differentiate these concepts at other times. Why is that? Is it some kind of mistake? Or is the conflation of the two the mistake?
I think the mix-up between the two is partly explained by the effect I mentioned earlier: common practice is optimized to be good, so there will be a tendency for commonality and goodness to correlate. So, it’s sensible to cluster them together mentally, which can result in them getting confused. There’s likely another aspect as well, which has something to do with social enforcement (ie, people are strategically conflating the two some of the time?) -- but I’m not sure exactly how that works.
This seems like an important question: if all these phenomena really are ultimately the same thing and powered by the same mechanisms, why do we make distinctions between them and find those distinctions useful?
I don’t have an answer I’m satisfied with, but I’ll try to say a few words about what I’m thinking and see if that moves us along.
My first approximation would be that we’re looking at things that we experience by different means and so give them different names because when we observe them they present in different ways. Goals (I assume by this you mean the cluster of things we might call desires, aversions, and generally intention towards action) probably tend to be observed by noticing the generation of signals going out that usually generate observable actions (movement, speech, etc.) whereas beliefs (the cluster of things that includes thoughts and maybe emotions) are internal and not sending out signals to action beyond mental action.
I don’t know enough to be very confident in that, though, and like you I think there could be numerous reasons why it makes sense to think of them as separate even if they are fundamentally not very different.
On my understanding of how things work, goals and beliefs combine to make action, so neither one is really mentally closer to action than the other. Both a goal and a belief can be quite far removed from action (eg, a nearly impossible goal which you don’t act on, or a belief about far-away things which don’t influence your day-to-day). Both can be very close (a jump scare seems most closely connected to a belief, whereas deciding to move your hand and then doing so is more goal-like—granted both those examples have complications).
If, in conversation, the distinction comes up explicitly, it is usually because of stuff like this:
Alice makes an unclear statement; it sounds like she could be claiming A or wanting A.
Bob asks for clarification, because Bob’s reaction to believing A is true would be very different from his reaction to believing A is good (or, in more relative terms, knowing Alice endorses one or the other of those). In the first case, Bob might plan under the assumption A; in the second, Bob might make plans designed to make A true.
Alice is engaging in wishful thinking, claiming that something is true when really the opposite is just too terrible to consider.
Bob wants to be able to rely on Alice’s assertions, so Bob is concerned about the possibility of wishful thinking.
Or, Bob is concerned for Alice; Bob doesn’t want Alice to ignore risks due to ignoring negative possibilities, or fail to set up back-up plans for the bad scenarios.
My point is that it doesn’t seem to me like a case of people intuitively breaking up a thing which is scientifically really one phenomenon. Predicting A and wanting A seem to have quite different consequences. If you predict A, you tend to restrict attention to cases where it is true when planning; you may plan actions which rely on it. If you want A, you don’t do that; you are very aware of all the cases where not-A. You take actions designed to ensure A.
It’s Easy to Overestimate The Degree to which Agents Minimize Prediction Error
I often enjoy variety—in food, television, etc—and observe other humans doing so. Naively, it seems like humans sometimes prefer predictability and sometimes prefer variety.
However: any learning agent, almost no matter its values, will tend to look like it is seeking predictability once it has learned its environment well. It is taking actions it has taken before, and steering toward the environmental states similar to what it always steers for. So, one could understandably reach the conclusion that it is reliability itself which the agent likes.
In other words: if I seem to eat the same foods quite often (despite claiming to like variety), you might conclude that I like familiarity when it’s actually just that I like what I like. I’ve found a set of foods which I particularly enjoy (which I can rotate between for the sake of variety). That doesn’t mean it is familiarity itself which I enjoy.
I’m not denying that mere familiarity has some positive valence for humans; I’m just saying that for arbitrary agents, it seems easy to over-estimate the importance of familiarity in their values, so we should be a bit suspicious about it for humans too. And I’m saying that it seems like humans enjoy surprises sometimes, and there’s evolutionary/machine-learning reasoning to explain why this might be the case.
I’ve replied about surprise, its benefits, and its mechanism a couple of times now. My theory is that surprise is by itself bad but can be made good by having control systems that expect surprise and send a good signal when surprise is seen. Depending on how this gets weighted, this creates a net positive mixed emotion where surprise is experienced as something good and serves many useful purposes.
I think this mostly dissolves the other points you bring up that I read as contingent on thinking the theory doesn’t predict humans would find variety and surprise good in some circumstances, but if not please let me know what the remaining concerns are in light of this explanation (or possibly object to my explanation of why we expect surprise to sometimes be net good).
I think this mostly dissolves the other points you bring up that I read as contingent on thinking the theory doesn’t predict humans would find variety and surprise good in some circumstances, but if not please let me know what the remaining concerns are in light of this explanation (or possibly object to my explanation of why we expect surprise to sometimes be net good).
Yeah, I noted that I and other humans often seem to enjoy surprise, but I also had a different point I was trying to make—the claim that it makes sense that you’d observe competent agents doing many things which can be explained by minimizing prediction error, no matter what their goals.
But, it isn’t important for you to respond further to this point if you don’t feel it accounts for your observations.
If we look at the field of reinforcement learning, it appears to be generally useful to add intrinsic motivation for exploration to an agent. This is the exact opposite of predictability: in one case we add reward for entering unpredictable states, whereas in the other case we add reward for entering predictable states. I’ve seen people try to defend minimizing prediction error by showing that the agent is still motivated to learn (in order to figure out how to avoid unpredictability). However, the fact remains: it is still motivated to learn strictly less than an unpredictability-loving agent. RL has, in practice, found it useful to add reward for unpredictability; this suggests that evolution might have done the same, and suggests that it would not have done the exact opposite. Agents operating under a prediction-error penalty would likely under-explore.
I ended up replying to this in a separate post since I felt like similar objections kept coming up. My short answer is: minimization of prediction error is minimization of error at predicting input to a control system that may not be arbitrarily free to change its prediction set point. This means that it won’t always be the case that a control system is globally trying to minimize prediction error, but instead is locally trying to minimize prediction error, although it may not be able to become less wrong over time because it can’t change the prediction to better predict the input.
From an evolutionary perspective my guess is that true Bayesian updating is a fairly recent adaptation, and most minimization of prediction error is minimization of error of mostly fixed prediction set points that are beneficial for survival.
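One way to picture the difference between fixed and updatable prediction set points is a unit with two ways of shrinking the gap between its prediction and its input: move the prediction (learning) or push on the world (control). The split into `learn_rate` and `act_rate`, the function names, and the numbers are our own illustrative assumptions, not the mathematical version the thread says is still to come.

```python
from typing import Tuple


def step(prediction: float, world: float, learn_rate: float, act_rate: float) -> Tuple[float, float]:
    """One update of a unit that shrinks the gap between its prediction and its input.

    learn_rate: how far the prediction moves toward the input (belief-like updating).
    act_rate:   how far action pushes the input toward the prediction (goal-like control).
    """
    error = world - prediction
    return prediction + learn_rate * error, world - act_rate * error


def run(prediction: float, world: float, learn_rate: float, act_rate: float,
        steps: int = 20) -> Tuple[float, float]:
    """Apply step() repeatedly and return the final (prediction, world) pair."""
    for _ in range(steps):
        prediction, world = step(prediction, world, learn_rate, act_rate)
    return prediction, world


# Hypothetical numbers: a "prediction" of 37.0 (think body temperature) and an input of 30.0.
print(run(37.0, 30.0, learn_rate=0.0, act_rate=0.5))  # fixed set point: the world is dragged to 37
print(run(37.0, 30.0, learn_rate=0.5, act_rate=0.0))  # pure learner: the prediction drifts to 30
```

Both runs reduce prediction error locally, but only the first behaves like a goal: with `learn_rate = 0` the prediction can never become less wrong, so all the error reduction has to come from changing the input.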
I left a reply to this view at the other comment. However, I don’t feel that point connects very well to the point I tried to make.
Your OP talks about minimization of prediction error as a theory of human value, relevant to alignment. It might be that evolution re-purposes predictive machinery to pursue adaptive goals; this seems like the sort of thing evolution would do. However, this leaves the question of what those goals are. You say you’re not claiming that humans globally minimize prediction error. But, partly because of the remarks you made in the OP, I’m reading you as suggesting that humans do minimize prediction error, but relative to a skewed prediction.
Are human values well-predicted by modeling us as minimizing prediction error relative to a skewed prediction?
My argument here is that evolved creatures such as humans are more likely to (as one component of value) steer toward prediction error, because doing so tends to lead to learning, which is broadly valuable. This is difficult to model by taking a system which minimizes prediction error and skewing the predictions, because it is the exact opposite.
Elsewhere, you suggest that exploration can be predicted by your theory if there’s a sort of reflection within the system, so that prediction error is predicted as well. The system therefore has an overall set-point for prediction error and explores if it’s too small. But I think this would be drowned out. If I started with a system which minimizes prediction error and added a curiosity drive on top of it, I would have to entirely cancel out the error-minimization drive before I started to see the curiosity doing its job successfully. Similarly for your hypothesized part. Everything else in the system is strategically avoiding error. One part steering toward error would have to out-vote or out-smart all those other parts.
Now, that’s over-stating my point. I don’t think human curiosity drive is exactly seeking maximum prediction error. I think it’s more likely related to the derivative of prediction error. But the point remains that that’s difficult to model as minimization of a skewed prediction error, and requires a sub-part implementing curiosity to drown out all the other parts.
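The out-voting argument can be put in crude linear form, assuming every part casts an additive vote proportional to prediction error; the part counts and weights below are made up, and the function names are hypothetical. A single curiosity part only tips the system toward surprise once its weight exceeds the combined weight of all the error-avoiding parts.

```python
def net_drive(prediction_error: float, n_avoiders: int, avoid_weight: float,
              curiosity_weight: float) -> float:
    """Summed vote of n error-avoiding parts plus one error-seeking (curiosity) part."""
    return (curiosity_weight - n_avoiders * avoid_weight) * prediction_error


def prefers_surprise(n_avoiders: int, avoid_weight: float, curiosity_weight: float) -> bool:
    """Does the whole system prefer a surprising option (error 1.0) to a predictable one (error 0.0)?"""
    surprising = net_drive(1.0, n_avoiders, avoid_weight, curiosity_weight)
    predictable = net_drive(0.0, n_avoiders, avoid_weight, curiosity_weight)
    return surprising > predictable


print(prefers_surprise(n_avoiders=10, avoid_weight=0.1, curiosity_weight=0.5))  # False
print(prefers_surprise(n_avoiders=10, avoid_weight=0.1, curiosity_weight=1.5))  # True
```

Here the ten avoiders contribute a combined weight of 1.0, so a curiosity weight of 0.5 loses and 1.5 wins.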
Instead of modeling human value as minimization of error of a skewed prediction, why not step back and model it as minimizing “some kind of error”? This seems no less parsimonious (since you have to specify the skew anyway), and leaves you with all the same controller machinery to propagate error through the system and learn to avoid it.
I have not read all the comments yet, so maybe this is redundant, but anyway...
I think it is plausible that humans and other life forms are mostly made up of layers of control systems, stacked on each other. However, it does not follow from this that humans are trying to minimise prediction error.
There is probably some part of the brain that is trying to minimise prediction error, possibly organised as a control system that tries to keep expectations in line with reality, because it is useful to be able to accurately predict the world.
But if we are a stack of control systems, then I would expect other parts of the brain to be control systems for other things, e.g. having the correct level of blood sugar, having a good amount of social interaction, having a good amount of variety in our lives.
I can imagine someone figuring out more or less how the prediction control system works and what it is doing, then looking at everything else, noticing the similarity (because it is all types of control systems and evolution tends to reuse structures) and thinking “Hmm, maybe it is all about predictions”. But I also think that would be wrong.
In other words: if I seem to eat the same foods quite often (despite claiming to like variety), you might conclude that I like familiarity when it’s actually just that I like what I like. I’ve found a set of foods which I particularly enjoy (which I can rotate between for the sake of variety). That doesn’t mean it is familiarity itself which I enjoy.
Agents trade off exploring and exploiting, and when they’re exploiting they look like they’re minimizing prediction error?
Agents trade off exploring and exploiting, and when they’re exploiting they look like they’re minimizing prediction error?
That’s one hypothesis in the space I was pointing at, but not particularly the thing I expect to be true. Or, maybe I think it is somewhat true as an observation about policies, but doesn’t answer the question of how exactly variety and anti-variety are involved in our basic values.
A model which I more endorse:
We like to make progress understanding things. We don’t like chaotic stuff with no traction for learning (like TV fuzz). We like orderly stuff more, but only while learning about it; it then fades to zero, meaning we have to seek more variety for our hedonic treadmill. We really like patterns which keep establishing and then breaking expectations, especially if there is always a deeper pattern which makes sense of the exceptions (like music); these patterns maximize the feeling of learning progress.
But I think that’s just one aspect of our values, not a universal theory of human values.
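The learning-progress idea can be sketched as the drop in recent prediction error, along the lines of the derivative-of-prediction-error suggestion made earlier in the thread. The running-average predictor, the 20-step windows, and zero-mean Gaussian noise as a stand-in for TV fuzz are all our assumptions; `learning_progress` is a hypothetical helper.

```python
import random
from typing import Callable, List


def learning_progress(signal: Callable[[int], float], steps: int = 200,
                      window: int = 20, lr: float = 0.1) -> List[float]:
    """Track a signal with a running-average predictor and report learning progress.

    Progress at step t is the mean absolute error over the previous window minus
    the mean absolute error over the most recent window (positive = improving).
    """
    prediction = 0.0
    errors: List[float] = []
    progress: List[float] = []
    for t in range(steps):
        observed = signal(t)
        errors.append(abs(observed - prediction))
        prediction += lr * (observed - prediction)  # delta-rule update toward the input
        if t >= 2 * window:
            older = sum(errors[-2 * window:-window]) / window
            recent = sum(errors[-window:]) / window
            progress.append(older - recent)
    return progress


learnable = learning_progress(lambda t: 5.0)             # orderly: a constant to be learned
rng = random.Random(0)
fuzz = learning_progress(lambda t: rng.gauss(0.0, 1.0))  # chaotic: zero-mean noise ("TV fuzz")

print(f"learnable: first={learnable[0]:.3f} last={learnable[-1]:.6f}")
print(f"fuzz: mean={sum(fuzz) / len(fuzz):.3f}")
```

The constant signal yields a burst of progress that fades toward zero once it is learned, while the noise yields progress that averages out to roughly zero, since there is nothing to get better at. The music case (patterns that keep breaking and re-forming) is not modeled here.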
I think this is sort of sideways. It’s true, but I think it also misses the deeper aspects of the theory I have in mind.
Yes, from easily observed behavior that’s what it looks like: exploitation is about minimizing prediction error and exploration is about, if not maximizing it, then at least not minimizing it. But the theory says that if we see exploration and the theory is correct, then exploration must somehow be built out of things that are ultimately trying to minimize prediction error.
I hope to give a more precise, mathematical explanation of this theory in the future, but for now I’ll give the best English-language explanation I can of how exploration might work (keeping in mind that, if this theory is right, we should eventually be able to find out exactly how it works given sufficient brain scanning technology).
I suspect exploration happens because a control system in the brain takes as input how much error minimization it observes as measured by how many good and bad signals get sent in other control systems. It then has a set point for some relatively stable and hard to update amount of bad signals it expects to see, and if it has not been seeing enough surprise/mistakes then it starts sending its own bad signals encouraging “restlessness” or “exploration”. This is similar to my explanation of creativity from another comment.
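That mechanism can be sketched as a small meta-controller, under our own assumptions: it counts the bad signals reported by other controllers over a short window, compares the average to a fixed set point, and emits restlessness equal to the shortfall. `RestlessnessController`, `choose`, the window size, and the threshold are hypothetical.

```python
from collections import deque
from typing import Deque


class RestlessnessController:
    """Hypothetical meta-controller that expects a certain rate of surprise elsewhere."""

    def __init__(self, expected_bad_rate: float, window: int = 10) -> None:
        self.expected_bad_rate = expected_bad_rate  # slow, hard-to-update set point
        self.recent: Deque[int] = deque(maxlen=window)

    def observe(self, bad_signals: int) -> None:
        """Record how many bad signals the other controllers sent this step."""
        self.recent.append(bad_signals)

    def restlessness(self) -> float:
        """This controller's own bad signal: the shortfall of surprise below its set point."""
        if not self.recent:
            return 0.0
        observed_rate = sum(self.recent) / len(self.recent)
        return max(0.0, self.expected_bad_rate - observed_rate)


def choose(meta: RestlessnessController, threshold: float = 0.5) -> str:
    """Hypothetical policy hook: explore when restlessness exceeds a threshold."""
    return "explore" if meta.restlessness() > threshold else "exploit"


meta = RestlessnessController(expected_bad_rate=2.0, window=5)
for _ in range(5):
    meta.observe(0)  # a stretch where nothing surprising happens
print(meta.restlessness(), choose(meta))  # 2.0 explore
for _ in range(5):
    meta.observe(3)  # exploring produces plenty of errors elsewhere
print(meta.restlessness(), choose(meta))  # 0.0 exploit
```

Once exploring produces enough errors elsewhere, the shortfall disappears and the pressure to explore switches off, which is the set-point behavior described above.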
I agree that
there’s something to the hierarchy thing;
if we want, we can always represent values in terms of minimizing prediction error (at least to a close approximation), so long as we choose the right predictions;
this might turn out to be the right thing to do, in order to represent the hierarchy thing elegantly (although I don’t currently see why, and am somewhat skeptical).
However, I don’t agree that we should think of values as being predictable from the concept of minimizing prediction error.
The tone of the following is a bit more adversarial than I’d like; sorry for that. My attitude toward predictive processing comes from repeated attempts to see why people like it, and all the reasons seeming to fall flat to me. If you respond, I’m curious about your reaction to these points, but it may be more useful for you to give the positive reasons why you think your position is true (or even just why it would be appealing), particularly if they’re unrelated to what I’m about to say.
Evolved Agents Probably Don’t Minimize Prediction Error
If we look at the field of reinforcement learning, it appears to be generally useful to add intrinsic motivation for exploration to an agent. This is the exact opposite of minimizing prediction error: an exploration bonus adds reward for entering unpredictable states, whereas minimizing prediction error amounts to adding reward for entering predictable states. I’ve seen people try to defend minimizing prediction error by showing that the agent is still motivated to learn (in order to figure out how to avoid unpredictability). However, the fact remains: it is still motivated to learn strictly less than an unpredictability-loving agent. RL has, in practice, found it useful to add reward for unpredictability; this suggests that evolution might have done the same, and that it would not have done the exact opposite. Agents operating under a prediction-error penalty would likely under-explore.
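The contrast can be made concrete with a toy bandit. This is a hypothetical sketch, not a model of any particular RL algorithm: the `1/sqrt(1 + count)` novelty term is one simple style of count-based exploration bonus, and giving it a negative weight stands in for a penalty on unfamiliar (less predictable) options. Payoffs are deterministic to keep the example easy to follow.

```python
import math


def run_bandit(payoffs, novelty_weight, steps=50):
    """Greedy agent scoring each arm as estimated payoff plus
    novelty_weight * a count-based novelty term (hypothetical stand-in for
    unpredictability). novelty_weight > 0 rewards unfamiliar arms;
    novelty_weight < 0 penalizes them."""
    counts = [0] * len(payoffs)
    estimates = [0.0] * len(payoffs)
    for _ in range(steps):
        scores = [
            est + novelty_weight / math.sqrt(1 + n)
            for est, n in zip(estimates, counts)
        ]
        arm = max(range(len(payoffs)), key=lambda i: scores[i])  # ties -> lowest index
        reward = payoffs[arm]
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]  # running mean
    return counts


payoffs = [0.2, 0.5, 0.9]
print("novelty-seeking:", run_bandit(payoffs, novelty_weight=1.0))
print("novelty-averse: ", run_bandit(payoffs, novelty_weight=-1.0))
```

With estimates starting at zero, the novelty-seeking agent tries every arm and settles on the best one, while the novelty-averse agent never leaves the first arm it happens to try. It still learns that arm’s payoff; it just has no reason to find out about the others.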
It’s Easy to Overestimate The Degree to which Agents Minimize Prediction Error
I often enjoy variety—in food, television, etc—and observe other humans doing so. Naively, it seems like humans sometimes prefer predictability and sometimes prefer variety.
However: any learning agent, almost no matter its values, will tend to look like it is seeking predictability once it has learned its environment well. It is taking actions it has taken before, and steering toward the environmental states similar to what it always steers for. So, one could understandably reach the conclusion that it is reliability itself which the agent likes.
In other words: if I seem to eat the same foods quite often (despite claiming to like variety), you might conclude that I like familiarity when it’s actually just that I like what I like. I’ve found a set of foods which I particularly enjoy (which I can rotate between for the sake of variety). That doesn’t mean it is familiarity itself which I enjoy.
I’m not denying that mere familiarity has some positive valence for humans; I’m just saying that for arbitrary agents, it seems easy to over-estimate the importance of familiarity in their values, so we should be a bit suspicious about it for humans too. And I’m saying that it seems like humans enjoy surprises sometimes, and there’s evolutionary/machine-learning reasoning to explain why this might be the case.
We Need To Explain Why Humans Differentiate Goals and Beliefs, Not Just Why We Conflate Them
You mention that good/bad seem like natural categories. I agree that people often seem to mix up “should” and “probably is”, “good” and “normal”, “bad” and “weird”, etc. These observations in themselves speak in favor of the minimize-prediction-error theory of values.
However, we also differentiate these concepts at other times. Why is that? Is it some kind of mistake? Or is the conflation of the two the mistake?
I think the mix-up between the two is partly explained by the effect I mentioned earlier: common practice is optimized to be good, so there will be a tendency for commonality and goodness to correlate. So, it’s sensible to cluster them together mentally, which can result in them getting confused. There’s likely another aspect as well, which has something to do with social enforcement (i.e., perhaps people strategically conflate the two some of the time?), but I’m not sure exactly how that works.
I’ll reply to your points soon, because doing so is a helpful way for me and others to explore this idea, although it may take me a little time since this is not the only thing I have to do. First, though, I’ll respond to this request, which I seem to have left unaddressed.
I have two main lines of evidence that come together to make me like this theory.
One is that it’s elegant, simple, and parsimonious. Control systems are simple; they look to me to be the simplest thing we might reasonably call “alive” or “conscious” if we try to redefine those terms in ways that are not anchored on our experience here on Earth. I think the reason it’s so hard to answer questions about what is alive and what is conscious is that the naive categories we form and give those names are ultimately rooted in simple phenomena involving information “pumping” that locally reduces entropy, but many things that do this fall outside our historical experience of what generates information, and historically it made more sense to think of them as “dead” than “alive”. In a certain sense this leads me to a position you might call “cybernetic panpsychism”, but that’s just fancy words for saying there’s nothing more special going on in the universe to make us different from rocks and stars than (increasingly complex) control systems creating information.
Another is that it fits with a lot of my understanding of human psychology. Western psychology doesn’t really get down to a level where it has a solid theory of what’s going on at the lowest levels of the mind, but the Buddhist psychology of the Abhidharma does, and it says that right after “contact” (stuff interacting with neurons) comes “feeling/sensing”, which is claimed to always contain a signal of positive, negative, or neutral judgement. My own experience with meditation showed me something similar, so when I learned about this theory it seemed like an obviously correct way of explaining what I was experiencing. This makes me strongly believe that any theory of value we want to develop should account for this experience of valence showing up and being attached to every experience.
In light of this second reason, I’ll add to my first reason that it seems maximally parsimonious for the origin of valence, if we look for one, to be something simple that a control system could do, and the simplest thing a control system can do that doesn’t simply ignore its input is test how far off an observed input is from a set point. If something more complex is going on, I think we’d need an explanation for why sending a signal indicating distance from a set point is not enough.
I briefly referenced these above, but left it all behind links.
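To make “test how far off an observed input is from a set point” concrete, here is a minimal sketch of a proportional controller (a thermostat-like unit). The gain, the temperatures, and the assumption that the correction is applied directly to the controlled variable are illustrative only:

```python
def controller_signal(observed: float, set_point: float, gain: float = 0.5) -> float:
    """Report (scaled) distance and direction from the set point; nothing else."""
    return gain * (set_point - observed)


temperature = 15.0
for _ in range(20):
    # Assumption: the correction is applied directly to the controlled variable.
    temperature += controller_signal(temperature, set_point=20.0)
print(round(temperature, 4))
```

Each step halves the remaining error, so the temperature converges on the set point; the unit never computes anything beyond the signed distance.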
I think there are also some other lines of evidence that are less compelling to me but seem worth mentioning:
People have managed to build AI out of control systems minimizing prediction error, albeit, as I propose is necessary, by including some fixed set points that prevent dark room problems.
Neurons do seem to function like simple control systems, though I think we have yet to determine with sufficient certainty that this is all that is going on.
Predictive coding admits explanations for many phenomena, but this risks just-so stories of the sort we see when evolutionary psychology tries to claim more than it can.
I certainly agree here. Furthermore I think it makes sense to try and unify prediction with other aspects of cognition, so I can get that part of the motivation (although I don’t expect that humans have simple values). I just think this makes bad predictions.
No disagreement here.
Yeah, this seems like an interesting claim. I basically agree with the phenomenological claim. This seems to me like evidence in favor of a hierarchy-of-thermostats model (with one major reservation which I’ll describe later). However, it isn’t particularly evidence for the prediction-error-minimization perspective. We can have a network of controllers which express wishes to each other separately from predictions. Yes, that’s less parsimonious, but I don’t see a way to make the first work without dubious compromises.
Here’s the reservation which I promised—if we have a big pile of controllers, how would we know (based on phenomenal experience) that controllers attach positive/negative valence “locally” to every percept?
Forget controllers for a moment, and just suppose that there’s any hierarchy at all. It could be made of controller-like pieces, or neural networks learning via backprop, etc. As a proxy for conscious awareness, let’s ask: what kind of thing can we verbally report? There isn’t any direct access to things inside the hierarchy; there’s only the summary of information which gets passed up the hierarchy.
In other words: it makes sense that low-level features like edge detectors and colors get combined into increasingly high-level features until we recognize an object. However, it’s notable that our high-level cognition can also purposefully attend to low-level features such as lines. This isn’t really predicted by the basic hierarchy picture—more needs to be said about how this works.
So, similarly, we can’t predict that you or I verbally report positive/negative/neutral attaching to percepts from the claim that the sensory hierarchy is composed of units which are controllers. A controller has valence in that it has goals and how-it’s-doing on those goals, but why should we expect that humans verbally report the direct experience of that? Humans don’t have direct conscious experience of everything going on in neural circuitry.
This is not at all a problem with minimization of prediction error; it’s more a question about hierarchies of controllers.
Yeah, this is a good point, and I agree it’s one of the things I am looking for others to verify with better brain imaging technology. I find myself working ahead of what we can completely verify now because I’m willing to bet that it’s right, or at least right enough that whatever is wrong with it won’t throw out the work I do.
I more or less said this in my other comment, but to reply to this directly—it makes sense to me that you could have a hierarchy of controllers which communicate via set points and distances from set points, but this doesn’t particularly make me think set points are predictions.
Artificial neural networks basically work this way—signals go one way, “degree of satisfaction” goes the other way (the gradient). If the ANN is being trained to make predictions, then yeah, “predictions go one way, distance from set point goes the other” (well, distance + direction). However, ANNs can be trained to do other things as well; so the signals/corrections need not be about prediction.
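The analogy can be sketched with a one-weight “network” trained by gradient descent. This is a hypothetical toy, not a claim about any particular architecture: the forward pass sends `w * x` one way, and the gradient of the squared distance from a target sends “distance + direction” back. The data and the fixed `desired` output are made up for illustration.

```python
def fit(xs, targets, lr=0.05, steps=500):
    """One-weight 'network' out = w * x, trained by full-batch gradient
    descent on mean squared distance from per-example targets (set points)."""
    w = 0.0
    n = len(xs)
    for _ in range(steps):
        outs = [w * x for x in xs]  # forward signals
        grad = sum(2 * (o - t) * x for o, t, x in zip(outs, targets, xs)) / n
        w -= lr * grad  # backward signal: distance + direction
    return w


xs = [1.0, 2.0]
observed = [2.0, 4.0]  # data the network could learn to predict
desired = [3.0, 3.0]   # hypothetical fixed goal, unrelated to the data

w_predict = fit(xs, observed)  # set points are predictions of the data
w_goal = fit(xs, desired)      # same machinery, set points are not predictions
print(w_predict, w_goal)
```

The learning machinery is identical in both calls; only the targets differ. With `desired`, the least-squares fit is `w = 1.8`, which hits neither target exactly, but the backward signal is pulling toward a goal rather than toward the data.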
I’ve seen some results like this. I’m guessing there are a lot of different ways you could do it, but iirc what I saw seemed reasonable if what you want to do is build something like an imitation learner but also bias toward specific desired results. However, I think in that case “minimizing prediction error” meant a different thing than what you mean. So, what are you imagining?
If I take my ANN analogy, then fixing signals doesn’t seem to help me do anything much. A ‘set-point’ is like a forward signal in the analogy, so fixing set points means fixing inputs to the ANN. But a fixed input is more or less a dead input as far as learning goes; the ANN will still just learn to produce whatever output behavior the gradient incentivises, such as prediction of the data. Fixing some of the outputs doesn’t seem very helpful either.
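A small sketch of the “dead input” point, under the assumption that a fixed set point enters the network as an ordinary input that never varies. Trained on prediction error, the weight on that constant input just becomes a bias term, and the model ends up predicting the data whatever constant `c` is fixed at. The model, data, and learning rate are made up for illustration:

```python
def fit_with_fixed_input(xs, ys, c, lr=0.03, steps=5000):
    """Linear model out = w_x * x + w_c * c, trained on squared prediction
    error, where c is a 'set point' input held constant throughout."""
    w_x, w_c = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errs = [w_x * x + w_c * c - y for x, y in zip(xs, ys)]
        g_x = 2 * sum(e * x for e, x in zip(errs, xs)) / n
        g_c = 2 * sum(e * c for e in errs) / n
        w_x -= lr * g_x
        w_c -= lr * g_c
    return w_x, w_c


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # y = 2x + 1
for c in (1.0, 5.0):
    w_x, w_c = fit_with_fixed_input(xs, ys, c)
    preds = [w_x * x + w_c * c for x in xs]
    print(c, [round(p, 6) for p in preds])
```

In both runs `w_c * c` converges to the intercept of 1, so the fixed input changes nothing about the learned behavior: the output is still just a prediction of the data.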
Also, how is this parsimonious?
I find speaking in terms of minimization of prediction error useful to my own intuitions, but it does increasingly look like what I’m really thinking of are just generic homeostatic control systems. I like talking in terms of prediction error because I think it makes the translation to other similar theories easier (I’m thinking of other Bayesian brain theories and Friston’s free energy theory), but it’s fair to say I’m really thinking about a control system sending signals to hit a set point, even if some of those control systems do learn in a way that looks like Bayesian updating or minimization of prediction error and others don’t.
The sense in which I think of this theory as parsimonious is that I don’t believe there is a simpler mechanism that can explain what we see. If we could talk about these phenomena in terms of control systems without using signals about distance from set points I’d prefer that, and I think accepting the complexity that comes from building things out of such simple components is the right move in terms of parsimony, rather than postulating additional mechanisms. As long as I can explain things adequately without introducing more moving parts, I’ll consider it maximally parsimonious as far as my current knowledge and needs go.
I’m still interested if you can say more about how you view it as minimizing a warped prediction. I mentioned that if you fix some parts of the network, they seem to end up getting ignored rather than producing goal-directed behaviour. Do you have an alternate picture in which this doesn’t happen? (I’m not asking you to justify yourself rigorously; I’m curious for whatever thoughts or vague images you have here, though of course all the better if it really works)
Ah, I guess I don’t expect it to end up ignoring the parts of the network that can’t learn, because I don’t think error minimization, learning, or anything else is a top-level goal of the network. That is, there are only low-level control systems interacting, and parts of the network avoid being ignored by being more powerful in various ways, probably by being positioned so that they have more influence on behavior than the other parts of the network that perform Bayesian learning. This does mean I expect those parts of the network not to learn, or to learn inefficiently, but they are that way because it’s adaptive.
For example, I would guess something in humans like the neocortex is capable of Bayesian learning, but it only influences the rest of the system through narrow channels that prevent it from “taking over” and making humans true prediction error minimizers, instead forcing them to do things that satisfy other set points. In buzzwords, you might say human minds are “complex, adaptive, emergent systems” built out of neurons, with most of the function coming bottom-up from the neurons or, if you will, “from the middle” in terms of network topology.
This seems like an important question: if all these phenomena really are ultimately the same thing and powered by the same mechanisms, why do we make distinctions between them and find those distinctions useful?
I don’t have an answer I’m satisfied with, but I’ll try to say a few words about what I’m thinking and see if that moves us along.
My first approximation would be that we’re looking at things that we experience by different means and so give them different names because when we observe them they present in different ways. Goals (I assume by this you mean the cluster of things we might call desires, aversions, and generally intention towards action) probably tend to be observed by noticing the generation of signals going out that usually generate observable actions (movement, speech, etc.) whereas beliefs (the cluster of things that includes thoughts and maybe emotions) are internal and not sending out signals to action beyond mental action.
I don’t know enough to be very confident in that, though, and, like you, I think there could be numerous reasons why it makes sense to think of them as separate even if they are fundamentally not very different.
On my understanding of how things work, goals and beliefs combine to make action, so neither one is really mentally closer to action than the other. Both a goal and a belief can be quite far removed from action (eg, a nearly impossible goal which you don’t act on, or a belief about far-away things which don’t influence your day-to-day). Both can be very close (a jump scare seems most closely connected to a belief, whereas deciding to move your hand and then doing so is more goal-like—granted both those examples have complications).
If, in conversation, the distinction comes up explicitly, it is usually because of stuff like this:
Alice makes an unclear statement; it sounds like she could be claiming A or wanting A.
Bob asks for clarification, because Bob’s reaction to believing A is true would be very different from his reaction to believing A is good (or, in more relative terms, knowing Alice endorses one or the other of those). In the first case, Bob might plan under the assumption A; in the second, Bob might make plans designed to make A true.
Alice is engaging in wishful thinking, claiming that something is true when really the opposite is just too terrible to consider.
Bob wants to be able to rely on Alice’s assertions, so Bob is concerned about the possibility of wishful thinking.
Or, Bob is concerned for Alice; Bob doesn’t want Alice to ignore risks due to ignoring negative possibilities, or fail to set up back-up plans for the bad scenarios.
My point is that it doesn’t seem to me like a case of people intuitively breaking up a thing which is scientifically really one phenomenon. Predicting A and wanting A seem to have quite different consequences. If you predict A, you tend to restrict attention to cases where it is true when planning; you may plan actions which rely on it. If you want A, you don’t do that; you are very aware of all the cases where not-A. You take actions designed to ensure A.
I’ve replied about surprise, its benefits, and its mechanism a couple times now. My theory is that surprise is by itself bad but can be made good by having control systems that expect surprise and send a good signal when surprise is seen. Depending on how this gets weighted, this creates a net positive mixed emotion where surprise is experienced as something good and serves many useful purposes.
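One way to read “depending on how this gets weighted” is as a simple sum of two signals. The sketch below assumes (this is not stated in the comment) that the surprise-expecting controller’s good signal stops growing once surprise exceeds what it expects; the weights `w_bad`, `w_good` and the cap `expected` are hypothetical. Under those assumptions, moderate surprise comes out net positive and large surprise net negative:

```python
def net_valence(surprise, w_bad=1.0, w_good=1.5, expected=2.0):
    """Hypothetical mixed-signal model: a base controller sends a bad signal
    proportional to surprise; a second controller that expects some surprise
    sends a good signal for surprise up to its expected amount."""
    bad = -w_bad * surprise
    good = w_good * min(surprise, expected)  # assumed cap on the good signal
    return bad + good


for s in (0.5, 2.0, 5.0):
    print(s, net_valence(s))
```

Whether surprise feels good overall then depends entirely on the relative weights and the cap, which is the part the theory would need to pin down.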
I think this mostly dissolves the other points you bring up that I read as contingent on thinking the theory doesn’t predict humans would find variety and surprise good in some circumstances, but if not please let me know what the remaining concerns are in light of this explanation (or possibly object to my explanation of why we expect surprise to sometimes be net good).
Yeah, I noted that I and other humans often seem to enjoy surprise, but I also had a different point I was trying to make—the claim that it makes sense that you’d observe competent agents doing many things which can be explained by minimizing prediction error, no matter what their goals.
But, it isn’t important for you to respond further to this point if you don’t feel it accounts for your observations.
I ended up replying to this in a separate post since I felt like similar objections kept coming up. My short answer is: minimization of prediction error is minimization of error at predicting input to a control system that may not be arbitrarily free to change its prediction set point. This means that it won’t always be the case that a control system is globally trying to minimize prediction error, but instead is locally trying to minimize prediction error, although it may not be able to become less wrong over time because it can’t change the prediction to better predict the input.
From an evolutionary perspective my guess is that true Bayesian updating is a fairly recent adaptation, and most minimization of prediction error is minimization of error of mostly fixed prediction set points that are beneficial for survival.
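The contrast between a control system that can revise its prediction and one whose set point is effectively fixed can be sketched as follows. Both reduce the gap between input and set point, but only the first does it by updating its prediction; the second can only act on the world. The 37-degree set point, gains, and step counts are hypothetical:

```python
def updating_predictor(observations, prediction=37.0, lr=0.5):
    """Reduces error by changing its prediction (Bayesian-ish updating)."""
    for obs in observations:
        prediction += lr * (obs - prediction)
    return prediction


def fixed_set_point_controller(temperature, set_point=37.0, gain=0.5, steps=20):
    """Cannot change its set point, so it reduces error by acting on the world."""
    for _ in range(steps):
        temperature -= gain * (temperature - set_point)
    return temperature


print(updating_predictor([30.0] * 20))   # the prediction drifts to match the world
print(fixed_set_point_controller(30.0))  # the world is pushed back to the set point
```

Locally, both are minimizing the same kind of error; only the first ever becomes “less wrong”, while the second behaves like a goal.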
I left a reply to this view at the other comment. However, I don’t feel that point connects very well to the point I tried to make.
Your OP talks about minimization of prediction error as a theory of human value, relevant to alignment. It might be that evolution re-purposes predictive machinery to pursue adaptive goals; this seems like the sort of thing evolution would do. However, this leaves the question of what those goals are. You say you’re not claiming that humans globally minimize prediction error. But, partly because of the remarks you made in the OP, I’m reading you as suggesting that humans do minimize prediction error, but relative to a skewed prediction.
Are human values well-predicted by modeling us as minimizing prediction error relative to a skewed prediction?
My argument here is that evolved creatures such as humans are more likely to (as one component of value) steer toward prediction error, because doing so tends to lead to learning, which is broadly valuable. This is difficult to model by taking a system which minimizes prediction error and skewing the predictions, because it is the exact opposite.
Elsewhere, you suggest that exploration can be predicted by your theory if there’s a sort of reflection within the system, so that prediction error is predicted as well. The system therefore has an overall set-point for prediction error and explores if it’s too small. But I think this would be drowned out. If I started with a system which minimizes prediction error and added a curiosity drive on top of it, I would have to entirely cancel out the error-minimization drive before I started to see the curiosity doing its job successfully. Similarly for your hypothesized part. Everything else in the system is strategically avoiding error. One part steering toward error would have to out-vote or out-smart all those other parts.
Now, that’s over-stating my point. I don’t think human curiosity drive is exactly seeking maximum prediction error. I think it’s more likely related to the derivative of prediction error. But the point remains that that’s difficult to model as minimization of a skewed prediction error, and requires a sub-part implementing curiosity to drown out all the other parts.
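The out-voting worry can be put as back-of-the-envelope arithmetic. Assuming, purely for illustration, that each part’s push is linear in an option’s expected prediction error and that the pushes simply add, a single curiosity part only tips the balance once its weight exceeds the combined weight of every error-avoiding part:

```python
def net_drive_toward_error(expected_error, n_avoiders, avoid_weight, curiosity_weight):
    """Hypothetical linear vote: n_avoiders parts each push away from an option
    in proportion to the prediction error it is expected to bring; one
    curiosity part pushes toward it. Positive result = the option is sought."""
    return (curiosity_weight - n_avoiders * avoid_weight) * expected_error


print(net_drive_toward_error(2.0, n_avoiders=5, avoid_weight=1.0, curiosity_weight=3.0))
print(net_drive_toward_error(2.0, n_avoiders=5, avoid_weight=1.0, curiosity_weight=6.0))
```

With five avoiders of weight 1, a curiosity weight of 3 still leaves the agent avoiding the option; it takes a weight above 5 before exploration wins.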
Instead of modeling human value as minimization of error of a skewed prediction, why not step back and model it as minimizing “some kind of error”? This seems no less parsimonious (since you have to specify the skew anyway), and leaves you with all the same controller machinery to propagate error through the system and learn to avoid it.
I have not read all the comments yet, so maybe this is redundant, but anyway...
I think it is plausible that humans and other life forms are mostly made up of layers of control systems, stacked on each other. However, it does not follow from this that humans are trying to minimise prediction error.
There is probably some part of the brain that is trying to minimise prediction error, possibly organised as a control system that tries to keep expectations in line with reality, because it is useful to be able to accurately predict the world.
But if we are a stack of control systems, then I would expect other parts of the brain to be control systems for other things, e.g. having the correct level of blood sugar, having a good amount of social interaction, having a good amount of variety in our lives.
I can imagine someone figuring out more or less how the prediction control system works and what it is doing, then looking at everything else, noticing the similarity (because it is all types of control systems and evolution tends to reuse structures) and thinking “Hmm, maybe it is all about predictions”. But I also think that would be wrong.
Agents trade off exploring and exploiting, and when they’re exploiting they look like they’re minimizing prediction error?