In light of this second reason, I’ll add to my first: it seems maximally parsimonious that, if we are looking for an origin of valence, it would have to be something simple that a control system could do. The simplest thing a control system can do without ignoring its input is to test how far an observed input is from a set point. If something more complex is going on, I think we’d need an explanation for why a signal indicating distance from a set point is not enough.
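Here is a minimal Python sketch of “test how far an observed input is from a set point”. It assumes a single scalar input and a simple proportional response. `SetPointController`, `gain`, and `run` are hypothetical names for illustration, not anything from the discussion, and the thermostat numbers are made up.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SetPointController:
    """Hypothetical minimal controller: compares one scalar observation to a
    fixed set point and emits the signed distance between them."""

    set_point: float
    gain: float = 0.5  # assumed proportional gain; not from the discussion

    def error(self, observed: float) -> float:
        # Signed distance from the set point: positive means "below target".
        return self.set_point - observed

    def act(self, observed: float) -> float:
        # Proportional response: nudge the world in the direction that
        # shrinks the error, scaled by the gain.
        return self.gain * self.error(observed)


def run(controller: SetPointController, start: float, steps: int) -> List[float]:
    """Apply the controller's action to a toy world `steps` times and
    return the trajectory of observed values (including the start)."""
    value = start
    trajectory = [value]
    for _ in range(steps):
        value += controller.act(value)
        trajectory.append(value)
    return trajectory


print(run(SetPointController(set_point=20.0), start=15.0, steps=5))
# → [15.0, 17.5, 18.75, 19.375, 19.6875, 19.84375]
```

The only thing the controller ever sends is the signed error, scaled by the gain. Direction comes for free from the sign.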
I more or less said this in my other comment, but to reply to this directly—it makes sense to me that you could have a hierarchy of controllers which communicate via set points and distances from set points, but this doesn’t particularly make me think set points are predictions.
Artificial neural networks basically work this way—signals go one way, “degree of satisfaction” goes the other way (the gradient). If the ANN is being trained to make predictions, then yeah, “predictions go one way, distance from set point goes the other” (well, distance + direction). However, ANNs can be trained to do other things as well; so the signals/corrections need not be about prediction.
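Here is a toy version of the ANN analogy, as a minimal sketch. A single linear unit passes its input forward, and `delta = y - target` flows back as distance plus direction. The same unit is trained twice: once with the next observation as the target (prediction), and once with a fixed target that ignores the data. The backward signal has the same form in both cases. `train_linear_unit`, the data, and the step size are assumptions made for illustration.

```python
from typing import Callable, List, Tuple

Sample = Tuple[float, float]  # (input x, next observation)


def train_linear_unit(
    data: List[Sample],
    target_fn: Callable[[float, float], float],
    lr: float = 0.5,
    steps: int = 200,
) -> Tuple[float, float]:
    """Fit y = w*x + b by full-batch gradient descent on mean 0.5*(y - target)**2.

    Forward: the input x flows "up" to produce y.
    Backward: delta = y - target flows "down"; it is a signed distance from
    whatever the loss treats as the set point (magnitude plus direction).
    """
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(steps):
        grad_w = grad_b = 0.0
        for x, observed_next in data:
            y = w * x + b                              # forward signal
            delta = y - target_fn(x, observed_next)    # backward signal
            grad_w += delta * x / n                    # chain rule: dL/dw
            grad_b += delta / n                        # dL/db
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


data = [(x, 2 * x + 1) for x in (-1.0, -0.5, 0.0, 0.5, 1.0)]

# Trained to predict: the "set point" is the next observation.
print(train_linear_unit(data, lambda x, nxt: nxt))   # approximately (2.0, 1.0)

# Trained toward a goal that ignores the data: always output 5.
print(train_linear_unit(data, lambda x, nxt: 5.0))   # approximately (0.0, 5.0)
```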
People have managed to build AI out of control systems minimizing prediction error, albeit doing so, as I propose is necessary, by having some fixed set points that prevent dark room problems.
I’ve seen some results like this. I’m guessing there are a lot of different ways you could do it, but iirc what I saw seemed reasonable if what you want to do is build something like an imitation learner but also bias toward specific desired results. However, I think in that case “minimizing prediction error” meant a different thing than what you mean. So, what are you imagining?
If I take my ANN analogy, then fixing signals doesn’t seem to help me do anything much. A ‘set-point’ is like a forward signal in the analogy, so fixing set points means fixing inputs to the ANN. But a fixed input is more or less a dead input as far as learning goes; the ANN will still just learn to produce whatever output behavior the gradient incentivises, such as prediction of the data. Fixing some of the outputs doesn’t seem very helpful either.
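Here is one way to see the “a fixed input is a dead input” point, as a sketch under simplifying assumptions: a single linear unit, a squared-error prediction loss, and hypothetical names. An extra input is clamped to a constant before training. Whatever constant is clamped, the weight on it just absorbs an offset, and the unit ends up fitting the data either way.

```python
from typing import List, Tuple


def train_with_clamped_input(
    data: List[Tuple[float, float]],
    clamp: float,
    lr: float = 0.05,
    steps: int = 2000,
) -> Tuple[float, float]:
    """Fit y = w*x + v*clamp by full-batch gradient descent on
    mean 0.5*(y - target)**2, where `clamp` is a fixed input.

    A gradient with respect to `clamp` itself (delta * v) exists, but there
    is nothing to apply it to because the input never changes, so this code
    never computes it. Only the weight v learns, so v*clamp just becomes an
    offset fitted to the data. Step-size caveat: for zero-mean x the
    curvature along v is clamp**2, so keep lr * clamp**2 below 2.
    """
    w, v = 0.0, 0.0
    n = len(data)
    for _ in range(steps):
        grad_w = grad_v = 0.0
        for x, target in data:
            delta = (w * x + v * clamp) - target   # plain prediction error
            grad_w += delta * x / n
            grad_v += delta * clamp / n
        w -= lr * grad_w
        v -= lr * grad_v
    return w, v


data = [(x, 2 * x + 1) for x in (-1.0, -0.5, 0.0, 0.5, 1.0)]
for clamp in (1.0, 4.0):
    w, v = train_with_clamped_input(data, clamp)
    # Whatever value is clamped, the learned behaviour is the same fit.
    print(clamp, round(w, 6), round(v * clamp, 6))   # w near 2, offset near 1
```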
Also, how is this parsimonious?
I find speaking in terms of minimization of prediction error useful to my own intuitions, but it does increasingly look like what I’m really thinking of are just generic homeostatic control systems. I like talking in terms of prediction error because I think it makes the translation to other similar theories easier (I’m thinking of other Bayesian brain theories and Friston’s free energy theory). Still, I think it’s fair to say I’m really just thinking about a control system sending signals to hit a set point, even if some of those control systems learn in a way that looks like Bayesian updating or minimization of prediction error and others don’t.
The sense in which I think of this theory as parsimonious is that I don’t believe there is a simpler mechanism that can explain what we see. If we could talk about these phenomena in terms of control systems without using signals about distance from set points, I’d prefer that. But accepting the complexity that comes from building everything out of such simple components is, I think, the right move for parsimony, compared with postulating additional mechanisms. As long as I can explain things adequately without introducing more moving parts, I’ll consider the theory maximally parsimonious as far as my current knowledge and needs go.
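Here is a minimal sketch of that picture, assuming one scalar signal per controller. Both controllers emit the same kind of signal (set point minus observation). One has a fixed set point. The other treats its set point as a Gaussian belief and updates it with a standard one-dimensional conjugate-normal (Kalman-style) update, so its error is literally a prediction error. The class names, the prior, and the noise variance are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FixedController:
    """Homeostatic controller whose set point never changes."""

    set_point: float

    def error(self, observed: float) -> float:
        return self.set_point - observed


@dataclass
class LearningController:
    """Controller whose set point is a Gaussian posterior mean over what it
    observes. Each call reports the error, then does a conjugate-normal
    (one-dimensional Kalman) update with known observation noise."""

    mean: float
    var: float
    noise_var: float = 1.0  # assumed known observation noise

    def error(self, observed: float) -> float:
        err = self.mean - observed                   # same signal type as FixedController
        gain = self.var / (self.var + self.noise_var)
        self.mean -= gain * err                      # i.e. mean += gain * (observed - mean)
        self.var *= 1.0 - gain
        return err


def errors_over(controller, observations: List[float]) -> List[float]:
    """Feed observations to a controller and collect the errors it emits."""
    return [controller.error(o) for o in observations]


obs = [10.0] * 20
fixed_errors = errors_over(FixedController(set_point=0.0), obs)
learned_errors = errors_over(LearningController(mean=0.0, var=100.0), obs)
print(fixed_errors[-1], learned_errors[-1])  # -10.0 vs. a value near 0
```

The fixed controller keeps reporting the same error until the world changes. The learning one shrinks its error by moving its set point instead of moving the world.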
I’m still interested in whether you can say more about how you view it as minimizing a warped prediction. I mentioned that if you fix some parts of the network, they seem to end up getting ignored rather than producing goal-directed behaviour. Do you have an alternate picture in which this doesn’t happen? (I’m not asking you to justify yourself rigorously; I’m curious about whatever thoughts or vague images you have here, though of course all the better if it really works.)
Ah, I guess I don’t expect the network to end up ignoring the parts that can’t learn, because I don’t think error minimization, learning, or anything else is a top-level goal of the network. There are only low-level control systems interacting. Some parts of the network avoid being ignored by being more powerful in various ways, probably by being positioned so that they have more influence on behavior than the parts of the network that perform Bayesian learning. This does mean I expect those parts of the network not to learn, or to learn inefficiently, but they are that way because it’s adaptive.
For example, I would guess something in humans, like the neocortex, is capable of Bayesian learning. But it only influences the rest of the system through narrow channels that prevent it from “taking over” and turning humans into true prediction error minimizers, and instead force humans to do things that satisfy other set points. In buzzwords, you might say human minds are “complex, adaptive, emergent systems” built out of neurons, with most of the function coming bottom-up from the neurons or, if you will, “from the middle” in terms of network topology.
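Here is a toy sketch of “positioned to have more influence” and “narrow channels”, under several assumptions. A non-learning controller with a fixed hunger set point adds its error directly to the drive. A hypothetical learning module’s vote is clipped to a small range before it reaches behaviour. The learner always votes hard for staying in the predictable “dark room”. Clipping is just one simple stand-in for a narrow channel, and none of this is meant as a model of the neocortex. `choose_drive`, `simulate`, and all the numbers are made up.

```python
from typing import List


def choose_drive(
    hunger: float,
    hunger_set_point: float,
    learner_vote: float,
    channel_width: float = 0.2,
) -> float:
    """Positive result = go out and forage; zero or negative = stay put.

    hunger_error comes from a fixed-set-point controller that never learns.
    learner_vote stands in for a hypothetical learning module; it reaches
    behaviour only through a narrow channel (clipped to +/- channel_width),
    so however strongly it votes it cannot override a large hunger error.
    """
    hunger_error = hunger - hunger_set_point
    narrow_vote = max(-channel_width, min(channel_width, learner_vote))
    return hunger_error + narrow_vote


def simulate(steps: int = 20) -> List[str]:
    """Toy world: staying in the 'dark room' raises hunger, foraging lowers it.
    The learner always votes hard for the dark room (-100)."""
    hunger = 0.0
    actions = []
    for _ in range(steps):
        if choose_drive(hunger, 0.0, learner_vote=-100.0) > 0:
            actions.append("forage")
            hunger = max(0.0, hunger - 0.5)
        else:
            actions.append("stay")
            hunger += 0.1
    return actions


print(simulate(8))
# → ['stay', 'stay', 'stay', 'forage', 'stay', 'stay', 'stay', 'forage']
```

Because the learner’s vote is capped, it can only tip decisions when the fixed controller is near its set point. That is roughly the sense in which the fixed parts avoid being ignored.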