The tone of the following is a bit more adversarial than I’d like; sorry for that. My attitude toward predictive processing comes from repeated attempts to see why people like it, and all the reasons seeming to fall flat to me. If you respond, I’m curious about your reaction to these points, but it may be more useful for you to give the positive reasons why you think your position is true (or even just why it would be appealing), particularly if they’re unrelated to what I’m about to say.
I’ll reply to your points soon, since I think doing so is a helpful way for me and others to explore this idea, though it may take me a little while because this is not the only thing I have to do. First, though, I’ll respond to the request that I apparently left unanswered.
I have two main lines of evidence that come together to make me like this theory.
One is that it’s elegant, simple, and parsimonious. Control systems are simple, they look to me to be the simplest thing we might reasonably call “alive” or “conscious” if we try to redefine those terms in ways that are not anchored on our experience here on Earth. I think the reason it’s so hard to answer questions about what is alive and what is conscious is that the naive categories we form and give those names to are ultimately rooted in simple phenomena of information “pumping” that locally reduce entropy, yet many things that do this lie outside our historical experience of what we could observe generating information, and so historically made more sense to think of as “dead” than “alive”. In a certain sense this leads me to a position you might call “cybernetic panpsychism”, but that’s just fancy words for saying there’s nothing so special going on in the universe that makes us different from rocks and stars other than (increasingly complex) control systems creating information.
Another is that it fits with a lot of my understanding of human psychology. Western psychology doesn’t really get down to a level where it has a solid theory of what’s going on at the lowest levels of the mind, but the Buddhist psychology of the Abhidharma does, and it says that right after “contact” (stuff interacting with neurons) comes “feeling/sensing”, and this is claimed to always contain a signal of positive, negative, or neutral judgement. My own experience with meditation showed me something similar, so when I learned about this theory it seemed like an obviously correct way of explaining what I was experiencing. This makes me strongly believe that any theory of value we want to develop should account for this experience of valence showing up attached to every experience.
In light of this second reason, I’ll add to my first reason that it seems maximally parsimonious that if we were looking for an origin of valence it would have to be about something simple that could be done by a control system, and the simplest thing it could do that doesn’t simply ignore the input is test how far off an observed input is from a set point. If something more complex is going on, I think we’d need an explanation for why sending a signal indicating distance from a set point is not enough.
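To make that concrete, here is a minimal sketch in code of the kind of unit I have in mind. The function name, the thermostat framing, and the numbers are purely illustrative; I’m not claiming neurons literally implement it this way.

```python
# A toy control unit: the simplest non-trivial thing it can do with an input is
# report how far the observation sits from a set point. Reading that signed
# distance as a good/bad/neutral signal is all the "valence" this picture needs.
def control_signal(observed: float, set_point: float) -> float:
    """Signed distance from the set point: zero reads as neutral,
    larger magnitude reads as more strongly negative valence."""
    return observed - set_point

# e.g. a thermostat-like unit with a set point of 37.0
for reading in (37.0, 36.2, 39.5):
    error = control_signal(reading, 37.0)
    print(reading, "neutral" if error == 0 else f"off by {abs(error):.1f}")
```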
I briefly referenced these above, but left it all behind links.
I think there are also some other lines of evidence that are less compelling to me but seem worth mentioning:
People have managed to build AI out of control systems minimizing prediction error, albeit doing so, as I propose is necessary, by having some fixed set points that prevent dark room problems.
Neurons do seem to function like simple control systems, though I think we have yet to determine with sufficient certainty that this is all that is going on.
Predictive coding admits explanations for many phenomena, but this risks just-so stories of the sort we see when evolutionary psychology tries to claim more than it can.
One is that it’s elegant, simple, and parsimonious.
I certainly agree here. Furthermore I think it makes sense to try and unify prediction with other aspects of cognition, so I can get that part of the motivation (although I don’t expect that humans have simple values). I just think this makes bad predictions.
Control systems are simple, they look to me to be the simplest thing we might reasonably call “alive” or “conscious” if we try to redefine those terms in ways that are not anchored on our experience here on Earth.
No disagreement here.
and this is claimed to always contain a signal of positive, negative, or neutral judgement.
Yeah, this seems like an interesting claim. I basically agree with the phenomenological claim. This seems to me like evidence in favor of a hierarchy-of-thermostats model (with one major reservation which I’ll describe later). However, it doesn’t seem like evidence for the prediction-error-minimization perspective in particular. We can have a network of controllers which express wishes to each other separately from predictions. Yes, that’s less parsimonious, but I don’t see a way to make the more parsimonious prediction-error version work without dubious compromises.
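To gesture at what I mean by wishes traveling separately from predictions, here is a toy sketch; the Message type, the update rule, and the numbers are all invented purely for illustration, not a standard model.

```python
from dataclasses import dataclass

@dataclass
class Message:
    prediction: float  # what this unit expects the observed value to be
    wish: float        # the value this unit would like some other unit to bring about

class Controller:
    def __init__(self, set_point: float, gain: float = 0.5):
        self.set_point = set_point
        self.expected = set_point  # crude running expectation, updated from observations
        self.gain = gain

    def step(self, observed: float) -> Message:
        # the prediction channel tracks what is actually observed...
        self.expected += self.gain * (observed - self.expected)
        # ...while the wish channel is driven by the set point, not the expectation
        return Message(prediction=self.expected, wish=self.set_point)

# Two units: the upper controller's wish becomes the lower one's set point,
# while predictions travel on their own channel and never get conflated with goals.
upper = Controller(set_point=37.0)
lower = Controller(set_point=0.0)

observed = 35.0
for _ in range(5):
    up_msg = upper.step(observed)
    lower.set_point = up_msg.wish
    lower.step(observed)
    observed += 0.3 * (lower.set_point - observed)  # acting nudges the world toward the goal
    print(f"observed={observed:.2f}  prediction={up_msg.prediction:.2f}  wish={up_msg.wish:.2f}")
```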
Here’s the reservation which I promised—if we have a big pile of controllers, how would we know (based on phenomenal experience) that controllers attach positive/negative valence “locally” to every percept?
Forget controllers for a moment, and just suppose that there’s any hierarchy at all. It could be made of controller-like pieces, or neural networks learning via backprop, etc. As a proxy for conscious awareness, let’s ask: what kind of thing can we verbally report? There isn’t any direct access to things inside the hierarchy; there’s only the summary of information which gets passed up the hierarchy.
In other words: it makes sense that low-level features like edge detectors and colors get combined into increasingly high-level features until we recognize an object. However, it’s notable that our high-level cognition can also purposefully attend to low-level features such as lines. This isn’t really predicted by the basic hierarchy picture—more needs to be said about how this works.
So, similarly, we can’t predict that you or I verbally report positive/negative/neutral attaching to percepts from the claim that the sensory hierarchy is composed of units which are controllers. A controller has valence in that it has goals and how-it’s-doing on those goals, but why should we expect that humans verbally report the direct experience of that? Humans don’t have direct conscious experience of everything going on in neural circuitry.
This is not at all a problem with minimization of prediction error; it’s more a question about hierarchies of controllers.
So, similarly, we can’t predict that you or I verbally report positive/negative/neutral attaching to percepts from the claim that the sensory hierarchy is composed of units which are controllers. A controller has valence in that it has goals and how-it’s-doing on those goals, but why should we expect that humans verbally report the direct experience of that? Humans don’t have direct conscious experience of everything going on in neural circuitry.
Yeah, this is a good point, and I agree it’s one of the things that I am looking for others to verify with better brain imaging technology. I find myself in the position of working ahead of what we can completely verify now because I’m willing to take the bet that it’s right, or at least right enough that whatever turns out to be wrong won’t throw out the work I do.
In light of this second reason, I’ll add to my first reason that it seems maximally parsimonious that if we were looking for an origin of valence it would have to be about something simple that could be done by a control system, and the simplest thing it could do that doesn’t simply ignore the input is test how far off an observed input is from a set point. If something more complex is going on, I think we’d need an explanation for why sending a signal indicating distance from a set point is not enough.
I more or less said this in my other comment, but to reply to this directly—it makes sense to me that you could have a hierarchy of controllers which communicate via set points and distances from set points, but this doesn’t particularly make me think set points are predictions.
Artificial neural networks basically work this way—signals go one way, “degree of satisfaction” goes the other way (the gradient). If the ANN is being trained to make predictions, then yeah, “predictions go one way, distance from set point goes the other” (well, distance + direction). However, ANNs can be trained to do other things as well; so the signals/corrections need not be about prediction.
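Concretely, here is roughly the picture I mean, using PyTorch just because it makes the two directions explicit; the architecture, the random data, and the squared-error objective are arbitrary stand-ins, and swapping in any other differentiable loss leaves the forward/backward structure unchanged.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

net = nn.Sequential(nn.Linear(4, 8), nn.Tanh(), nn.Linear(8, 1))

x = torch.randn(16, 4)       # inputs: signals flowing forward
target = torch.randn(16, 1)  # the "set point" for the output

out = net(x)                          # forward pass: signals go one way
loss = ((out - target) ** 2).mean()   # distance from the set point (here, prediction error)
loss.backward()                       # backward pass: corrections go the other way

# each parameter now carries a gradient: a local "which way to move to do better" signal
for name, p in net.named_parameters():
    print(name, p.grad.norm().item())
```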
People have managed to build AI out of control systems minimizing prediction error, albeit doing so, as I propose is necessary, by having some fixed set points that prevent dark room problems.
I’ve seen some results like this. I’m guessing there are a lot of different ways you could do it, but iirc what I saw seemed reasonable if what you want to do is build something like an imitation learner but also bias toward specific desired results. However, I think in that case “minimizing prediction error” meant a different thing than what you mean. So, what are you imagining?
If I take my ANN analogy, then fixing signals doesn’t seem to help me do anything much. A ‘set-point’ is like a forward signal in the analogy, so fixing set points means fixing inputs to the ANN. But a fixed input is more or less a dead input as far as learning goes; the ANN will still just learn to produce whatever output behavior the gradient incentivises, such as prediction of the data. Fixing some of the outputs doesn’t seem very helpful either.
Also, how is this parsimonious?
I find speaking in terms of minimization of prediction error useful to my own intuitions, but it does increasingly look like what I’m really thinking of are just generic homeostatic control systems. I like talking in terms of prediction error because I think it makes the translation to other similar theories easier (I’m thinking of other Bayesian brain theories and Friston’s free energy theory), but it’s fair to say I’m really just thinking about control systems sending signals to hit set points, even if some of those control systems learn in a way that looks like Bayesian updating or minimization of prediction error and others don’t.
The sense in which I think of this theory as parsimonious is that I don’t believe there is a simpler mechanism that can explain what we see. If we could talk about these phenomena in terms of control systems without using signals about distance from set points I’d prefer that, and I think accepting the complexity that comes from building everything out of such simple components is a better trade, in terms of parsimony, than postulating additional mechanisms. As long as I can explain things adequately without having to introduce more moving parts, I’ll consider it maximally parsimonious as far as my current knowledge and needs go.
I’m still interested if you can say more about how you view it as minimizing a warped prediction. I mentioned that if you fix some parts of the network, they seem to end up getting ignored rather than producing goal-directed behaviour. Do you have an alternate picture in which this doesn’t happen? (I’m not asking you to justify yourself rigorously; I’m curious about whatever thoughts or vague images you have here, though of course all the better if it really works.)
Ah, I guess I don’t expect it to end up ignoring the parts of the network that can’t learn, because I don’t think error minimization, learning, or anything else is a top-level goal of the network. That is, there are only low-level control systems interacting, and some parts of the network avoid being ignored by being more powerful in various ways, probably by being positioned in the network such that they have more influence on behavior than the parts that perform Bayesian learning. This does mean I expect those parts of the network don’t learn, or learn inefficiently, but they do that because it’s adaptive.
For example, I would guess that something in humans like the neocortex is capable of Bayesian learning, but it only influences the rest of the system through narrow channels that prevent it from “taking over” and making humans true prediction error minimizers, instead forcing them to do things that satisfy other set points. In buzzwords, you might say human minds are “complex, adaptive, emergent systems” built out of neurons, with most of the function coming bottom-up from the neurons or “from the middle”, if you will, in terms of network topology.