I have two points of confusion about this:
How does it work? I made some remarks in this other comment, and more extensive remarks below.
How is minimizing error from a fixed/slow-moving set-point different from pursuing arbitrary goals? What’s left of the minimization-of-prediction-error hypothesis?
When I think of minimizing prediction error, I think of minimizing error of something which is well-modeled as a predictor. A set-point for sex, say, doesn’t seem like this—many organisms get far less than their satiation level of sex, but the set-point evolved based on genetic fitness, not predictive accuracy. The same is true for other scarce resources in the ancestral environment, such as sugar.
Is your model that evolution gets agents to pursue useful goals by warping predictive circuitry to make false-but-useful predictions? Or is it that evolution would fix the ancient predictive circuitry if it were better at modifying old-but-critical subsystems in big jumps, but can’t? I find the second unlikely. The first seems possible, but strains my credulity about modeling the warped stuff as “prediction”.
As for the how-does-it-work point: if we start with a predictive hierarchy but then warp some pieces to fix their set-points, how do we end up with something which strategically minimizes the prediction error of those parts? When I think of freezing some of the predictions, it seems like what you get is a world-model which is locked into some beliefs, not something which strategizes to make those predictions true.
As I mentioned in the other comment, I have seen other work which gets agents out of this sort of thing; but it seems likely they had different definitions of key ideas such as minimizing prediction error, so your response would be illuminating.
Well-working Bayesian systems minimize prediction error in the sense that they tweak their own weights (that is, probabilities) so as to reduce future error, in response to stimuli. They don’t have a tendency to produce outputs now which are expected to reduce later prediction error. This is also true of small parts in a Bayesian network; each is individually responsible for minimizing its own prediction error on downstream info, using upstream info as helpful “freebie” information which it can benefit from in its downstream predictions. So, if you freeze a small part, its downstream neighbors will simply stop using it, because its frozen output is not useful. Upstream neighbors get the easy job of predicting the frozen values. So a mostly-Bayesian system with some frozen parts doesn’t seem to start trying to minimize the prediction error of the frozen bit in other ways, because each part is responsible for minimizing its own error.
Similarly for artificial neural networks: freezing a sub-network makes its feedforward signal useless to downstream neurons, and its backprop information little more interesting than that. Other systems of predictive hierarchies seem likely to get similar results.
The problem here is that these systems are only trying to minimize prediction error on the current step. A predictive system may have long-term models, but error is only back-propagated in a way which encourages each individual prediction to be more accurate for the time-step it was made, not in a way which encourages outputs to strategically make future inputs easier to predict.
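To make the “downstream neighbors stop using it” point concrete, here is a minimal toy sketch (my own construction, with made-up numbers, not anything from the above): a single downstream linear unit trained by per-step gradient descent on squared prediction error, fed one informative input and one input from a “frozen” part that always outputs the same value.

```python
# Toy sketch (assumed setup, not from the discussion): a downstream linear unit
# trained by per-step gradient descent to predict a target from two inputs --
# one informative feature and one frozen part that always outputs the same value.
import numpy as np

rng = np.random.default_rng(0)
n_steps, lr = 5000, 0.01

w = np.zeros(2)          # [weight on informative input, weight on frozen part]
frozen_output = 1.0      # the "frozen prediction" never changes

for _ in range(n_steps):
    x = rng.normal()                  # informative input
    y = 3.0 * x + 0.1 * rng.normal()  # zero-mean target the downstream unit predicts
    inputs = np.array([x, frozen_output])
    err = w @ inputs - y
    w -= lr * err * inputs            # gradient step on squared prediction error

print(w)  # roughly [3.0, ~0.0]: the frozen part's output ends up unused
```

The update only ever nudges the weights toward the current target; the frozen input ends up carrying roughly zero weight, and nothing in the rule pushes the system to act on the world so that the frozen output becomes accurate.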
So, the way I see it, in order for a system to strategically act so as to minimize future prediction error of a frozen sub-part, you’d need a part of the system to act as a reinforcement learner whose reward signal was the prediction error of the other part. This is not how parts of a predictive hierarchy tend to behave. Parts of a predictive hierarchy learn to reduce their own predictive error—and even there, they learn to produce outputs which are more similar to their observations, not to manipulate things so as to better match predictions.
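As a hedged illustration of what that extra component would look like structurally (again a toy of my own, not a claim about brains): a small REINFORCE-style policy whose reward signal is literally the negative prediction error of a frozen part. Wired this way, the system does learn to act so that the frozen prediction comes true, but only because a reinforcement-learning loop was added on top of the predictive machinery.

```python
# Toy sketch (assumed setup): a two-action policy trained by REINFORCE, with
# reward = -(prediction error of a frozen part). The frozen part always
# "predicts" that the state equals 1.0; the actions directly set the state.
import numpy as np

rng = np.random.default_rng(0)
frozen_prediction = 1.0
logits = np.zeros(2)     # action preferences: action 0 -> state 0.0, action 1 -> state 1.0
lr = 0.1

for _ in range(2000):
    probs = np.exp(logits) / np.exp(logits).sum()
    action = rng.choice(2, p=probs)
    state = float(action)                        # the action directly sets the state
    reward = -(frozen_prediction - state) ** 2   # reward = negative prediction error of the frozen part
    grad = -probs                                # d log pi(action) / d logits ...
    grad[action] += 1.0                          # ... = one_hot(action) - probs
    logits += lr * reward * grad                 # REINFORCE update

print(np.exp(logits) / np.exp(logits).sum())     # the policy comes to strongly prefer action 1,
                                                 # i.e. acting so the frozen prediction is satisfied
```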
Abram—I’ve gone back and forth a few times, but currently think that “gradient descent is myopic” arguments don’t carry through 100% when the predictions invoke memorized temporal sequences (and hierarchies or abstractions thereof) that extend arbitrarily far into the future. For example, if I think someone is about to start singing “Happy birthday”, I’m directly making a prediction about the very next moment, but I’m implicitly making a prediction about the next 30 seconds, and thus the prediction error feedback signal is not just retrospective but also partly prospective.
I agree that we should NOT expect “outputs to strategically make future inputs easier to predict”, but I think there might be a non-myopic tendency for outputs that strategically make the future conform to the a priori prediction. See here, including the comments, for my discussion, trying to get my head around this.
Anyway, if that’s right, that would seem to be the exact type of non-myopia needed for a hierarchical Bayesian prediction machine to also be able to act as a hierarchical control system. (And sorry again if I’m just being confused.)
I appreciate your thoughts! My own thinking on this is rapidly shifting and I regret that I’m not producing more posts about it right now. I will try to comment further on your linked post. Feel encouraged to PM me if you write/wrote more in this and think I might have missed it; I’m pretty interested in this right now.
This isn’t necessarily Gordon’s view/answer within his model, but my answers are:
yes, evolution inserts these ‘false predictions’ (Friston calls them fixed priors, which I think is a somewhat unfortunate choice of terminology)
if you put on lens #3 of Dennett’s stances (looking at systems as agents), these ‘priors’ are probably best described as ‘agents’ extracting some features from the p.p. world-modelling apparatus and inserting errors accordingly; you correctly point out that in some architectures such parts would just get ignored, but in my view what happens in humans is more like a board of Bayesian subagents voting
note: it’s relatively easy to turn a p.p. engine into something resembling reinforcement learning by warping it to seek ‘high reward’ states, where by states you should not imagine ‘states of the world’ but ‘states of the body’; evolution designed the chemical control circuitry of hormones earlier, so in some sense the predictive processing machinery is built on top of some older control systems and is seeking goal states defined by them
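A minimal sketch of that warping, in the spirit of active inference (my own toy, not Jan’s actual model; the set point, dynamics, and learning rate are made up): a frozen ‘prior’ about a body state, with the resulting prediction error reduced by gradient steps on the action rather than on the belief.

```python
# Toy sketch (assumed setup): a fixed 'prior' says blood sugar should sit at a
# set point. The system cannot change that prior; instead it does gradient
# steps on its *action* (eating rate), which changes the body state the prior
# is about, driving the prediction error to zero by acting.
set_point = 5.0      # frozen 'prediction' about the body state
sugar = 2.0          # actual body state
action = 0.0         # eating rate, the only thing the system may adjust
lr = 0.1

for _ in range(200):
    sugar = 0.9 * sugar + action     # toy body dynamics: decay plus intake
    err = set_point - sugar          # prediction error of the fixed prior
    # gradient of err**2 with respect to action is -2*err (sugar depends on
    # action with coefficient 1 this step), so descend that gradient:
    action += lr * 2 * err

print(sugar)  # settles near the set point (~5.0): error minimized by acting
```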
(pure guess) consciousness and language and this style of processing are another layer, where the p.p. machinery is ‘predicting’ something like a stream of conscious thoughts, which somehow has its own consistency rules and can implement verbal reasoning.
Overall I’m not sure to what extent you expect clean designs from evolution. I would expect a messy design: predictive processing implementing hierarchical world-modelling/action generation, a mess of subagents + emotions + hacked connections to older regulatory systems to make the p.p. engine seek evolution’s goals, and another interesting thing going on with language and memes.
How does credit assignment work to determine these subagents’ voting power (if at all)? I’m negative about viewing it as ‘prediction with warped parts (“fixed priors”)’, but setting that aside, one way or the other there’s the concrete question of what’s actually going on at the learning-algorithm level. How do you set something up which is not incredibly myopic? (For example, if subagents are assigned credit based on who’s active when actual reward is received, that’s going to be incredibly myopic—subagents who have long-term plans for achieving better reward through delayed gratification can be undercut by greedily shortsighted agents, because the credit assignment doesn’t reward you for things that happen later; much like political terms of office making long-term policy difficult.)
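A toy simulation of that worry (my own construction; the subagent names, payoffs, and control rule are all made up for illustration): a greedy subagent G grabs a small immediate reward, a patient subagent P makes investments that pay off two steps later, credit goes to whichever subagent happens to be in control when a reward actually arrives, and control is then allocated in proportion to accumulated credit.

```python
# Toy sketch (assumed setup): myopic credit assignment between two subagents.
import numpy as np

rng = np.random.default_rng(0)
credit = {"G": 1.0, "P": 1.0}     # 'voting power' of each subagent
pending = []                      # [steps_until_payout, amount] from past investments

for t in range(500):
    weights = np.array([credit["G"], credit["P"]])
    active = rng.choice(["G", "P"], p=weights / weights.sum())  # control proportional to credit

    reward_now = 0.0
    if active == "G":
        reward_now += 1.0              # grab: small immediate reward
    else:
        pending.append([2, 10.0])      # invest: large reward, delayed two steps

    for item in pending:               # deliver matured delayed rewards
        item[0] -= 1
    reward_now += sum(amount for steps, amount in pending if steps <= 0)
    pending = [p for p in pending if p[0] > 0]

    credit[active] += reward_now       # myopic rule: credit whoever is active *now*

print(credit)  # G tends to end up with most of the credit, even though each of
               # P's investments is worth ten of G's grabs
```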
As for clean designs from evolution: I wasn’t talking about parsimony because I expect the brain to be simple, but rather because a hypothesis which has a lot of extra complexity is less likely to be right. I expect human values to be complex, but I still think the desire for parsimony which sometimes motivates PP is good in itself—a parsimonious theory which matched observations well would be convincing in a way a complicated one would not be, even though I expect things to be complicated, because the complicated theory has many chances to be wrong.
Based on your parenthetical about subagents being credited based on who is active when reward is received, it seems to me you have in mind a different model than me (sorry if my description was confusing). In my view, the world-modelling, “preference aggregation” and action generation are all done by the “predictive processing engine”. The “subagenty” parts basically extract evolutionarily relevant features of this (like: hunger level) and insert error signals not only about the current state but also about future plans. (Like: if the p.p. engine were planning a trajectory which is harmful to the subagent, the subagent would insert an error signal.)
Overall, your first part seems to assume something more like reinforcement learning, where parts are assigned credit for good planning. I would expect the opposite: one planning process which is “rewarded” by a committee.
With parsimony… predictive processing, in my opinion, explains a lot for a relatively simple and elegant model. On the theory side, for example:
how you can make a Bayesian approximator using local computations (a minimal sketch appears after these lists)
how hierarchical models can grow in an evolutionarily plausible way
why predictions, why actions
On how things feel for humans from the inside, for example:
some phenomena about attention
what is that feeling when you are e.g. missing the right word, or something seems out of place
what’s up with psychedelics
& more
On the neuroscience side:
my non-expert impression is that the evidence is growing that at least the cortex follows the pattern where neurons at higher processing stages generate predictions that bias processing at lower levels
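To unpack the first theory point above, here is a minimal sketch of a Bayesian approximator built from local computations (a standard textbook-style predictive coding update; the particular variances and values are my own): a single hidden cause with a Gaussian prior and a Gaussian observation, updated using only locally available prediction errors, which converges to the exact posterior mean.

```python
# Minimal sketch (assumed numbers): one hidden variable phi with a Gaussian
# prior and a Gaussian observation u. Each update uses only locally available
# prediction errors, yet phi converges to the exact Bayesian posterior mean.
mu_prior, var_prior = 3.0, 1.0   # prior belief about the hidden cause
u, var_obs = 2.0, 1.0            # observed input and its noise variance

phi = mu_prior                   # current estimate of the hidden cause
for _ in range(200):
    eps_obs = (u - phi) / var_obs             # bottom-up prediction error
    eps_prior = (phi - mu_prior) / var_prior  # top-down prediction error
    phi += 0.05 * (eps_obs - eps_prior)       # local gradient step on prediction error

posterior_mean = (u / var_obs + mu_prior / var_prior) / (1 / var_obs + 1 / var_prior)
print(phi, posterior_mean)       # both ~2.5
```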
I don’t think predictive processing should try to explain everything about humans. In one direction, animals are running on predictive processing as well, but are missing some crucial ingredient. In the opposite direction, simpler organisms had older control systems (e.g. hormones); we have them as well, and p.p. must in some sense be stacked on top of that.
For what it’s worth, I actually do expect that something like predictive processing is also going on in other systems built out of stuff that is not neurons, such as control systems that use steroids (which include hormones in animals) or RNA or other things for signaling, and yet other things for determining set points and error distances. As I have mentioned, I think of living things as being in the same category as steam engine governors and thermostats, all united by the operation of control systems that locally decrease entropy and produce information. Obviously there are distinctions that are interesting and important in various ways, but there are also important ways in which these distinctions are distractions from the common mechanism powering everything we care about.
We can’t literally call this predictive coding, since that theory is about neurons and brains, so a better name with appropriate historical precedent might be something like a “cybernetic” theory of life. Unfortunately, cybernetics has been cheapened over the years in ways that make that name ring of hokum, so maybe there is some other way to name this idea that avoids that issue.