It’s not necessarily Gordon’s view/answer within his model, but my answers are:
yes, evolution inserts these ‘false predictions’ (Friston calls them fixed priors, which I think is a somewhat unfortunate choice of terminology)
if you put on Dennett’s stance lens #3 (looking at systems as agents), these ‘priors’ would likely be described as ‘agents’ extracting some features from the p.p. world-modelling apparatus and inserting errors accordingly; you correctly point out that in some architectures such parts would just get ignored, but in my view what happens in humans is more like a board of Bayesian subagents voting
note: it’s relatively easy to turn a p.p. engine into something resembling reinforcement learning by warping it to seek ‘high reward’ states, where by ‘states’ you should imagine not ‘states of the world’ but ‘states of the body’; evolution designed the chemical control circuitry of hormones earlier - in some sense the predictive processing machinery is built on top of these older control systems and seeks goal states defined by them (a toy sketch of this is at the end of this comment)
(pure guess) consciousness, language and this style of processing are another layer, where the p.p. machinery is ‘predicting’ something like a stream of conscious thoughts, which somehow has its own consistency rules and can implement verbal reasoning.
Overall I’m not sure to what extent you expect clean designs from evolution. I would expect a messy design: predictive processing for hierarchical world-modelling/action generation, a mess of subagents + emotions + hacked connections to older regulatory systems to make the p.p. engine seek evolution’s goals, and another interesting thing going on with language and memes.
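To make the ‘warping’ note above concrete, here is a toy sketch of what I mean: a p.p. loop that never receives a reward signal, but carries a fixed prior saying “my body is at its set points” and picks actions that reduce the resulting precision-weighted error. The set points, precisions and body dynamics are all invented for illustration; this is a cartoon of the idea, not a model of any real circuitry.

```python
# Toy sketch (my own cartoon, not anyone's published model): a predictive processing
# loop "warped" into reward-seeking by a fixed prior over body states. The engine
# never sees a reward signal; it just keeps predicting that the body sits at its
# hormonally-defined set points and acts so as to reduce the resulting error.
import numpy as np

SET_POINTS = np.array([0.7, 0.5])       # e.g. target blood sugar, temperature (made-up units)
PRIOR_PRECISION = np.array([4.0, 2.0])  # how strongly each 'fixed prior' is weighted

def prediction_error(body_state):
    # error between the fixed prior ("I'm at my set points") and the body state
    return PRIOR_PRECISION * (SET_POINTS - body_state)

def predict_body(body_state, action):
    # the engine's (deterministic) forward model of what an action does to the body
    return body_state + 0.1 * action - 0.02 * body_state

def choose_action(body_state, candidate_actions):
    # active-inference flavour: pick the action whose predicted outcome
    # minimizes squared precision-weighted error against the fixed prior
    return min(candidate_actions,
               key=lambda a: np.sum(prediction_error(predict_body(body_state, a)) ** 2))

body = np.array([0.2, 0.9])
actions = [np.array(a, dtype=float) for a in [(1, 0), (-1, 0), (0, 1), (0, -1), (0, 0)]]
for _ in range(100):
    a = choose_action(body, actions)
    body = predict_body(body, a) + 0.01 * np.random.randn(2)  # real body = model + noise
print(body, SET_POINTS)  # the body ends up hovering near the set points
```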
you correctly point out that in some architectures such parts would just get ignored, but in my view what happens in humans is more like a board of Bayesian subagents voting
How does credit assignment work to determine these subagents’ voting power (if at all)? I’m negative about viewing it as ‘prediction with warped parts (“fixed priors”)’, but setting that aside, one way or the other there’s the concrete question of what’s actually going on at the learning-algorithm level. How do you set something up which is not incredibly myopic? (For example, if subagents are assigned credit based on who’s active when actual reward is received, that’s going to be incredibly myopic—subagents who have long-term plans for achieving better reward through delayed gratification can be undercut by greedily shortsighted agents, because the credit assignment doesn’t reward you for things that happen later; much like political terms of office making long-term policy difficult.) A toy simulation of this failure mode follows below.
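To spell out the parenthetical example, here is a toy simulation of that myopic credit-assignment scheme; the subagent names, payoffs and update rule are made up for illustration, not taken from any particular theory.

```python
# Toy illustration (my construction, not a claim about the brain) of why
# "credit goes to whoever is active when reward lands" is myopic. A patient
# subagent sets up a delayed payoff, but by the time it arrives the greedy
# subagent is often the one in control, so the greedy one collects the credit.
import random

power = {"greedy": 1.0, "patient": 1.0}  # voting power, updated by credit assignment
pending = []                              # (steps_left, payoff) from patient plans

for step in range(200):
    # whoever has more power is more likely to control this step
    total = power["greedy"] + power["patient"]
    actor = "greedy" if random.random() < power["greedy"] / total else "patient"

    reward = 0.0
    if actor == "greedy":
        reward += 1.0                     # small immediate payoff
    else:
        pending.append((5, 3.0))          # bigger payoff, but 5 steps later

    # delayed payoffs mature regardless of who is in control when they land
    matured = [p for (t, p) in pending if t == 0]
    pending = [(t - 1, p) for (t, p) in pending if t > 0]
    reward += sum(matured)

    # myopic credit assignment: whoever acted *this* step gets the credit
    power[actor] += 0.01 * reward

print(power)  # greedy's power typically grows much faster than patient's
```

Running this, the greedy subagent’s voting power grows faster even though the patient subagent’s plans generate more total reward, which is exactly the failure mode I mean.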
Overall I’m not sure to what extent you expect clean designs from evolution.
I wasn’t talking about parsimony because I expect the brain to be simple, but rather because a hypothesis which has a lot of extra complexity is less likely to be right. I expect human values to be complex, but I still think the kind of desire for parsimony that sometimes motivates PP is good in itself—a parsimonious theory which matched observations well would be convincing in a way a complicated one would not be, even though I expect things to be complicated, because the complicated theory has many chances to be wrong.
(For example, if subagents are assigned credit based on who’s active when actual reward is received, that’s going to be incredibly myopic—subagents who have long-term plans for achieving better reward through delayed gratification can be undercut by greedily shortsighted agents, because the credit assignment doesn’t reward you for things that happen later; much like political terms of office making long-term policy difficult.)
Based on this, it seems to me you have in mind a different model than I do (sorry if my description was confusing). In my view, the world-modelling, “preference aggregation” and action generation are all done by the “predictive processing engine”. The “subagenty” parts basically extract evolutionarily relevant features from this (like: hunger level) and insert error signals, not only about the current state but also about future plans. (Like: if the p.p. engine were planning a trajectory harmful to a subagent’s interests, that subagent would insert an error signal.)
Overall, your first part seems to assume something more like reinforcement learning, where parts are assigned credit for good planning. I would expect the opposite: one planning process which is “rewarded” by a committee (a minimal sketch of this picture is below).
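Here is a minimal sketch of that picture, with made-up subagents and numbers: a single planner proposes rollouts (the actual world-modelling would live in the p.p. engine), the subagents never receive credit for planning, they just read features off each proposed trajectory and inject error signals, and the planner picks whatever the committee objects to least.

```python
# Minimal sketch of the model I have in mind (illustrative names and numbers only):
# one planning process proposes trajectories; subagents don't get credit for
# planning, they just evaluate each proposal and inject error signals.

def hunger_subagent(trajectory):
    # injects error proportional to how hungry the body stays along the plan
    return sum(state["hunger"] for state in trajectory)

def safety_subagent(trajectory):
    # injects a large error for any predicted dangerous state
    return sum(10.0 for state in trajectory if state["danger"])

COMMITTEE = [hunger_subagent, safety_subagent]

def committee_error(trajectory):
    return sum(subagent(trajectory) for subagent in COMMITTEE)

def plan(candidate_trajectories):
    # the p.p. engine does the world-modelling/rollouts elsewhere;
    # here it just selects the rollout the committee objects to least
    return min(candidate_trajectories, key=committee_error)

# two made-up rollouts the engine might have generated
go_to_kitchen = [{"hunger": 0.8, "danger": False}, {"hunger": 0.1, "danger": False}]
climb_out_window = [{"hunger": 0.8, "danger": True}, {"hunger": 0.1, "danger": False}]
print(plan([go_to_kitchen, climb_out_window]))  # -> the kitchen trajectory
```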
parsimonious theory which matched observations well
With parsimony… predictive processing in my opinion explains a lot with a relatively simple and elegant model. On the theory side it explains, for example:
how you can build a Bayesian approximator using only local computations (a minimal numerical sketch of this is at the end of this comment)
how hierarchical models can grow in an evolutionarily plausible way
why predictions, why actions
On how things feel for humans from the inside, for example:
some phenomena about attention
what is that feeling when you are e.g. missing the right word, or something seems out of place
what’s up with psychedelics
& more
On the neuroscience side
my non-expert impression is that there is growing evidence that at least the cortex follows the pattern where neurons at higher processing stages generate predictions that bias processing at lower levels
I don’t think predictive processing should try to explain everything about humans. In one direction, animals are running on predictive processing as well, but are missing some crucial ingredient. In the opposite direction, simpler organisms had older control systems (e.g. hormones); we have them as well, and p.p. must in some sense be stacked on top of them.
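To gesture at what I mean by the first theory point (a Bayesian approximator built from local computations), here is a minimal numerical sketch in the style of the standard linear-Gaussian predictive coding example; the particular numbers and variable names are mine, not anything from this thread. A single node only ever sees two local error signals, one from the observation and one from its prior, yet its estimate settles on the exact Bayesian posterior mean.

```python
# Minimal numerical sketch of Bayesian inference via purely local, error-driven
# updates (standard linear-Gaussian predictive coding toy example; numbers made up).
# Model: latent v has prior N(mu_prior, 1/pi_prior); observation u = v + noise
# with precision pi_obs. The node only ever sees the two local error signals.

mu_prior, pi_prior = 3.0, 1.0   # prior mean and precision for the latent cause
pi_obs = 4.0                    # precision of the sensory observation
u = 2.0                         # the observation actually received

phi = mu_prior                  # current estimate of the latent cause
for _ in range(200):
    eps_obs = pi_obs * (u - phi)             # bottom-up error: observation vs prediction
    eps_prior = pi_prior * (phi - mu_prior)  # top-down error: estimate vs prior
    phi += 0.05 * (eps_obs - eps_prior)      # local gradient step on both errors

posterior_mean = (pi_obs * u + pi_prior * mu_prior) / (pi_obs + pi_prior)
print(phi, posterior_mean)  # the two values agree (here: 2.2)
```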
For what it’s worth, I actually do expect that something like predictive processing is also going on in other systems built out of stuff that is not neurons, such as control systems that use steroids (which include hormones in animals) or RNA or other things for signaling, and yet other things for determining set points and error distances. As I have mentioned, I think of living things as being in the same category as steam engine governors and thermostats, all united by the operation of control systems that locally decrease entropy and produce information. Obviously there are distinctions that are interesting and important in various ways, but there are also important ways in which these distinctions are distractions from the common mechanism powering everything we care about.
We can’t literally call this predictive coding, since that theory is about neurons and brains, so a better name with appropriate historical precedent might be something like a “cybernetic” theory of life, although unfortunately cybernetics has been cheapened over the years in ways that make it ring of hokum, so maybe there is some other way to name this idea that avoids that issue.
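To be concrete about “set points and error distances”, here is a minimal toy loop (my own illustration, with invented numbers): the same few lines read as a thermostat, a governor, or a hormonal feedback circuit, depending only on how you label the measured quantity and the set point.

```python
# Minimal sketch (my own toy illustration) of the shared pattern: a set point,
# a measured quantity, an error distance, and a correction signal fed back in.

def control_step(measured, set_point, gain=0.3):
    """One tick of a generic negative-feedback controller."""
    error = set_point - measured      # the 'error distance'
    correction = gain * error         # the signal sent back into the system
    return correction

# thermostat reading: room temperature drifts, the loop keeps pulling it back
temperature, target = 15.0, 21.0
for _ in range(30):
    temperature += control_step(temperature, target) - 0.2  # -0.2: constant heat loss
print(round(temperature, 1))  # settles near the set point (slightly below, due to the loss term)
```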