I agree with the broad outline of your points, but I find many of the details incongruous or poorly stated. Some of this is just a general dislike of predictive processing, but assuming a predictive processing model, I don’t see why your further comments follow.
I don’t claim to understand predictive processing fully, but I read the SSC post you linked, and looked at some other sources. It doesn’t seem to me like predictive processing struggles to model goal-oriented behavior. A PP agent doesn’t try to hide in the dark all the time to make the world as easy to predict as possible, and it also doesn’t only do what it has learned to expect itself to do regardless of what leads to pleasure. My understanding is that this depends on details of the notion of free energy.
So, although I agree that there are serious problems with taking an agent and inferring its values, it isn’t clear to me that PP points to new problems of this kind. Jeffrey-Bolker rotation already illustrates that there’s a large problem within a very standard expected utility framework.
The point about viewing humans as multi-agent systems, which don’t behave like single-agent systems in general, also doesn’t seem best made within a PP framework. Friston’s claim (as I understand it) is that clumps of matter will under very general conditions eventually evolve to minimize free energy, behaving as agents. If clumps of dead matter can do it, I guess he would say that multi-agent systems can do it. Aside from that, PP clearly makes the claim that systems running on a currency of prediction error (as you put it) act like agents.
Again, this point seems fine to make outside of PP, it just seems like a non-sequitur in a PP context.
I also found the options given in the “what are we aligning with” section confusing. I was expecting to see a familiar litany of options (like aligning with system 1 vs system 2, revealed preferences vs explicitly stated preferences, etc). But I don’t know what “aligning with the output of the generative models” means—it seems to suggest aligning with a probability distribution rather than with preferences. Maybe you mean imitation learning, like what inverse reinforcement learning does? This is supported by the way you immediately contrast with CIRL in #2. But, then, #3, “aligning with the whole system”, sounds like imitation learning again—training a big black box NN to imitate humans. It’s also confusing that you mention options #1 and #2 collapsing into one—if I’m right that you’re pointing at IRL vs CIRL, it doesn’t seem like this is what happens. IRL learns to drink coffee if the human drinks coffee, whereas CIRL learns to help the human make coffee.
FWIW, I think if we can see the mind as a collection of many agents (each with their own utility function), that’s a win. Aligning with a collection of agents is not too hard, so long as you can figure out a reasonable way to settle on fair divisions of utility between them.
Thanks for the feedback! Sorry, I’m really bad at describing models in text—if it seems self-contradictory or confused, it’s probably either me being bad at explanations or inferential distance (you probably need to understand predictive processing better than what you get from reading the SSC article).
Another try… start by imagining the hierarchical generative layers (as in PP). They just model the world. Then add active inference. Then add the special sort of “priors” like “not being hungry” or “seek reproduction”. (You need to have those in active inference for the whole thing to describe humans, IMO.) Then imagine that these “special priors” start to interact with each other, leading to a game-theoretic style mess. Now you have the sub-agents. Then imagine some layers up in the hierarchy doing stuff like “personality/narrative generation”.
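If it helps, here is a toy numerical sketch of the first few steps (the hierarchy collapsed to a single variable, all names and numbers made up): a high-precision “special prior” pins the prediction to a set point, and active inference resolves the resulting prediction error by changing the world rather than the belief.

```python
import numpy as np

# Toy, purely illustrative sketch (all names and numbers made up):
# one sensed variable ("blood sugar"), one prediction of it, and a
# high-precision "special prior" pinning the prediction near a set point.
# Prediction error can be reduced in two ways:
#   - perception: update the belief toward the observation
#   - action:     change the world so the observation moves toward the belief
# The high-precision prior makes the second route dominate, which is the
# goal-like behaviour the special priors are supposed to supply.

set_point = 1.0          # the "not being hungry" prior
prior_precision = 10.0   # how strongly that prior resists being explained away
obs_precision = 1.0

world = 0.2              # actual blood sugar
belief = set_point       # prediction, initialised at the prior

for step in range(100):
    obs = world + np.random.normal(0, 0.01)
    error = obs - belief
    # perception: precision-weighted pull toward both the observation and the prior
    belief += 0.1 * (obs_precision * error + prior_precision * (set_point - belief))
    # action: act on the world ("eat") to reduce the same prediction error
    world += 0.1 * (belief - obs)

print(round(world, 2))   # ~1.0: the world has been dragged to match the stubborn prediction
```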
Unless you have this picture right, the rest does not make sense. From your comments I don’t think you have the picture right. I’ll try to reply … but I’m worried it may add to confusion.
To some extent, PP struggles to describe motivations. Predictive processing in a narrow sense is about perception and is not agenty at all—it just optimizes a set of hierarchical models to minimize error. If you add active inference, the system becomes agenty, but you really do have a problem with motivations. From some popular accounts, or from some remarks by Friston, it may seem otherwise, but “depends on details of the notion of free energy” is, in my interpretation, a statement roughly similar to the claim that physics can be stated in terms of variational principles, with the rest “depending on the notion of action”.
Jeffrey-Bolker rotation is something different that leads to a somewhat similar problem (J-B rotation is much more limited in what can be transformed into what, and it preserves the decision structure).
My feeling is you don’t understand Friston; also I don’t want to defend pieces of Friston as I’m not sure I understand Friston.
The options given in the “what are we aligning with” section are AFAIK not something that has been described in this way before, so trying to map them directly onto the “familiar litany of options” is likely not the way to understand them. Overall, my feeling is that here you don’t have the proposed model right and the result is mostly confusion.
I see two ways things could be. (They could also be somewhere in between, or something else entirely...)
It could be that extending PP to model actions provides a hypothesis which sticks its neck out with some bold predictions, claiming that specific biases will be observed, and these either nicely fit observations which were previously puzzling, or have since been tested and confirmed. In that case, it would make a great deal of sense to use PP’s difficulty modeling goal-oriented behavior as a model of human less-than-goal-oriented behavior.
It could be that PP can be extended to actions in many different ways, and it is currently unclear which way might be good. In this case, it seems like PP’s difficulty modeling goal-oriented behavior is more of a point against PP, rather than a useful model of the complexity of human values.
The way you use “PP struggles to model goal-oriented behavior” in the discussion in the post, it seems like it would need to be the first way: you think PP is a good fit for human behavior, and also that it isn’t clear how to model goals in PP.
The way you talk about what you meant in your follow-up comment, it sounds like you mean the world is the second way. This also fits with my experience. I have seen several different proposals for extending PP to actions (that is, several ways of doing active inference). Several of these have big problems which do not seem to reflect human irrationality in any particular way. At least one of these (and I suspect more than one, based on the way Friston talks about the free energy principle being a tautology) can reproduce maximum-expected-utility planning perfectly; so, there is no advantage or disadvantage for the purpose of predicting human actions. The choice between PP and expected utility formalisms is more a question of theoretical taste.
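To spell out the version of this I have in mind (a sketch of one common presentation of expected free energy, not a claim about any particular paper of Friston’s):

```latex
% Sketch only. G(\pi) is the expected free energy of a policy \pi;
% \tilde p(o) is the "preference prior" over observations.
\begin{align*}
G(\pi) &= \mathbb{E}_{q(o,s\mid\pi)}\big[\log q(s\mid\pi) - \log \tilde p(o,s)\big]\\
       &\approx \underbrace{-\,\mathbb{E}_{q(o\mid\pi)}\big[\log \tilde p(o)\big]}_{\text{pragmatic value}}
        \;-\; \underbrace{\mathbb{E}_{q(o\mid\pi)}\,D_{\mathrm{KL}}\!\big[q(s\mid o,\pi)\,\Vert\,q(s\mid\pi)\big]}_{\text{epistemic value}}
\end{align*}
% If the epistemic term is dropped (or is negligible) and \tilde p(o) \propto e^{U(o)},
% then \arg\min_\pi G(\pi) = \arg\max_\pi \mathbb{E}_{q(o\mid\pi)}[U(o)]:
% plain maximum-expected-utility planning.
```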
I think you land somewhere in the middle; you (strongly?) suspect there’s a version of PP which could stick its neck out and tightly model human irrationality, but you aren’t trying to make strong claims about what it is.
My object-level problem with this is, I don’t know why you would suspect this to be true. I haven’t seen people offer what strikes me as support for active inference, and I’ve asked people, and looked around. But, plenty of smart people do seem to suspect this.
My meta-level problem with this is, it doesn’t seem like a very good premise from which to argue the rest of your points in the post. Something vaguely PP-shaped may or may not be harder to extract values from than an expected-utility-based agent. (For example, the models of bounded rationality which were discussed at the human-aligned AI summer school had a similar flavor, but actually seem easier to extract values from, since the probability of an action was made to be a monotonic and continuous function of the action’s utility.)
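To illustrate what I mean by “easier to extract values from”, here is a toy sketch with a softmax (Boltzmann-rational) choice rule standing in for those bounded-rationality models; the numbers and names are made up.

```python
import numpy as np

# Toy sketch: a Boltzmann-rational agent picks actions with probability
# proportional to exp(beta * U(a)).  Log-probabilities are then an affine
# function of utility, so observed choice frequencies pin U down up to an
# additive constant (and the scale beta) -- value extraction is easy.

rng = np.random.default_rng(0)
true_U = np.array([0.0, 1.0, 3.0])   # hidden utilities of three actions
beta = 1.5                           # inverse temperature / degree of rationality

probs = np.exp(beta * true_U)
probs /= probs.sum()

choices = rng.choice(len(true_U), size=100_000, p=probs)
freqs = np.bincount(choices, minlength=len(true_U)) / len(choices)

# invert the softmax: U(a) = log p(a) / beta + const
recovered_U = np.log(freqs) / beta
recovered_U -= recovered_U[0]        # fix the additive constant

print(np.round(recovered_U, 2))      # ~[0., 1., 3.]
```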
Again, I don’t disagree with the overall conclusions of your post, just the way you argued them.
The thing I’m trying to argue is complex and yes, it is something in the middle between the two options.
1. Predictive processing (in the “perception” direction) makes some brave predictions, which can be tested and match data/experience. My credence in predictive processing in a narrow sense: 0.95
2. Because of the theoretical beauty, I think we should take active inference seriously as an architectural principle. Vague introspective evidence for active inference comes from the ability to do inner simulations. Possibly the boldest claim I can make from the principle alone is that people will have a bias toward taking actions which “prove their models right”, even at the cost of those actions being actually harmful for them in some important sense. How it may match everyday experience: for example, here. My credence in active inference as a basic design mechanism: 0.6
3. So far, the description was broadly Bayesian/optimal/“unbounded”. An unbounded predictive processing / active inference agent is a fearsome monster, in a similar way to a fully rational VNM agent. The other key ingredient is bounded rationality. Most biases are a consequence of computational/signal-processing boundedness, both in PP/AI models and non-PP/AI models. My credence in boundedness being a key ingredient: 0.99
4. What is missing from the picture so far is some sort of “goals” or “motivation” (or, viewed differently, a way for evolution to insert some signal into the brain). How Karl Friston deals with this, e.g.:
We start with the premise that adaptive agents or phenotypes must occupy a limited repertoire of physical states. For a phenotype to exist, it must possess defining characteristics or traits; both in terms of its morphology and exchange with the environment. These traits essentially limit the agent to a bounded region in the space of all states it could be in. Once outside these bounds, it ceases to possess that trait (cf., a fish out of water).
is something which I find unsatisfactory. My credence in this being a complete explanation: 0.1
5. My hypothesis is roughly this:
evolution inserts some “goal-directed” sub-parts into the PP/AI machinery
these sub-parts do not somehow “directly interface with the world”, but are “buried” within the hierarchy of the generative layers; so they do not care about people or objects or whatever, but about some abstract variables
they are quite “agenty”, optimizing some utility function
from the point of view of such a sub-agent, other sub-agents inside the same mind are possibly competitors; at least some sub-agents likely have access to enough computing power not only to “care about what they are intended to care about”, but to do basic modelling of other sub-agents; an internal game-theoretic mess ensues (a toy sketch of this follows below)
6. This hypothesis bridges the framework of PP/AI and the world of theories viewing the mind as a multi-agent system. Multi-agent theories of mind have some introspective support in various styles of psychotherapy, IFS, meditative experience, and some rationality techniques. They also seem to explain behavior where humans appear to “defect against themselves”. Credence: 0.8
(I guess a predictive processing purist would probably describe 5. & 6. as just a case of competing predictive models, not adding anything conceptually new.)
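To make the kind of interaction in 5. a bit more concrete, here is a toy sketch (purely hypothetical, not a model of any real circuit): two sub-agents hold conflicting “special priors” over the same abstract variable, the realized behavior is a precision-weighted compromise, and a sub-agent that can model the other has an incentive to inflate its precision.

```python
# Toy, hypothetical illustration of the "internal game-theoretic mess":
# two sub-agents with conflicting priors over one abstract variable x.
# The realised x is a precision-weighted compromise of their targets.

def compromise(targets, precisions):
    """Precision-weighted average of the sub-agents' preferred values."""
    return sum(t * p for t, p in zip(targets, precisions)) / sum(precisions)

targets = [0.0, 1.0]                      # e.g. "rest" vs "seek food" on some abstract axis

print(compromise(targets, [1.0, 1.0]))    # 0.5   -- honest precisions, fair compromise
print(compromise(targets, [1.0, 5.0]))    # ~0.83 -- sub-agent 2 "shouts" and wins
print(compromise(targets, [5.0, 5.0]))    # 0.5   -- both inflate: nobody gains, conflict escalates
```

The point is only the shape of the incentive structure, not this particular formula.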
Now I would actually want to draw a graph of how strongly 1.–6. motivate different possible problems with alignment, and how these problems motivate various research questions. For example, the question about understanding hierarchical modelling is interesting even if there is no multi-agency, scaling of sub-agents can be motivated even without active inference, etc.
Vague introspective evidence for active inference comes from an ability to do inner simulations.
I would take this as introspective evidence in favor of something model-based, but it could look more like model-based RL rather than active inference. (I am not specifically advocating for model-based RL as the right model of human thinking.)
Possibly boldest claim I can make from the principle alone is that people will have a bias to take actions which will “prove their models are right” even at the cost of the actions being actually harmful for them in some important sense.
I believe this claim based on social dynamics—among social creatures, it seems evolutionarily useful to try to prove your models right. An adaptation for doing this may influence your behavior even when you have no reason to believe anyone is looking or knows about the model you are confirming.
So, an experiment which would differentiate between socio-evolutionary causes and active inference would be to look for the effect in non-social animals. An experiment which comes to mind: you somehow create a situation where an animal is trying to achieve some goal, but you give false feedback so that the animal momentarily thinks it is less successful than it is. Then, you suddenly replace the false feedback with real feedback. Does the animal try to correct back toward the previously believed (false) situation, in order to minimize prediction error, rather than continuing to optimize in a way consistent with the task reward?
There are a lot of confounders. For example, one version of the experiment would involve trying to put your paw as high in the air as possible, and (somehow) initially getting false feedback about how well you are doing. When you suddenly start getting good feedback, do you re-position the paw to restore the previous level of feedback (minimizing predictive error) before trying to get it higher again? A problem with the experiment is that you might re-position your paw just because the real feedback changes the cost-benefit ratio, so a rational agent would try less hard at the task if it found out it was doing better than it thought.
A second example: pushing an object to a target location on the floor. If (somehow) you initially get bad feedback about where you are on the floor, and suddenly the feedback gets corrected, do you go to the location you thought you were at before continuing to make progress toward the goal? A confounder here is that you may have learned a procedure for getting the object to the desired location, and you are more confident in the results of following the procedure than you are otherwise. So, you prefer to push the object to the target location along the familiar route rather than along the efficient route from the new location, but this is a consequence of expected utility maximization under uncertainty about the task rather than any special desire to increase familiarity.
Note that I don’t think of this as a prediction made by active inference, since active inference broadly speaking may precisely replicate max-expected-utility, or do other things. However, it seems like a prediction made by your favored version of active inference.
Because of the theoretical beauty, I think we should take active inference seriously as an architectural principle.
I think we may be able to make some progress on the question of its theoretical beauty. I share a desire for unified principles of epistemic and instrumental reasoning. However, I have an intuition that active inference is just not the right way to go about it. The unification is too simplistic, and has too many degrees of freedom. It should have some initial points for its simplicity, but it should lose those points when the simplest versions don’t seem right (eg, when you conclude that the picture is missing goals/motivation).
So far, the description was broadly Bayesian/optimal/”unbounded”. Unbounded predictive processing / active inference agent is a fearsome monster in a similar way as a fully rational VNM agent. The other key ingredient is bounded rationality. Most biases are consequence of computational/signal processing boundedness, both in PP/AI models and non PP/AI models.
FWIW, I want to mention logical induction as a theory of bounded rationality. It isn’t really bounded enough to be the picture of what’s going on in humans, but it is certainly major progress on the question of what should happen to probability theory when you have bounded processing power.
I mention this not because it is directly relevant, but because I think people don’t necessarily realize logical induction is in the “bounded rationality” arena (even though “logical uncertainty” is definitionally very very close to “bounded rationality”, the type of person who tends to talk about logical uncertainty is usually pretty different from the type of person who talks about bounded rationality, I think).
---
Another thing I want to mention—although not every version of active inference predicts that organisms actively seek out the familiar and avoid the unfamiliar, it does seem like one of the central intended predictions, and a prediction I would guess most advocates of active inference would argue matches reality. One of my reasons for not liking the theory much is that I don’t think it is likely to capture curiosity well. Humans engage in both familiarity-seeking and novelty-seeking behavior, and both for a variety of reasons (both terminal-goal-ish and instrumental-goal-ish), but I think we are closer to novelty-seeking than active inference would predict.
In Delusion, Survival, and Intelligent Agents (Ring & Orseau), the behavior of a knowledge-seeking agent and a prediction-accuracy-seeking agent is compared. Note that the two agents have exactly opposite utility functions: the knowledge-seeking agent likes to be surprised, whereas the accuracy-seeking agent dislikes surprises. The knowledge-seeking agent behaves in (what I see as) a much more human way than the accuracy-seeking agent. The accuracy-seeking agent will try to gain information to a limited extent, but will ultimately try to remove all sources of novel stimuli to the extent possible. The knowledge-seeking agent will try to do new things forever.
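Schematically (this is my gloss, not the paper’s exact formalization), the two reward signals are sign-flipped versions of the same surprise measure:

```latex
% Schematic only. Both agents score an observation o under their model P.
\begin{align*}
r_{\text{knowledge}}(o) &= -\log P(o) \quad \text{(rewarded for being surprised)}\\
r_{\text{accuracy}}(o)  &= +\log P(o) \quad \text{(rewarded for having predicted well)}\\
\Rightarrow\quad r_{\text{knowledge}}(o) &= -\,r_{\text{accuracy}}(o).
\end{align*}
```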
I would also expect evolution to produce something more like the knowledge-seeking agent than the accuracy-seeking agent. In RL, curiosity is a major aid to learning. The basic idea is to augment agents with an intrinsic motive to gain information, in order to ultimately achieve better task performance. There are a wide variety of formulas for curiosity, but as far as I know they are all closer to valuing surprise than to avoiding surprise, and this seems like what they should be. So, to the extent that evolution did something similar to designing a highly effective RL agent, it seems more likely that organisms seek novelty than that they avoid it.
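For concreteness, the usual shape of such a bonus (a sketch in the spirit of prediction-error curiosity; the particular form and names here are mine, not any specific paper’s):

```python
import numpy as np

# Sketch of a prediction-error curiosity bonus.  The total reward ADDS the
# forward model's prediction error, so surprising transitions are sought out.
# An accuracy-seeking agent in the active-inference mold would SUBTRACT it.

def forward_model(state, action, weights):
    """A (hypothetical) learned linear model predicting the next state."""
    return weights @ np.concatenate([state, action])

def curious_reward(task_reward, state, action, next_state, weights, beta=0.1):
    prediction = forward_model(state, action, weights)
    prediction_error = np.sum((next_state - prediction) ** 2)
    return task_reward + beta * prediction_error   # novelty-seeking
    # accuracy-seeking variant: task_reward - beta * prediction_error

state, action, next_state = np.ones(3), np.ones(2), np.ones(3)
weights = np.zeros((3, 5))               # untrained model -> big error -> big bonus
print(round(curious_reward(0.0, state, action, next_state, weights), 2))   # 0.3
```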
So, I think the idea that organisms seek familiar experiences over unfamiliar is actually the opposite of what we should expect overall. It is true that for an organism which has learned a decent amount about its environment, we expect to see it steering toward states that are familiar to it. But this is just a consequence of the fact that it has optimized its policy quite a bit; so, it steers toward rewarding states, and it will have seen rewarding states frequently in the past for the same reason. However, in order to get organisms to this place as reliably as possible, it is more likely that evolution would have installed a decision procedure which steers disproportionately toward novelty (all else being equal) than one which steers disproportionately away from novelty (all else being equal).
you probably need to understand predictive processing better than what you get from reading the SSC article
I’m a bit confused, then, that the SSC article is your citation for this concept. Did you just read the SSC article? If not, could you link to the things you read? Also, writing a post that assumes this concept when there is no sufficient explanation of it on the web or in the community seems suboptimal; maybe consider writing that post first. Then again, maybe you were trying to make a more general point about brains not being agents, in which case you could factor out the predictive processing concept and give a different example of a brain architecture that doesn’t have a utility function.
Btw, if that is your goal, it doesn’t speak to my cruxes for why reasoning about an AI with a utility function makes sense, which are discussed here and pointed to here (something like ‘there is a canonical way to scale me up even if it’s not obvious’).
I read the book the SSC article is reviewing (plus a bunch of articles on predictive-mind, some papers from Google Scholar, and several talks). Linking the SSC review seemed more useful than linking Amazon.
I don’t think I’m the right person for writing an introduction to predictive processing for the LW community.
Maybe I actually should have included a warning that the whole model I’m trying to describe has nontrivial inferential distance.
I’d be very curious to hear more about your general dislike of predictive processing if you’d be willing to share. In particular, I’m curious whether it’s a dislike of predictive processing as an algorithmic model for things like perception or predictive processing/the free energy principle as a theory of everything for “what humans are doing”.