abramdemski comments on Multi-agent predictive minds and AI alignment

abramdemski 22 Dec 2018 4:09 UTC
9 points
Vague introspective evidence for active inference comes from an ability to do inner simulations.
I would take this as introspective evidence in favor of something model-based, but it could look more like model-based RL rather than active inference. (I am not specifically advocating for model-based RL as the right model of human thinking.)
Possibly boldest claim I can make from the principle alone is that people will have a bias to take actions which will “prove their models are right” even at the cost of the actions being actually harmful for them in some important sense.
I believe this claim based on social dynamics—among social creatures, it seems evolutionarity useful to try to prove your models right. An adaptation for doing this may influence your behavior even when you have no reason to believe anyone is looking or knows about the model you are confirming.
So, an experiment which would differentiate between socio-evolutionary causes and active inference would be to look for the effect in non-social animals. An experiment which comes to mind is that you somehow create a situation where an animal is trying to achieve some goal, but you give false feedback so that the animal momentarily thinks it is less successful than it is. Then, you suddenly replace the false feedback with real feedback. Does the animal try and correct to the previously believed (false) situation, in order to minimize predictive error? Rather than continuing to optimize in a way consistent with the task reward?
There are a lot of confounders. For example, one version of the experiment would involve trying to put your paw as high in the air as possible, and (somehow) initially getting false feedback about how well you are doing. When you suddenly start getting good feedback, do you re-position the paw to restore the previous level of feedback (minimizing predictive error) before trying to get it higher again? A problem with the experiment is that you might re-position your paw just because the real feedback changes the cost-benefit ratio, so a rational agent would try less hard at the task if it found out it was doing better than it thought.
A second example: pushing an object to a target location on the floor. If (somehow) you initially get bad feedback about where you are on the floor, and suddenly the feedback gets corrected, do you go to the location you thought you were at before continuing to make progress toward the goal? A confounder here is that you may have learned a procedure for getting the object to the desired location, and you are more confident in the results of following the procedure than you are otherwise. So, you prefer to push the object to the target location along the familiar route rather than in the efficient route from the new location, but this is a consequence of expected utility maximization under uncertainty about the task rather than any special desire to increase familiarity.
Note that I don’t think of this as a prediction made by active inference, since active inference broadly speaking may precisely replicate max-expected-utility, or do other things. However, it seems like a prediction made by your favored version of active inference.
Because of the theoretical beauty, I think we should take active inference seriously as an architectural principle.
I think we may be able to make some progress on the question of its theoretical beauty. I share a desire for unified principles of epistemic and instrumental reasoning. However, I have an intuition that active inference is just not the right way to go about it. The unification is too simplistic, and has too many degrees of freedom. It should have some initial points for its simplicity, but it should lose those points when the simplest versions don’t seem right (eg, when you conclude that the picture is missing goals/motivation).
So far, the description was broadly Bayesian/optimal/”unbounded”. Unbounded predictive processing / active inference agent is a fearsome monster in a similar way as a fully rational VNM agent. The other key ingredient is bounded rationality. Most biases are consequence of computational/signal processing boundedness, both in PP/AI models and non PP/AI models.
FWIW, I want to mention logical induction as a theory of bounded rationality. It isn’t really bounded enough to be the picture of what’s going on in humans, but it is certainly major progress on the question of what should happen to probability theory when you have bounded processing power.
I mention this not because it is directly relevant, but because I think people don’t necessarily realize logical induction is in the “bounded rationality” arena (even though “logical uncertainty” is definitionally very very close to “bounded rationality”, the type of person who tends to talk about logical uncertainty is usually pretty different from the type of person who talks about bounded rationality, I think).
---
Another thing I want to mention—although not every version of active inference predicts that organisms actively seek out the familiar and avoid the unfamiliar, it does seem like one of the central intended predictions, and a prediction I would guess most advocates of active inference would argue matches reality. One of my reasons for not liking the theory much is because I don’t think it is likely to capture curiosity well. Humans engage in both familiarity-seeking and novelty-seeking behavior, and both for a variety of reasons (both terminal-goal-ish and instrumental-goal-ish), but I think we are closer to novelty-seeking than active inference would predict.
In Delusion, Survival, and Intelligent Agents (Ring & Orseau), behavior of a knowledge-seeking agent and a predictive-accuracy seeking agent are compared. Note that the knowledge-seeking agent and predictive-accuracy seeking agent have exactly opposite utility functions: the knowledge-seeking agent likes to be surprised, whereas the accuracy-seeking agent dislikes surprises. The knowledge-seeking agent behaves in (what I see as) a much more human way than the accuracy-seeking agent. The accuracy-seeking agent will try to gain information to a limited extent, but will ultimately try to remove all sources of novel stimuli to the extent possible. The knowledge-seeking agent will try to do new things forever.
I would also expect evolution to produce something more like the knowledge-seeking agent than the accuracy-seeking agent. In RL, curiosity is a major aid to learning. The basic idea is to augment agents with an intrinsic motive to gain information, in order to ultimately achieve better task performance. There are a wide variety of formulas for curiosity, but as far as I know they are all closer to valuing surprise than avoiding surprise, and this seems like what they should be. So, to the extent that evolution did something similar to designing a highly effective RL agent, it seems more likely that organisms seek novelty as opposed to avoid it.
So, I think the idea that organisms seek familiar experiences over unfamiliar is actually the opposite of what we should expect overall. It is true that for an organism which has learned a decent amount about its environment, we expect to see it steering toward states that are familiar to it. But this is just a consequence of the fact that it has optimized its policy quite a bit; so, it steers toward rewarding states, and it will have seen rewarding states frequently in the past for the same reason. However, in order to get organisms to this place as reliably as possible, it is more likely that evolution would have installed a decision procedure which steers disproportionately toward novelty (all else being equal) than one which steers disproportionately away from novelty (all else being equal).