How We Picture Bayesian Agents

I think that when most people picture a Bayesian agent, they imagine a system which:

  • Enumerates every possible state/trajectory of “the world”, and assigns a probability to each.

  • When new observations come in, loops over every state/trajectory, checks the probability of the observations conditional on each, and then updates via Bayes’ rule.

  • To select actions, computes the utility which each action will yield under each state/trajectory, then averages over states/trajectories weighted by probability, and picks the action with the largest weighted-average utility.

Typically, we define Bayesian agents as agents which behaviorally match that picture.
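For concreteness, here is a minimal sketch of that brute-force picture in Python. The two-state umbrella world, its probabilities, and its payoffs are made up purely for illustration; only the enumerate/update/average structure comes from the bullets above.

```python
from typing import Callable, Dict, Hashable, Iterable

State = Hashable


def bayes_update(prior: Dict[State, float],
                 likelihood: Callable[[State], float]) -> Dict[State, float]:
    """Loop over every state, weight it by P(observation | state), renormalize."""
    unnormalized = {s: p * likelihood(s) for s, p in prior.items()}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("observation has zero probability under the prior")
    return {s: w / total for s, w in unnormalized.items()}


def best_action(belief: Dict[State, float],
                actions: Iterable[str],
                utility: Callable[[str, State], float]) -> str:
    """Pick the action with the largest probability-weighted average utility."""
    def expected_utility(action: str) -> float:
        return sum(p * utility(action, s) for s, p in belief.items())
    return max(actions, key=expected_utility)


# Hypothetical toy world: every number below is an assumption for illustration.
prior = {"rain": 0.3, "dry": 0.7}
p_wet_pavement = {"rain": 0.9, "dry": 0.2}  # P(see wet pavement | state)
payoff = {("umbrella", "rain"): 1.0, ("umbrella", "dry"): -0.2,
          ("no_umbrella", "rain"): -1.0, ("no_umbrella", "dry"): 0.5}

posterior = bayes_update(prior, lambda s: p_wet_pavement[s])
choice = best_action(posterior, ["umbrella", "no_umbrella"],
                     lambda a, s: payoff[(a, s)])
print(posterior, choice)
```

Every function here touches every state, which is exactly what becomes infeasible once “the world” is large.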

But that’s not really the picture David and I typically have in mind when we picture Bayesian agents. Yes, behaviorally they act that way. But I think people get overly anchored on imagining the internals of the agent that way, and then mistakenly conclude that a Bayesian model of agency is incompatible with various features of real-world agents (e.g. humans) which a Bayesian framework can in fact handle quite well.

So this post is about our prototypical mental picture of a “Bayesian agent”, and how it diverges from the basic behavioral picture.

Causal Models and Submodels

Probably you’ve heard of causal diagrams or Bayes nets by now.

If our Bayesian agent’s world model is represented via a big causal diagram, then that already looks quite different from the original “enumerate all states/trajectories” picture. Assuming reasonable sparsity, the data structures representing the causal model (i.e. graph + conditional probabilities on each node) take up an amount of space which grows linearly with the size of the world, rather than exponentially. It’s still too big for an agent embedded in the world to store in its head directly, but much smaller than the brute-force version.
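To make the linear-vs-exponential contrast concrete, here is a rough count in Python. It assumes binary variables and takes “reasonable sparsity” to mean a fixed cap on the number of parents per node; both assumptions, and the function names, are ours for illustration.

```python
def joint_table_size(n_vars: int) -> int:
    """Entries in a full joint table over n_vars binary variables."""
    return 2 ** n_vars


def bayes_net_size(n_vars: int, max_parents: int) -> int:
    """Upper bound on conditional-probability entries in a sparse Bayes net.

    Each binary node stores one P(node = 1 | parents) per assignment of its
    parents, so at most 2 ** max_parents numbers per node.
    """
    return n_vars * 2 ** max_parents


for n in (10, 20, 40):
    print(n, joint_table_size(n), bayes_net_size(n, max_parents=3))
```

Doubling the number of variables doubles the Bayes net’s storage, but squares the size of the joint table.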

(Also, a realistic agent would want to explicitly represent more than just one causal diagram, in order to have uncertainty over causal structure. But that will largely be subsumed by our next point anyway.)

Much more efficiency can be achieved by representing causal models like we represent programs. For instance, this little “program”:

factorial = Model {
   n = 4
   base_result = 1
   recurse_result = do(factorial, n=n-1).result
   result = (n == 0) ? base_result : n * recurse_result
}

… is in fact a recursively-defined causal model. It compactly represents an infinite causal diagram, corresponding to the unrolled computation. (See the linked post for more details on how this works.)

Conceptually, this sort of representation involves lots of causal “submodels” which “call” each other—or, to put it differently, lots of little diagram-pieces which can be wired together and reused in the full world-model. Reuse means that such models can represent worlds which are “bigger than” the memory available to the agent itself, so long as those worlds have lots of compressible structure—e.g. the factorial example above, which represents an infinite causal diagram using a finite representation.
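As a rough illustration of how a finite representation can stand in for an infinite diagram, here is a small Python sketch of the factorial model. The `Model` class, its lazy node evaluation, and its `do` method are our own stand-ins rather than the notation’s actual semantics; the point is only that nodes get evaluated on demand, so the unrolled diagram is never built beyond what a query needs.

```python
class Model:
    """A causal model as named nodes, each a function of the model itself.

    Nodes are evaluated lazily and memoized, so a query only ever touches
    the part of the (possibly infinite) unrolled diagram it depends on.
    """

    def __init__(self, nodes, interventions=None):
        self.nodes = nodes                        # name -> function(model) -> value
        self.interventions = interventions or {}  # name -> value fixed by do()
        self.cache = {}

    def __getitem__(self, name):
        if name in self.interventions:
            return self.interventions[name]
        if name not in self.cache:
            self.cache[name] = self.nodes[name](self)
        return self.cache[name]

    def do(self, **interventions):
        """Return a fresh copy of the model with some nodes overridden."""
        return Model(self.nodes, interventions)


factorial = Model({
    "n": lambda m: 4,
    "base_result": lambda m: 1,
    # "Calls" a copy of the same model with n decremented.
    "recurse_result": lambda m: m.do(n=m["n"] - 1)["result"],
    # recurse_result is only looked up when n != 0, which ends the recursion.
    "result": lambda m: (m["base_result"] if m["n"] == 0
                         else m["n"] * m["recurse_result"]),
})

print(factorial["result"])          # 24
print(factorial.do(n=5)["result"])  # 120
```

Note that the same four node definitions are reused at every level of the recursion; that reuse is what lets the representation stay small.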

(Aside: those familiar with probabilistic programming could view this world-model representation as simply a probabilistic program.)

Updates

So we have a style of model which can compactly represent quite large worlds, so long as those worlds have lots of compressible structure. But there’s still the problem of updates on that structure.

Here, we typically imagine some kind of message-passing, though it’s an open problem exactly what such an algorithm looks like for big/complex models.

The key idea here is that most observations are not directly relevant to our submodels of most of the world. I see a bird flying by my office, and that tells me nothing at all about the price of gasoline[1]. So we expect that, the vast majority of the time, message-passing updates of a similar flavor to those used on Bayes nets (though not exactly the same) will quickly converge, without having to explicitly propagate to most of the submodel-nodes.
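Here is a toy version of that “updates stay local” intuition, sketched in Python. It is not the algorithm we have in mind for big models (that part is open); it is just forward message-passing along a chain X0 → X1 → …, where we observe X0 and stop passing messages once they no longer move a node’s belief away from its prior by more than a tolerance. The chain structure, the transition matrix, and the tolerance are all assumptions for illustration, and the early stop is only valid because the prior is stationary under the transition matrix.

```python
def propagate_evidence(prior, transition, observed_state, n_nodes, tol=1e-3):
    """Pass forward messages along a chain after observing X0.

    prior: marginal shared by every node before the observation (assumed
           stationary under `transition`).
    Returns (beliefs, number of nodes whose belief was actually updated).
    """
    n_states = len(prior)
    beliefs = [list(prior) for _ in range(n_nodes)]
    # Belief at the observed node is a point mass on the observed state.
    message = [1.0 if s == observed_state else 0.0 for s in range(n_states)]
    beliefs[0] = message
    updated = 1
    for k in range(1, n_nodes):
        # Message to node k: P(X_k | X_0) = P(X_{k-1} | X_0) @ transition
        message = [sum(message[i] * transition[i][j] for i in range(n_states))
                   for j in range(n_states)]
        if max(abs(m - p) for m, p in zip(message, prior)) < tol:
            break  # the evidence has washed out; leave the rest untouched
        beliefs[k] = message
        updated += 1
    return beliefs, updated


prior = [0.5, 0.5]
transition = [[0.9, 0.1], [0.1, 0.9]]
beliefs, updated = propagate_evidence(prior, transition,
                                      observed_state=1, n_nodes=1000)
print(updated, "of", len(beliefs), "nodes updated")
```

In this example only a small fraction of the thousand nodes ever receives a message; the bird outside the office never reaches the gasoline-price node.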

Latents

Message-passing on large models does still have some efficiency issues, however. To make things more efficient, we expect that realistic agents typically structure their model around “latent variables” which mediate most interactions. For instance, early 20th century biologists would observe that some species of animals had very similar anatomy, physiology, or behavior—i.e. if one wrote out a giant list of traits, some species would end up with very highly correlated lists. From this, they inferred some latent (i.e. not directly observed) relationship between those species—in this case, shared evolutionary ancestry. The extent to which this inference was correct varied—inferences are sometimes wrong, even when the reasoning is basically right—but either way, that “mediation by latent shared ancestry” pattern sure was how biologists structured their models.

Humans in general seem to do a very similar thing when modeling the world as containing “kinds of things”—i.e. we notice that there’s a cluster of things which have bark, leaves, wood, roots, etc, all connected in a shape with a central trunk recursively branching out both above and below ground… Then we intuitively model all these things as stemming from some latent variable (e.g. “tree-ness”). That latent variable, in our internal models, explains the correlations: a child might ask “why do things which have bark also have roots?”, and we might reply “because they’re trees”. Again, there’s room to argue about how well that answers the child’s question, but the answer does seem to reflect the internal structure of our models either way.

One key issue: different agents could, in principle, model the same environment using different latents; the latents are not necessarily fully determined by the prior + environment. For instance, I could model a bunch of rolls of a biased die as mediated by an unknown “bias”, or I could model them as just a bunch of rolls with some complicated correlations between them. The predictions will be the same. In practice minds mostly seem to converge on quite similar latents, and the general project of natural abstraction is largely aimed at understanding when and why that happens.
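The “same predictions, different latents” point can be checked directly in a small case. The sketch below uses a two-sided die (i.e. a coin) to keep things short, and assumes a uniform prior on the bias; both choices are ours. One model integrates over a latent bias; the other has no bias variable at all, just a rule for each flip’s probability given the earlier flips.

```python
from fractions import Fraction
from itertools import product
from math import factorial


def prob_with_latent(seq):
    """P(seq) with a latent bias b ~ Uniform(0, 1).

    The integral of b^h * (1 - b)^t over [0, 1] is h! t! / (h + t + 1)!.
    """
    heads = sum(seq)
    tails = len(seq) - heads
    return Fraction(factorial(heads) * factorial(tails),
                    factorial(heads + tails + 1))


def prob_without_latent(seq):
    """P(seq) as a product of P(flip_k | earlier flips), with no bias variable.

    The flips are modeled as directly correlated with each other.
    """
    p = Fraction(1)
    heads_so_far = 0
    for k, flip in enumerate(seq):
        p_heads = Fraction(heads_so_far + 1, k + 2)
        p *= p_heads if flip == 1 else 1 - p_heads
        heads_so_far += flip
    return p


sequences = list(product((0, 1), repeat=5))
same = all(prob_with_latent(s) == prob_without_latent(s) for s in sequences)
print(same)
```

The two models assign identical probabilities to every sequence, even though only one of them contains anything called a “bias”.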

Aside: Map-Territory Correspondence

There is no rule saying that the variables in a Bayesian agent’s world-model have anything to do with “things” in their environment. I could totally write a Bayesian agent which models itself as living in Conway’s Game of Life and tries to maximize a utility function defined over things in Conway’s Game of Life (like e.g. number of gliders), but then I could wire up the inputs and outputs of that agent to a photosensor and motor in my office. The agent will mostly be very confused (i.e. its predictions will be wrong a lot), and won’t do anything interesting, but it would be a valid Bayesian agent.

In particular, it’s the latents in the model which don’t need to correspond to anything in the environment. The variables which the agent maps to its observations and actions (as opposed to latents, which are everything else), do have some rigid “correspondence”, because when the agent receives inputs it will map them to its observations, and when the agent yields outputs it will map them to its actions.

A more realistic example: some humans believe in e.g. spirits or the like. Much like the Conway’s Game of Life bot, they are just very confused, and those parts of their world model involving spirits don’t necessarily “correspond to” any actual structure in the world.

… Nonetheless, in practice it seems like most latents in most humans’ models do “correspond to” stuff in the world in some important sense, and understanding that correspondence is another big part of the general project of natural abstraction.

Utility Over Latents

One big reason that latent variables are important is that, insofar as it makes sense to view real-world agents as Bayesians at all, the inputs to those agents’ utility functions are typically latent variables—not observations or actions directly. This follows from common sentiments like “I want my spouse to actually be happy, not just to look-to-me like they’re happy”. “Look-to-me like they’re happy” would be a utility function whose inputs are my own observations directly; “actually be happy” is a utility function whose inputs are latent variables representing my spouse.
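A tiny sketch of the difference, in Python. The two actions and all of the probabilities are invented for illustration; the only thing taken from the text is that one utility function reads the latent (“actually happy”) and the other reads the observation (“looks happy to me”).

```python
# Hypothetical numbers: each action induces a joint distribution over
# (spouse is actually happy, spouse looks happy to me).
outcomes = {
    "do_something_kind": {(True, True): 0.70, (True, False): 0.10,
                          (False, True): 0.05, (False, False): 0.15},
    "ask_them_to_smile": {(True, True): 0.30, (True, False): 0.00,
                          (False, True): 0.65, (False, False): 0.05},
}


def utility_over_latent(happy, looks_happy):
    """Cares about the latent variable: is my spouse actually happy?"""
    return 1.0 if happy else 0.0


def utility_over_observation(happy, looks_happy):
    """Cares only about my own observation: do they look happy to me?"""
    return 1.0 if looks_happy else 0.0


def expected(utility, action):
    return sum(p * utility(happy, looks)
               for (happy, looks), p in outcomes[action].items())


best_by_latent = max(outcomes, key=lambda a: expected(utility_over_latent, a))
best_by_observation = max(outcomes,
                          key=lambda a: expected(utility_over_observation, a))
print(best_by_latent, best_by_observation)
```

With these made-up numbers the two utility functions recommend different actions, which is the sense in which it matters what the utility function takes as input.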

For more on this topic, see The Pointers Problem: Human Values Are A Function Of Humans’ Latent Variables.

Lazy Utility Maximization

Even if causal models structured like programs and message-passing and latents allow for efficient updates of models of large worlds (and, to be clear, we don’t think we currently have the whole story here), there’s still the question of how to efficiently maximize expected utility over the model.

A key idea here is that we never actually need to calculate expected utility, in order to maximize it.

For example, suppose I’m deciding what to order for lunch. I expect this decision to be basically-irrelevant to the vast majority of things I care about in the world and in life. If I wanted to calculate my full expected utility, I would need to account for all those things, from Dad’s collection of old milk bottles to future tiny genetically engineered dragons. But I don’t need to calculate all that in order to make an expected-utility-maximizing lunch order. I just need to calculate the difference between the utility which I expect if I order lamb Karahi vs a sisig burrito.

… and since my expectations for most of the world are the same under those two options, I should be able to calculate the difference lazily, without having to query most of my world model. Much like the message-passing update, I expect deltas to quickly fall off to zero as things propagate through the model.
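A minimal sketch of the lazy comparison, in Python. It assumes (for illustration only) that expected utility decomposes into a sum of terms, one per “thing I care about”, and that each lunch option shifts only a couple of those terms; the term names and numbers are invented.

```python
def full_expected_utility(baseline, effects):
    """Expected utility of an option, touching every term in the model."""
    return sum(baseline[k] + effects.get(k, 0.0) for k in baseline)


def lazy_difference(effects_a, effects_b):
    """EU(a) - EU(b), touching only the terms that either option changes.

    Everything the two options agree on cancels, so it is never queried.
    """
    touched = set(effects_a) | set(effects_b)
    return sum(effects_a.get(k, 0.0) - effects_b.get(k, 0.0) for k in touched)


# Hypothetical world model: lots of terms the lunch order cannot affect.
baseline = {f"thing_{i}": float(i % 7) for i in range(100_000)}
baseline["lunch_enjoyment"] = 0.0
baseline["afternoon_energy"] = 0.0

lamb_karahi = {"lunch_enjoyment": 0.9, "afternoon_energy": -0.2}
sisig_burrito = {"lunch_enjoyment": 0.8, "afternoon_energy": 0.1}

lazy = lazy_difference(lamb_karahi, sisig_burrito)
full = (full_expected_utility(baseline, lamb_karahi)
        - full_expected_utility(baseline, sisig_burrito))
order = "lamb Karahi" if lazy > 0 else "sisig burrito"
print(order, lazy, full)
```

The lazy version looks at two terms instead of a hundred thousand, and picks the same order as the full calculation.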

Caching and Inconsistency

Here we’ll diverge somewhat from a strictly behaviorally Bayesian agent, but in a way which plays particularly well with an otherwise-Bayesian agent.

Richard Bellman popularized the idea of dynamic programming: in this context, making utility maximization calculations more efficient by precomputing and caching the instrumental values of intermediates. Insofar as we imagine our supposedly-Bayesian agent maintaining some instrumental value cache, we open the door to a certain kind of “incoherence”: the values in the cache may, for some reason, be inconsistent with either each other or the agent’s utility function. This sort of incoherence could be locally detected and fixed, by checking whether the cached values locally satisfy the Bellman equation (with the exact flavor of Bellman equation depending on what style of model we’re using for the Bayesian agent).
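Here is a small Python sketch of what “locally detect and fix” could look like. It assumes one particular flavor of Bellman equation (a deterministic, discounted one) and a made-up four-state world; the state names, rewards, and discount are all illustrative.

```python
GAMMA = 0.9

# transitions[state][action] = (next_state, reward); "done" is terminal.
transitions = {
    "home":   {"walk": ("street", 0.0)},
    "street": {"walk": ("cafe", 0.0), "turn_back": ("home", 0.0)},
    "cafe":   {"order": ("done", 10.0)},
    "done":   {},
}


def bellman_backup(state, cache):
    """Value implied for `state` by the cached values of its neighbors."""
    options = transitions[state]
    if not options:
        return 0.0
    return max(reward + GAMMA * cache[nxt] for nxt, reward in options.values())


def inconsistent_states(cache, tol=1e-9):
    """States whose cached value fails the local Bellman check."""
    return [s for s in transitions
            if abs(cache[s] - bellman_backup(s, cache)) > tol]


def repair(cache, max_sweeps=100):
    """Re-run local backups until every cached value passes the check."""
    for _ in range(max_sweeps):
        bad = inconsistent_states(cache)
        if not bad:
            return
        for s in bad:
            cache[s] = bellman_backup(s, cache)
    raise RuntimeError("cache did not converge")


# A cache with one stale entry ("street" should be 9.0).
cache = {"home": 8.1, "street": 5.0, "cafe": 10.0, "done": 0.0}
flagged = inconsistent_states(cache)
repair(cache)
print(flagged, cache)
```

Note that the stale “street” entry also makes “home” fail its check, and the repair takes more than one sweep: a single bad cached value can require fixes to ripple through its neighbors.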

Similarly, we could imagine caching being useful epistemically, for efficient updates. There again, failures of cache maintenance could result in “inconsistent beliefs”.

If and when cache inconsistency is detected, the agent might require quite a bit of propagation—i.e. thinking and reflection—to sort it out.

Putting It All Together

When we picture a “Bayesian agent”, we’re typically picturing an agent with a world-model which looks basically like a moderately-sized program with a lot of recursion. That “program” represents a big causal model as a bunch of smaller submodels, which get reused and “call” each other.

Updates are performed via some sort of message-passing; we expect that the messages don’t typically need to propagate very far. Similarly, to maximize expected utility, the agent only needs to compute the difference in expected utility between options available in its current decision. As with updates, such differences are expected to typically not propagate very far.

Most of the variables in the model are latents, as opposed to variables directly representing observations or actions. Such latents don’t have to correspond to anything in the world; the fact that they usually seem to correspond to stuff in the world in some sense is an interesting empirical fact, and characterizing that “correspondence” is one big piece of the general project of natural abstraction. One reason such latents are important (even without bringing e.g. language into the picture) is that the inputs to the agent’s utility function are typically latents rather than observations/actions, e.g. “I want my spouse to actually be happy, not just to look-to-me like they’re happy”.

Finally, if we want to make the model capture certain non-Bayesian human behaviors while still keeping most of the picture, we can assume that instrumental values and/or epistemic updates are cached. This creates the possibility of cache inconsistency/incoherence.

  1. John is clearly a complete amateur at augury, but the meaning here is hopefully still clear.