Bayesian Inference with Partially Specified Priors
In this post, I am going to write about the practical difficulty of specifying priors that reflect internal beliefs, and about the possibility of performing probabilistic inference with partially specified or imprecise priors. For the most part, I will try to give the general intuition and then link to some relevant scholarship. I will quickly review Bayes rule and probabilistic inference to start, but this post is primarily intended for readers that are already familiar with these concepts (you can read introductions in several places).
Bayes rule specifies how to compute the probability of a hypothesis, $H$, given some data, $D$:

$$P(H \mid D) = \frac{P(D \mid H)\, P(H)}{P(D)}$$
where $P(H \mid D)$ is called the posterior, $P(H)$ is the prior, and $P(D \mid H)$ is the likelihood. If we assume rational people should update their beliefs in a way that is consistent with the laws of probability, then rational people should update their beliefs using Bayes rule. It follows from this view that Bayes rule is not just a mathematical observation, it is prescriptive in the sense that it describes how people (or algorithms, AIs, or other agents that aspire to be rational) should update their beliefs. In that case, the prior can be understood as stating how probable you think the hypothesis is given all the data that is available to you, except for $D$, and the posterior describes how probable you ought to think the hypothesis is after accounting for $D$.
The basic upshot is that you should account for your prior belief when weighing new evidence. For example, say you get tested for the presence of a particular genetic sequence in your DNA, that the test is 99% accurate (meaning it has a 99% true positive rate and a 99% true negative rate), and that it returns positive. If your prior belief that you have the genetic sequence is very low (maybe you happen to know the base rate of the sequence in the general population is less than 0.001%), then you should conclude the test was most likely a false positive.
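As a quick sanity check, here is a minimal sketch of that calculation in Python (the numbers are the ones from the example, with the base rate taken at its 0.001% upper bound):

```python
# Posterior probability of having the sequence given a positive test,
# computed with Bayes rule: P(H | +) = P(+ | H) P(H) / P(+).
prior = 0.00001        # base rate of 0.001% (the example says "less than" this)
true_positive = 0.99   # P(positive | have the sequence)
false_positive = 0.01  # P(positive | don't have the sequence)

p_positive = true_positive * prior + false_positive * (1 - prior)
posterior = true_positive * prior / p_positive
print(f"P(have the sequence | positive test) = {posterior:.4f}")  # about 0.0010
```

Even at the 0.001% upper bound the posterior comes out around 0.1%, so the positive result is very probably a false positive; a smaller base rate only pushes it lower.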
This is all well and good in theory, but if we try to explicitly use Bayes rule in practice we might find specifying our priors to be very difficult. Even worse, it can feel arbitrary or like a matter of guesswork. Consider the previous example, only now imagine that you do not have convenient knowledge about the genetic sequence’s base rate. In theory, a Bayesian would say, there is still some precise numeric value that quantifies your prior belief. Presumably it would be based on your knowledge of genes, the kinds of things people develop tests for, possibly some general belief about common base rates, and so on. In practice, could you actually introspect in a way that would allow you to write down such a prior? Trying to specify your prior to even a single decimal place, let alone several, would probably feel like you are just making stuff up. Plus, there is a concerning possibility that your answer would change significantly if you were tired, or had been primed by seeing some random number, or for some other arbitrary reason.
This is likely true in day-to-day life. I certainly couldn’t give an exact answer to questions like “will it rain today” or “would anyone notice if I bailed and left work early”, even though these are the kind of beliefs I might want to update as I go about my day. I doubt I could assign probabilities to these statements with more than one significant digit, and even then I would be pretty leery about being asked to do so. In short, it seems unlikely that brains encode beliefs in ways that can be readily mapped to numerical values.
Given this limitation, how exactly are humans supposed to go about doing Bayesian reasoning? One suggestion might be that we should just take our best guess and treat the result as an approximation, but it’s hard not to wince at the idea that our supposedly rational decision-making procedure now requires producing a number in a way that feels so unintuitive and arbitrary. It might also be tempting to suggest people ought to just look up base rates or other statistics online and use those as priors. However, such information might not be available, and this is also, in a sense, wrong in principle. In the setting being discussed here the prior is supposed to reflect your internal beliefs, so if we are determined to do things properly there is no avoiding the difficulty of trying to write those beliefs down.
A possible workaround is to use partially specified or imprecise priors. While most discussions of Bayes rule focus on applying it when we have exact knowledge of the prior, it’s perfectly valid to apply Bayes rule when we only have partial knowledge. In this case, we may not be able to compute an exact posterior, but we might be able to get enough information about the posterior to make decisions. Mathematically, an analogy would be the equation “a + b = c” when given “b = 10” and “a > 0”. We don’t know the exact value of “c”, but we do know “c > 10”. Likewise, given partial knowledge of the prior, and knowledge about the likelihood, we can gain partial knowledge of the posterior. I think this kind of process is often more intuitive than trying to utilize an exact prior.
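To make the analogy concrete, here is a small illustrative sketch in Python (my own, not a standard routine) that pushes an interval of priors through Bayes rule for the kind of binary test used in this post; because the posterior is monotonically increasing in the prior, evaluating the endpoints of the prior interval is enough to bound the posterior:

```python
def posterior(prior, true_positive, false_positive):
    """P(H | positive test) via Bayes rule for a binary hypothesis."""
    evidence = true_positive * prior + false_positive * (1 - prior)
    return true_positive * prior / evidence

def posterior_bounds(prior_low, prior_high, true_positive, false_positive):
    """Map an interval of priors to an interval of posteriors.

    The posterior is monotonically increasing in the prior, so evaluating
    the endpoints of the prior interval is enough.
    """
    return (posterior(prior_low, true_positive, false_positive),
            posterior(prior_high, true_positive, false_positive))

# Suppose all we are willing to commit to is that the prior is between 1% and 20%.
low, high = posterior_bounds(0.01, 0.20, true_positive=0.99, false_positive=0.01)
print(f"posterior lies somewhere in [{low:.3f}, {high:.3f}]")  # about [0.500, 0.961]
```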
To give an example, imagine you eat a berry from an unknown plant, and then, in a moment of paranoia, you run a 99.9% accurate toxicity test on the plant, which comes back positive. Being a committed Bayesian, you set out to write down your prior belief that the plant is toxic. This will, you reason, allow you to compute the mathematically correct posterior probability that you were poisoned, which you can then use to decide whether or not to seek medical attention. Unfortunately, you become paralyzed by indecision about whether your prior is closer to “2.9%” or “2.8%”. As time passes, you start to feel nauseous...
It is pretty clear why you are being foolish. As long as your prior belief is not extremely small (for example, 1e-20), the posterior probability that the plant is poisonous will be non-trivial, and any reasonable utility calculation would tell you to seek medical attention. Partial knowledge of your prior, in this case that it is not extremely small, meaning that you consider it at least somewhat plausible the plant is poisonous, is all you needed to make a decision. This kind of reasoning feels much more intuitive because, while stating a precise prior might be difficult, asserting something like “my prior is larger than 0.1%” is pretty easy. If you feel like doing math, it is possible to specify the conditions under which we can make decisions with imprecise knowledge about probabilities.
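To see how little the exact prior matters here, a minimal sketch, assuming “99.9% accurate” means a 99.9% true positive rate and a 99.9% true negative rate:

```python
def posterior_toxic(prior, accuracy=0.999):
    """P(toxic | positive test), assuming 'accuracy' is both the true
    positive rate and the true negative rate of the test."""
    evidence = accuracy * prior + (1 - accuracy) * (1 - prior)
    return accuracy * prior / evidence

for prior in [0.029, 0.028, 0.01, 0.001]:
    print(f"prior {prior:.3f} -> posterior {posterior_toxic(prior):.3f}")
# prior 0.029 -> posterior 0.968
# prior 0.028 -> posterior 0.966
# prior 0.010 -> posterior 0.910
# prior 0.001 -> posterior 0.500
```

The agonizing over 2.8% versus 2.9% moves the posterior by about a tenth of a percentage point, and even a prior of 0.1% leaves the posterior at 50%, which any plausible utility calculation about poisoning will treat as ample reason to go to the hospital.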
I suspect this kind of procedure, in particular the tendency to rule out extreme priors while not having a truly precise prior in mind, is pretty typical when people do probabilistic inference. To pick an example with more complex hypotheses, your prior belief about the probability it will rain today might be hard to specify, but you might at least know you are uncertain, meaning your belief is close to uniform over the possible values of $\theta$, the probability of rain. Formally, we might say that our prior belief follows a $\mathrm{Beta}(\alpha, \beta)$ distribution where $\alpha$ and $\beta$ are both close to 1. This would be too vague for us to compute an exact posterior given new evidence, but it is sufficient to justify claiming our prior will be largely “washed out” by strong evidence, and thus we might be able to get a pretty clear picture of the posterior anyway. There is, admittedly, not much practical difference between framing things this way compared to just saying “if you aren’t sure what prior to use just pick a uniform one”, but it can at least provide an answer to people who have qualms about how hand-wavy the latter sounds.
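As a rough illustration of the “washed out” claim, with made-up numbers: suppose the strong evidence is having observed rain on 45 of the last 60 relevantly similar days. The Beta distribution is conjugate to this kind of count data, so the posterior mean can be written down directly, and it barely moves as $\alpha$ and $\beta$ range over values near 1:

```python
# Posterior over theta (the probability of rain) under a Beta(alpha, beta) prior
# after observing k rainy days out of n: the Beta is conjugate to this data, so
# the posterior is Beta(alpha + k, beta + n - k) with mean (alpha + k) / (alpha + beta + n).
k, n = 45, 60  # hypothetical observations, purely for illustration

for alpha, beta in [(0.5, 0.5), (1.0, 1.0), (1.5, 0.8), (2.0, 2.0)]:
    post_mean = (alpha + k) / (alpha + beta + n)
    print(f"Beta({alpha}, {beta}) prior -> posterior mean {post_mean:.3f}")
# Every posterior mean lands between roughly 0.73 and 0.75, despite the
# vagueness about which near-uniform prior we actually hold.
```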
In short, we might not have precise priors, but we can still account for our imprecise priors in mathematically rigorous ways. This would reflect the intuition that the rules of probability ought to be followed, but our internal beliefs might be better thought of as loose assumptions about probability distributions rather than precisely defined probability distributions. This idea has connections to Bayesian sensitivity analysis and imprecise probabilities.
Another idea is to do whatever decision-relevant algebra you need to do while leaving your priors unspecified, and then, for each decision you have to make, backpropagate that algebra into a simple and relatively easy-to-answer question about your priors (“Is my prior belief that the plant is toxic greater than 1e-20? If yes, go to the hospital.”)
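One way to picture this suggestion (my sketch, not the commenter’s, and the 1% posterior threshold below is an arbitrary stand-in for whatever your utilities actually imply): fix the strength of the evidence and the posterior level at which you would act, then invert Bayes rule in odds form to get a single threshold question about the prior.

```python
def prior_threshold(likelihood_ratio, posterior_needed):
    """Smallest prior for which the posterior reaches the decision threshold.

    Works in odds form: posterior_odds = likelihood_ratio * prior_odds, so the
    prior odds must be at least posterior_odds / likelihood_ratio.
    """
    posterior_odds = posterior_needed / (1 - posterior_needed)
    prior_odds = posterior_odds / likelihood_ratio
    return prior_odds / (1 + prior_odds)

# A positive result from a 99.9% accurate test has likelihood ratio 0.999 / 0.001.
lr = 0.999 / 0.001
print(prior_threshold(lr, posterior_needed=0.01))
# ~1e-5: any prior above roughly 0.001% already justifies acting, so the only
# question left to answer about your prior is whether it clears that bar.
```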
Anyway, I’m glad to see this kind of post on LW.
That makes a lot of sense, but it does require you to know your utility function ahead of time. When this is not the case, you might still want to propagate whatever you know about the prior forward to the posterior, as a kind of caching operation for use in future decisions.
You don’t necessarily need to know your utility function; just the utilities of the options being decided between, or even a preference ordering, may do. (Solving/making progress on a specific problem may be easier than working on an abstract problem.)
Is the described process different from Dempster-Shafer?