How do people construct priors? Is it worth trying to figure out how to construct better priors?
They make stuff up, mostly, from what I see here. Some even pretend that “epsilon” is a valid prior.
Definitely. Gwern recommends PredictionBook as a way to practice, measure, and improve your calibration.
I don’t think it’s useful to think about constructing priors in the abstract. If you think about concrete examples, you see lots of cases where a reasonable prior is easy to find (eg coin-tossing, and the typical breast-cancer diagnostic test example). That must leave some concrete examples where good priors are hard to find. What are they?
I have a question that relates to this one. If I’m not good at constructing priors, is going with agnosticism/50% recommended?
Depends on the context. In the general, abstract case, you end up talking about things like ignorance priors and entropy maximization. You can also have sets of priors that penalize more complex theories and reward simple ones; that turns into Solomonoff induction and Kolmogorov complexity and stuff like that when you try to formalize it.
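One way to make that complexity-penalty idea concrete is a toy simplicity prior (this is only a sketch, not actual Solomonoff induction, which works over programs and is uncomputable; the hypotheses and bit counts below are made up for illustration): give each hypothesis prior mass proportional to 2^(-description length), then normalize.

```python
# Toy simplicity prior: weight each hypothesis by 2^(-description length)
# and normalize. The hypotheses and their "lengths" are invented for
# illustration; actual Solomonoff induction is uncomputable.
hypotheses = {
    "fair coin": 5,                                      # bits (illustrative)
    "coin biased 60/40": 9,
    "coin controlled by a demon tracking my bets": 40,
}

weights = {h: 2.0 ** -bits for h, bits in hypotheses.items()}
total = sum(weights.values())
prior = {h: w / total for h, w in weights.items()}

print(prior)   # nearly all mass on the short hypotheses; the demon gets ~3e-11
```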
In actual, practical cases, people usually try to answer a question that sounds a lot like “from the outside view, what would a reasonable guess be?”. The distinction between that and a semi-educated guess can be somewhat fuzzy. In practice, as long as your prior isn’t horrible and you have plenty of evidence, you’ll end up somewhere close to the right conclusion, and that’s usually good enough.
Of course, there are useful cases where it’s much easier to have a good prior. The prior on your opponent having a specific poker hand is pretty trivial to construct; one of a set of hands meeting a characteristic is a simple counting problem (or an ignorance prior plus a complicated Bayesian update, since usually “meeting a characteristic” is a synonym for “consistent with this piece of evidence”).
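To spell out the counting, here is a minimal sketch, assuming Texas hold’em so that you can see your own two cards and 50 cards remain unknown (the hole cards below are hypothetical): the prior on one specific opponent hand is 1/C(50, 2), and the prior on “hand meets some characteristic” is just the fraction of those C(50, 2) hands that qualify.

```python
# Poker priors as a counting problem (Texas hold'em assumed: 2 known hole
# cards, 50 unseen cards, so C(50, 2) = 1225 possible opponent hands).
from itertools import combinations
from math import comb

ranks = "23456789TJQKA"
suits = "cdhs"
deck = [r + s for r in ranks for s in suits]

my_cards = {"Ah", "Kh"}                      # hypothetical hole cards
unseen = [c for c in deck if c not in my_cards]

# Prior on one specific opponent hand.
p_specific = 1 / comb(len(unseen), 2)        # 1/1225 ≈ 0.000816

# Prior on "opponent holds a pocket pair": count the qualifying hands.
hands = list(combinations(unseen, 2))
p_pair = sum(a[0] == b[0] for a, b in hands) / len(hands)   # 72/1225 ≈ 0.059

print(p_specific, p_pair)
```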
A better prior is a worse (but not useless) prior plus some evidence.
You construct a usable prior by making damn sure that the truth has non-exponentially-tiny probability, such that with enough evidence, you will eventually arrive at the truth.
From the inside, the best prior you could construct is your current belief dynamic (i.e. including how you learn).
From the outside, the best prior is the one that puts 100% probability on the truth.
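Here is a minimal sketch of the “non-exponentially-tiny probability” claim above, using a made-up hypothesis space of coin biases: the truth starts with only 5% prior mass, but because that mass isn’t vanishingly small, the posterior ends up concentrated on it after enough flips.

```python
# Bayesian updating over a small hypothesis space of coin biases. The true
# bias and the lopsided prior are invented for illustration.
import numpy as np

rng = np.random.default_rng(0)
biases = np.array([0.1, 0.3, 0.5, 0.7, 0.9])        # candidate hypotheses
prior = np.array([0.40, 0.30, 0.20, 0.05, 0.05])    # truth (0.7) starts unpopular
true_bias = 0.7

posterior = prior.copy()
for _ in range(1000):
    heads = rng.random() < true_bias                # one observed flip
    likelihood = np.where(heads, biases, 1 - biases)
    posterior = posterior * likelihood
    posterior /= posterior.sum()                    # renormalize

print(dict(zip(biases, posterior.round(3))))
# After ~1000 flips, nearly all the mass sits on 0.7 despite its 5% prior.
```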
I don’t know how much this answers your question.
From LessWrong posts such as ‘Created Already in Motion’ and ‘Where Recursive Justification Hits Rock Bottom’, I’ve come to see that humans are born with priors. (The post ‘Inductive Bias’ is also related: an agent must have some sort of prior to be able to learn anything at all. A pebble has no priors, but a mind does, which means it can update on evidence. What Yudkowsky calls a ‘philosophical ghost of perfect emptiness’ is other people’s image of a mind with no prior suddenly updating to have a map that perfectly reflects the territory. Once you have a thorough understanding of Bayes’ theorem, this is blatantly impossible/incoherent.)
So, we’re born with priors about the environment, and then our further experiences give us new priors for our next experiences.
Of course, this is all rather abstract, and if you’d like to have a guide to actually forming priors about real life situations that you find confusing… Well, put in an edit, maybe someone can give you that :-)
I don’t have a specific situation in mind, it’s just that priors from nowhere make me twitch—I have the same reaction to the idea that mathematical axioms are arbitrary. No, they aren’t! Mathematicians have to have some way of choosing axioms which lead to interesting mathematics.
At the moment, I’m stalking the idea that priors have a hierarchy or possibly some more complex structure, and being confused means that you suspect you have to dig deep into your structure of priors. Being surprised means that your priors have been attacked on a shallow level.
What do you mean by ‘priors from nowhere’? The idea that we’re just born with a prior, or people just saying ‘this is my prior, and therefore a fact’ when given some random situation? (That was me paraphrasing my mum’s ‘this is my opinion, and therefore a fact’.)
More like “here are the priors I’m plugging into the bright and shiny Bayes equation”, without any indication of why the priors were plausible enough to be worth bothering with.
In Bayesian statistics there’s the concept of ‘weakly informative priors’, which are priors that are quite broad and conservative, but don’t concentrate almost all of their mass on values that no one thinks are plausible. For example, if I’m estimating the effect of a drug, I might choose priors that give low mass to biologically implausible effect sizes. If it’s a weight gain drug, perhaps I’d pick a normal distribution with less than 1% probability mass for more than 100% weight increase or 50% weight decrease. Still pretty conservative, but mostly captures people’s intuitions of what answers would be crazy.
Andrew Gelman has some recent discussion here.
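To make the weight-gain example concrete, here is a minimal sketch, with the effect measured as percent weight change and Normal(0, 20) as my own illustrative choice rather than anything from the comment above: pick a broad prior and simply check how much mass it puts on the “crazy” region.

```python
# Checking a weakly informative prior against the implausible region.
# Effect size = percent weight change; Normal(0, 20) is a hypothetical choice.
from scipy.stats import norm

prior = norm(loc=0, scale=20)

p_huge_gain = prior.sf(100)     # P(effect > +100% weight gain) ≈ 3e-7
p_huge_loss = prior.cdf(-50)    # P(effect < -50% weight loss) ≈ 0.006

print(p_huge_gain + p_huge_loss)   # ≈ 0.006, under the 1% target
```

Almost all of the mass stays on broadly plausible effect sizes, so once there is real data the likelihood, not the prior, does the heavy lifting.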
Sometimes this is pretty useful, and sometimes not. It’s going to be most useful when you don’t have much evidence, and when your model is not well constrained along some dimensions (such as when you have multiple sources of variance). It’s also going to be useful when there are a ton of answers that seem implausible.
The extent of my usefulness here has been used up.
Related Hanson paper: http://hanson.gmu.edu/prior.pdf
I don’t know if you meant “people” in a generalized sense, meaning “every rational probability user”, or more in the sense of “the common wo/men”.
If in the first sense, there are different principles you can use, depending on what you already know to be true: the indifference principle, Laplace’s rule of succession, maximum entropy, group invariance, Solomonoff induction, etc., and possibly even more. It should be an active area of research in probability theory (if it’s not, shame on you, researchers!). As a general principle, the ideal prior is the most inclusive prior that is not ruled out by the information (you consider true). Even after that, you want to be very careful not to let any proposition have probability 0 or 1, because outside of mathematical idealization, everybody is imperfect and has access only to imperfect information.
If, otherwise, you meant “the common person in the street”, then I can only say that what I overwhelmingly see used is the authority bias and generalization from one example. After all, “construct a prior” just means “decide what is true and to what degree”.
“Constructing a better prior” amounts to not using information we don’t have, avoiding the mind projection fallacy, and using the information we do have to construct an informed model of the world. It is indeed worth trying to figure out how to be better at those things, but not as much as in an idealized setting. Since we have access only to inconsistent information, it is sometimes the case that we must completely discard what we held to be true, a case that doesn’t happen in pure probability theory.
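To make one of the principles above concrete: Laplace’s rule of succession starts from a uniform (indifference) prior over an unknown success rate and gives (s + 1) / (n + 2) as the probability that the next trial succeeds after s successes in n trials. A minimal sketch (the 8-of-10 figures are just an example):

```python
# Laplace's rule of succession: a uniform Beta(1, 1) prior over an unknown
# success probability, updated on s successes in n trials, gives posterior
# Beta(s + 1, n - s + 1) with predictive mean (s + 1) / (n + 2).
def rule_of_succession(successes: int, trials: int) -> float:
    return (successes + 1) / (trials + 2)

print(rule_of_succession(0, 0))    # 0.5  -- total ignorance
print(rule_of_succession(8, 10))   # 0.75 -- pulled toward, but not onto, 0.8
```

Note that it never assigns probability 0 or 1, which is exactly the caution urged above.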
You can construct a variety of priors and then show that some of them have more intuitive implications than others. See e.g. the debate about priors in this post, and in the comment threads of the posts it follows up on.
The Handbook of Chemistry and Physics?
But seriously, I have no idea either, other than ‘eyeball it’, and I’d like to see how other people answer this question too.