Bayes’ rule: Odds form

Wiki | Last edit: Oct 13, 2016, 12:56 AM by Eliezer Yudkowsky

One of the more convenient forms of Bayes’ rule uses relative odds. Bayes’ rule says that, when you observe a piece of evidence $e,$ your posterior odds $O (H ∣ e)$ for your hypothesis vector $H$ given $e$ are just your prior odds $O (H)$ on $H$ times the likelihood function $L_{e} (H) .$

For example, suppose we’re trying to solve a mysterious murder, and we start out thinking the odds of Professor Plum vs. Miss Scarlet committing the murder are 1 : 2, that is, Scarlet is twice as likely as Plum to have committed the murder a priori. We then observe that the victim was bludgeoned with a lead pipe. If we think that Plum, if he commits a murder, is around 60% likely to use a lead pipe, and that Scarlet, if she commits a murder, would be around 6% likely to use a lead pipe, this implies relative likelihoods of 10 : 1 for Plum vs. Scarlet using the pipe. The posterior odds for Plum vs. Scarlet, after observing the victim to have been murdered with a pipe, are $(1 : 2) \times (10 : 1) = (10 : 2) = (5 : 1)$ . We now think Plum is around five times as likely as Scarlet to have committed the murder.
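The arithmetic in the Plum vs. Scarlet example can be checked with a short Python sketch. The helper `to_integer_odds` and the variable names are illustrative choices, not part of the original article; exact fractions are used so that no rounding creeps in.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def to_integer_odds(values):
    """Rescale positive rationals to the smallest equivalent whole-number odds."""
    rationals = [Fraction(v) for v in values]
    # The least common multiple of the denominators clears every fraction.
    scale = reduce(lambda a, b: a * b // gcd(a, b), (r.denominator for r in rationals))
    integers = [int(r * scale) for r in rationals]
    divisor = reduce(gcd, integers)
    return tuple(i // divisor for i in integers)


prior_odds = (1, 2)  # Plum : Scarlet
likelihoods = (Fraction(60, 100), Fraction(6, 100))  # P(pipe | Plum), P(pipe | Scarlet)

relative_likelihoods = to_integer_odds(likelihoods)
# Bayes' rule in odds form: multiply prior odds by likelihoods, term by term.
posterior_odds = to_integer_odds(p * l for p, l in zip(prior_odds, likelihoods))

print(relative_likelihoods)  # (10, 1)
print(posterior_odds)        # (5, 1)
```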

Odds functions

Let $H$ denote a vector of hypotheses. An odds function $O$ is a function that maps $H$ to a set of odds. For example, if $H = (H_{1}, H_{2}, H_{3}),$ then $O (H)$ might be $(6 : 2 : 1),$ which says that $H_{1}$ is 3x as likely as $H_{2}$ and 6x as likely as $H_{3} .$ An odds function captures our relative probabilities between the hypotheses in $H;$ for example, (6 : 2 : 1) odds are the same as (18 : 6 : 3) odds. We don’t need to know the absolute probabilities of the $H_{i}$ in order to know the relative odds. All we require is that the relative odds are proportional to the absolute probabilities: $O (H) \propto P (H) .$
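As a rough illustration of why (6 : 2 : 1) and (18 : 6 : 3) are the same odds, the sketch below compares two odds vectors by cross-multiplication, so neither has to be converted to absolute probabilities. The function name `same_odds` is hypothetical, and the check assumes every entry is positive.

```python
def same_odds(a, b):
    """Return True if odds vectors a and b differ only by a positive constant factor.

    Assumes all entries are positive. Cross-multiplying against the first
    entry avoids division and therefore any rounding.
    """
    if len(a) != len(b):
        return False
    return all(a[i] * b[0] == b[i] * a[0] for i in range(len(a)))


print(same_odds((6, 2, 1), (18, 6, 3)))  # True
print(same_odds((6, 2, 1), (6, 2, 2)))   # False
```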

In the example with the death of Mr. Boddy, suppose $H_{1}$ denotes the proposition “Reverend Green murdered Mr. Boddy”, $H_{2}$ denotes “Mrs. White did it”, and $H_{3}$ denotes “Colonel Mustard did it”. Let $H$ be the vector $(H_{1}, H_{2}, H_{3}) .$ If these propositions respectively have prior probabilities of 80%, 8%, and 4% (the remaining 8% being reserved for other hypotheses), then $O (H) = (80 : 8 : 4) = (20 : 2 : 1)$ represents our relative credences about the murder suspects: Reverend Green is 10 times as likely to be the murderer as Mrs. White, who is twice as likely to be the murderer as Colonel Mustard.
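A minimal sketch of the step from prior probabilities to prior odds, assuming the percentages are given as whole numbers. The names `simplify_odds` and `prior_percentages` are illustrative. The unassigned 8% plays no part, because only the ratios between the listed suspects matter.

```python
from functools import reduce
from math import gcd


def simplify_odds(odds):
    """Divide whole-number odds by their greatest common divisor."""
    divisor = reduce(gcd, odds)
    return tuple(o // divisor for o in odds)


# Prior probabilities in percent; the remaining 8% is left unassigned.
prior_percentages = {"Green": 80, "White": 8, "Mustard": 4}

prior_odds = simplify_odds(tuple(prior_percentages.values()))
print(prior_odds)  # (20, 2, 1)
```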

Likelihood functions

Suppose we discover that the victim was murdered by wrench. Suppose we think that Reverend Green, Mrs. White, and Colonel Mustard, if they murdered someone, would respectively be 60%, 90%, and 30% likely to use a wrench. Letting $e_{w}$ denote the observation “The victim was murdered by wrench,” we would have $P (e_{w} ∣ H) = (0.6, 0.9, 0.3) .$ This gives us a likelihood function defined as $L_{e_{w}} (H) = P (e_{w} ∣ H) .$

Bayes’ rule, odds form

Let $O (H ∣ e)$ denote the posterior odds of the hypotheses $H$ after observing evidence $e .$ Bayes’ rule then states:

$O (H) \times L_{e} (H) = O (H ∣ e)$

This says that we can multiply the relative prior credence $O (H)$ by the likelihood $L_{e} (H)$ to arrive at the relative posterior credence $O (H ∣ e) .$ Because odds are invariant under multiplication by a positive constant, scaling the likelihood function up or down by a positive constant makes no difference: it only multiplies the final odds by that constant, which leaves them unchanged. Thus, only the relative likelihoods are needed for the calculation; the absolute likelihoods are unnecessary. When performing the calculation, we can therefore simplify $L_{e} (H) = (0.6, 0.9, 0.3)$ to the relative likelihoods $(2 : 3 : 1) .$
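The scale invariance described here can be sketched as follows: updating with the absolute likelihoods (0.6, 0.9, 0.3) and with the relative likelihoods (2 : 3 : 1) gives posterior odds that differ only by a constant factor. The helpers `update` and `same_odds` are hypothetical names, and all entries are assumed to be positive.

```python
from fractions import Fraction


def update(prior_odds, likelihoods):
    """Odds form of Bayes' rule: multiply prior odds by likelihoods, term by term."""
    return tuple(Fraction(p) * Fraction(l) for p, l in zip(prior_odds, likelihoods))


def same_odds(a, b):
    """True if a and b differ only by a positive constant factor (entries assumed positive)."""
    return len(a) == len(b) and all(a[i] * b[0] == b[i] * a[0] for i in range(len(a)))


prior = (20, 2, 1)  # Green : White : Mustard
absolute = (Fraction(6, 10), Fraction(9, 10), Fraction(3, 10))  # P(wrench | suspect)
relative = (2, 3, 1)  # the same likelihoods, rescaled

post_abs = update(prior, absolute)
post_rel = update(prior, relative)

print(same_odds(post_abs, post_rel))  # True
```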

In our example, this makes the calculation quite easy. The prior odds for Green vs. White vs. Mustard were $(20 : 2 : 1) .$ The relative likelihoods were $(0.6 : 0.9 : 0.3) = (2 : 3 : 1) .$ Thus, the relative posterior odds after observing $e_{w}$ (“Mr. Boddy was killed by wrench”) are $(20 : 2 : 1) \times (2 : 3 : 1) = (40 : 6 : 1) .$ Given the evidence, Reverend Green is 40 times as likely as Colonel Mustard to be the killer, and ²⁰⁄₃ times as likely as Mrs. White.
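The same calculation, written out as a small Python sketch (variable names are illustrative), with the pairwise comparisons read off from the posterior odds:

```python
from fractions import Fraction

suspects = ("Green", "White", "Mustard")
prior_odds = (20, 2, 1)
relative_likelihoods = (2, 3, 1)

# Multiply prior odds by relative likelihoods, term by term.
posterior_odds = tuple(p * l for p, l in zip(prior_odds, relative_likelihoods))
posterior = dict(zip(suspects, posterior_odds))

# How many times as likely one suspect is compared with another.
green_vs_mustard = Fraction(posterior["Green"], posterior["Mustard"])
green_vs_white = Fraction(posterior["Green"], posterior["White"])

print(posterior_odds)                    # (40, 6, 1)
print(green_vs_mustard, green_vs_white)  # 40 20/3
```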

Bayes’ rule states that this relative proportioning of odds among these three suspects will be correct, regardless of how our remaining 8% probability mass is assigned to all other suspects and possibilities, or indeed, how much probability mass we assigned to other suspects to begin with. For a proof, see Proof of Bayes’ rule.
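One way to see this numerically is to run the ordinary probability form of Bayes’ rule with a fourth catch-all hypothesis and check that the ratios among the three named suspects do not move. This is a sketch under a simplifying assumption: all other possibilities are lumped into a single “other” hypothesis with one likelihood, which is then varied. The function names are hypothetical.

```python
from fractions import Fraction


def posterior_probabilities(priors, likelihoods):
    """Probability form of Bayes' rule over a complete set of hypotheses."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joint)
    return [j / total for j in joint]


def suspect_ratios(other_likelihood):
    """Return (Green / Mustard, White / Mustard) posterior ratios.

    other_likelihood is the assumed P(wrench | some other explanation).
    """
    # Priors for Green, White, Mustard, and the lumped "other" hypothesis.
    priors = [Fraction(80, 100), Fraction(8, 100), Fraction(4, 100), Fraction(8, 100)]
    likelihoods = [Fraction(6, 10), Fraction(9, 10), Fraction(3, 10), other_likelihood]
    green, white, mustard, _other = posterior_probabilities(priors, likelihoods)
    return green / mustard, white / mustard


for other_likelihood in (Fraction(0), Fraction(1, 2), Fraction(1)):
    print(*suspect_ratios(other_likelihood))  # 40 6 on every iteration
```

The absolute posterior probabilities do change as the catch-all likelihood changes; only the 40 : 6 : 1 proportions among the three suspects stay fixed.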

Visualization

Frequency diagrams, waterfall diagrams, and spotlight diagrams may be helpful for explaining or visualizing the odds form of Bayes’ rule.

Mark Hamilton Jul 15, 2022, 2:45 AM
I don’t know if this is helpful or not, but, as someone who is genuinely trying to use this to learn Bayes’ theorem and doesn’t already understand it, I found the following confusing:

When you introduce P(X) you don’t explicitly show how those cash out. I eventually figured out the proper way to do it after reading the whole page, but I was a bit confused. Just something simple like “P(sick)=.2”. Maybe that seems obvious, but it wasn’t until I tried to do the example equations on my own that I realized I wasn’t actually sure how “P(X)” translated into numbers in an equation.
Conor Duggan Mar 22, 2021, 8:18 AM
Full what?
Philipp Cannons Jan 23, 2021, 6:01 AM
Why does P(Y) become P(H_j)/P(H_k)?

Does this imply Y = [H_j, H_k]?

So far P(Y) referred to P(sick) whereas now it refers to P(sick)/P(healthy).

This is confusing me.
Dewi Morgan Sep 25, 2018, 7:22 PM
Is this a probability or an odd? What’s a “chance”? In this list, “chance”s are expressed both as a fraction, and as a percentage, like some kind of hybrid probablodd. This feels like muddying the waters. When you’re introducing a new concept like “probability and odds, while synonyms in lay speak, are different things in statistics”, it’s probably not good to conflate both with another lay synonym like “chance”.
Dewi Morgan Sep 25, 2018, 7:10 PM
In this page, the terms “probability” and “odds” are used in the statistical sense of “In the classical and canonical representation of probability, 0 expresses absolute incredulity, and 1 expresses absolute credulity.” (from the linked definition) and “odds are a ratio of desired outcomes vs the field” (has no linked definition, I’m just wildly guessing based on context).

Explaining this distinction clearly at the outset for non-statistically trained users, may be worthwhile.

Explaining what is meant by odds, on this page about them, may be worthwhile.

It may also be confusing to a new reader, who has just read a linked definition which explains that probabilities are expressed as a number from 0 to 1, to see it expressed in the very same paragraph (and elsewhere on the page) as a percentage instead. I feel that this is a lack in the definition, though, rather than a problematic inconsistency on the page: I find the expression both as 0-1 and 0%-100% meant I looked at the problem from both points of view, and so felt I had a firmer grasp on the concepts.
ubs izo Aug 3, 2017, 5:03 PM
I’d really like to see links to problems or sums at each level. I feel like one or two worked-out examples are not enough, and that, say, ten problems that help one think this idea and connected ideas through would be great.
yassine chaouche Jun 11, 2017, 5:42 PM
Would that mean that the strength of evidence is the TP/FP ratio? In that case, it would have the same definition as the relative likelihood. Wouldn’t there be a better definition for either one of the notions so that we can easily differentiate them?
Haakon Borch May 30, 2017, 5:44 PM
It was unclear when reading this which test “this test” referred to. I ended up figuring out the false negatives and true positives of the ⁶⁰⁄₂₀ test instead of the ⁹⁰⁄₃₀ test and was subsequently confused because 1:2 != 1:7. This might be an issue with my reading comprehension, but I figured I should mention it anyway.
Stephanie Koo Mar 15, 2017, 8:26 PM
Just reiterating that it’s 18% of all students (sick and healthy). That’s because it’s a 90% (0.9) chance the blackened tongue depressor belonged to a sick student, out of all the sick students (20% of total student population).

Sorry if this is really obvious to others, it just took me a while.
Martin Bishop Mar 15, 2017, 8:07 AM
This confused me at first because I didn’t realize it was sarcasm and I thought I was missing something. “Is there any reason why distinguishing between assumption and proposition is a bad idea?”
Anareth A Nov 28, 2016, 1:04 AM
“got” would be clearer.
Anareth A Nov 28, 2016, 1:04 AM
The following would be simpler and more consistent with the beginning of the sentence: “the fraction of sick patients that got a positive result”
Tom Voltz Jul 11, 2016, 11:24 PM
I seem to have broken the display by proposing an edit! The meta-level script is showing in some places. I hope that doesn’t cause unneeded headaches.

I only wanted to emphasize the difference in notation between a horizontal line (“—”, as in relative probabilities) vs. a forward slash (“/”, as in probability that something will occur). I could find no misuse of the notation when I re-read the page, but it was a bit confusing for me jumping in at the stage I did (is there an earlier page briefly defining various notation?), since I am accustomed to both these symbols meaning “divided by”, which led me to instinctively calculate a percentage or fraction whenever I see one or the other. This could be an idiosyncrasy of an engineering workplace.
Emile Kroeger Mar 10, 2016, 5:24 AM
This page asks me if I learnt the concept of “Odds ratio”—but nowhere in the page does it actually explicitly talk about odds ratios, only about odds.
Eric Rogstad Feb 26, 2016, 6:57 AM
Which calculation?