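One of the more convenient forms of Bayes’ rule uses relative odds. Bayes’ rule says that, when you observe a piece of evidence $e$, your posterior odds $\mathbb{O}(\mathbf{H} \mid e)$ for your hypothesis vector $\mathbf{H}$ given $e$ are just your prior odds $\mathbb{O}(\mathbf{H})$ on $\mathbf{H}$ times the likelihood function $\mathcal{L}_e(\mathbf{H})$.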
For example, suppose we’re trying to solve a mysterious murder, and we start out thinking the odds of Professor Plum vs. Miss Scarlet committing the murder are 1 : 2, that is, Scarlet is twice as likely as Plum to have committed the murder a priori. We then observe that the victim was bludgeoned with a lead pipe. If we think that Plum, if he commits a murder, is around 60% likely to use a lead pipe, and that Scarlet, if she commits a murder, would be around 6% likely to use a lead pipe, this implies relative likelihoods of 10 : 1 for Plum vs. Scarlet using the pipe. The posterior odds for Plum vs. Scarlet, after observing the victim to have been murdered by a pipe, are $(1 : 2) \times (10 : 1) = (10 : 2) = (5 : 1)$. We now think Plum is around five times as likely as Scarlet to have committed the murder.
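The pipe example can be checked mechanically. The following is a minimal Python sketch, assuming odds are stored as plain lists of exact fractions; the helper name `multiply_odds` is illustrative and does not come from the text above.

```python
from fractions import Fraction

def multiply_odds(prior, likelihoods):
    """Multiply prior odds by likelihoods, term by term."""
    return [p * l for p, l in zip(prior, likelihoods)]

# Prior odds for Plum vs. Scarlet: 1 : 2
prior = [Fraction(1), Fraction(2)]

# Chance that each suspect, if guilty, would use a lead pipe: 60% vs. 6%
likelihoods = [Fraction(60, 100), Fraction(6, 100)]

posterior = multiply_odds(prior, likelihoods)

# Only the ratio matters: how many times as likely is Plum as Scarlet?
plum_vs_scarlet = posterior[0] / posterior[1]
print(plum_vs_scarlet)  # 5
```

Exact fractions are used here instead of floats so that the ratio comes out as exactly 5 rather than a rounded approximation.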
Odds functions
Let $\mathbf{H}$ denote a vector of hypotheses. An odds function $\mathbb{O}$ is a function that maps $\mathbf{H}$ to a set of odds. For example, if $\mathbf{H} = (H_1, H_2, H_3)$, then $\mathbb{O}(\mathbf{H})$ might be $(6 : 2 : 1)$, which says that $H_1$ is 3x as likely as $H_2$ and 6x as likely as $H_3$. An odds function captures our relative probabilities between the hypotheses in $\mathbf{H}$; for example, (6 : 2 : 1) odds are the same as (18 : 6 : 3) odds. We don’t need to know the absolute probabilities of the $H_i$ in order to know the relative odds. All we require is that the relative odds are proportional to the absolute probabilities:
$$\mathbb{O}(\mathbf{H}) \propto \mathbb{P}(\mathbf{H}).$$
In the example with the death of Mr. Boddy, suppose $H_1$ denotes the proposition “Reverend Green murdered Mr. Boddy”, $H_2$ denotes “Mrs. White did it”, and $H_3$ denotes “Colonel Mustard did it”. Let $\mathbf{H}$ be the vector $(H_1, H_2, H_3)$. If these propositions respectively have prior probabilities of 80%, 8%, and 4% (the remaining 8% being reserved for other hypotheses), then $\mathbb{O}(\mathbf{H}) = (80 : 8 : 4) = (20 : 2 : 1)$ represents our relative credences about the murder suspects: Reverend Green is 10 times as likely to be the murderer as Mrs. White, who is twice as likely to be the murderer as Colonel Mustard.
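As a rough illustration of how probabilities reduce to odds, the sketch below rescales a list of exact values to the smallest equivalent whole-number odds. The function name `simplify_odds` is an assumption made for this example, not something defined in the text.

```python
from fractions import Fraction
from functools import reduce
from math import gcd

def simplify_odds(values):
    """Rescale exact ratios to the smallest equivalent whole-number odds."""
    fracs = [Fraction(v) for v in values]
    # Multiply through by a common denominator so every term is an integer.
    common = 1
    for f in fracs:
        common = common * f.denominator // gcd(common, f.denominator)
    ints = [int(f * common) for f in fracs]
    # Divide out the greatest common divisor of all terms.
    divisor = reduce(gcd, ints)
    return [i // divisor for i in ints]

# Prior probabilities for Green, White, and Mustard.
# The remaining 8% (other hypotheses) is simply left out of the odds.
priors = [Fraction(80, 100), Fraction(8, 100), Fraction(4, 100)]

print(simplify_odds(priors))      # [20, 2, 1]
print(simplify_odds([80, 8, 4]))  # [20, 2, 1]

# Odds that differ only by a positive constant are the same odds.
print(simplify_odds([6, 2, 1]) == simplify_odds([18, 6, 3]))  # True
```

Note that the probabilities do not need to sum to 1 for this to work, which is the point of working with odds.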
Likelihood functions
Suppose we discover that the victim was murdered by wrench. Suppose we think that Reverend Green, Mrs. White, and Colonel Mustard, if they murdered someone, would respectively be 60%, 90%, and 30% likely to use a wrench. Letting $e_w$ denote the observation “The victim was murdered by wrench,” we would have $\mathbb{P}(e_w \mid \mathbf{H}) = (0.6, 0.9, 0.3)$. This gives us a likelihood function $\mathcal{L}_{e_w}(\mathbf{H})$ defined as $\mathcal{L}_{e_w}(\mathbf{H}) = \mathbb{P}(e_w \mid \mathbf{H})$.
Bayes’ rule, odds form
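Let $\mathbb{O}(\mathbf{H} \mid e)$ denote the posterior odds of the hypotheses $\mathbf{H}$ after observing evidence $e$. Bayes’ rule then states:
$$\mathbb{O}(\mathbf{H}) \times \mathcal{L}_e(\mathbf{H}) = \mathbb{O}(\mathbf{H} \mid e)$$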
This says that we can multiply the relative prior credence $\mathbb{O}(\mathbf{H})$ by the likelihood $\mathcal{L}_e(\mathbf{H})$ to arrive at the relative posterior credence $\mathbb{O}(\mathbf{H} \mid e)$. Because odds are invariant under multiplication by a positive constant, scaling the likelihood function up or down by a constant makes no difference: it would only multiply the final odds by that constant, which does not change them. Thus, only the relative likelihoods are needed to perform the calculation; the absolute likelihoods are unnecessary. When performing the calculation, we can therefore simplify the likelihoods $(0.6, 0.9, 0.3)$ to the relative likelihoods $(2 : 3 : 1)$.
In our example, this makes the calculation quite easy. The prior odds for Green vs. White vs. Mustard were $(20 : 2 : 1)$. The relative likelihoods were $(0.6 : 0.9 : 0.3) = (2 : 3 : 1)$. Thus, the relative posterior odds after observing $e_w$ = “Mr. Boddy was killed by wrench” are $(20 : 2 : 1) \times (2 : 3 : 1) = (40 : 6 : 1)$. Given the evidence, Reverend Green is 40 times as likely as Colonel Mustard to be the killer, and 20/3 times as likely as Mrs. White.
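The wrench calculation, and the claim that absolute and relative likelihoods give the same answer, can be sketched in Python as follows. The helper `posterior_odds` is a name chosen for this illustration; it multiplies term by term and then reduces the result to the smallest whole numbers.

```python
from fractions import Fraction
from functools import reduce
from math import gcd

def posterior_odds(prior, likelihoods):
    """Odds form of Bayes' rule: multiply term by term, then reduce."""
    products = [Fraction(p) * Fraction(l) for p, l in zip(prior, likelihoods)]
    # Clear denominators so every term is a whole number.
    common = 1
    for f in products:
        common = common * f.denominator // gcd(common, f.denominator)
    ints = [int(f * common) for f in products]
    # Divide out the common factor to get the simplest equivalent odds.
    divisor = reduce(gcd, ints)
    return [i // divisor for i in ints]

prior = [20, 2, 1]  # Green : White : Mustard

# Absolute likelihoods of using a wrench: 60%, 90%, 30%
absolute = [Fraction(6, 10), Fraction(9, 10), Fraction(3, 10)]
# The same likelihoods rescaled to relative form
relative = [2, 3, 1]

from_absolute = posterior_odds(prior, absolute)
from_relative = posterior_odds(prior, relative)

print(from_absolute)  # [40, 6, 1]
print(from_relative)  # [40, 6, 1]

# Green vs. White: 40 : 6, i.e. 20/3 times as likely
print(Fraction(from_absolute[0], from_absolute[1]))  # 20/3
```

Both calls return the same odds because rescaling the likelihoods only multiplies every term of the result by the same constant.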
Bayes’ rule states that this relative proportioning of odds among these three suspects will be correct, regardless of how our remaining 8% probability mass is assigned to all other suspects and possibilities, or indeed, how much probability mass we assigned to other suspects to begin with. For a proof, see Proof of Bayes’ rule.
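To see this invariance concretely, the sketch below runs the ordinary probability form of Bayes' rule with a fourth catch-all hypothesis, "someone else did it". The likelihoods assigned to that catch-all, and the third scenario's prior probabilities, are made-up values chosen only for illustration; the function and variable names are likewise assumptions.

```python
from fractions import Fraction

def posterior_probabilities(priors, likelihoods):
    """Probability form of Bayes' rule over an exhaustive set of hypotheses."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joint)
    return [j / total for j in joint]

def percent(*values):
    return [Fraction(v, 100) for v in values]

# Wrench likelihoods for Green, White, Mustard (from the example above).
suspect_likelihoods = percent(60, 90, 30)

# Each scenario: priors for (Green, White, Mustard, other) and a
# hypothetical wrench likelihood for "other". Green : White : Mustard
# stays at 20 : 2 : 1 in every scenario.
scenarios = [
    (percent(80, 8, 4, 8), Fraction(1, 10)),
    (percent(80, 8, 4, 8), Fraction(9, 10)),
    (percent(40, 4, 2, 54), Fraction(1, 2)),
]

ratios = []
for priors, other_likelihood in scenarios:
    post = posterior_probabilities(priors, suspect_likelihoods + [other_likelihood])
    green, white, mustard, other = post
    ratios.append((green / mustard, green / white))

for green_vs_mustard, green_vs_white in ratios:
    print(green_vs_mustard, green_vs_white)  # 40 20/3 in every scenario
```

The absolute posterior probabilities differ between scenarios, but the ratios among the three named suspects do not.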
Visualization
Frequency diagrams, waterfall diagrams, and spotlight diagrams may be helpful for explaining or visualizing the odds form of Bayes’ rule.
I don’t know if this is helpful or not, but, as someone who is genuinely trying to use this to learn Bayes’ theorem and doesn’t already understand it, I found the following confusing:
When you introduce P(X) you don’t explicitly show how those cash out. I eventually figured out the proper way to do it after reading the whole page, but I was a bit confused. Just something simple like “P(sick)=.2”. Maybe that seems obvious, but it wasn’t until I tried to do the example equations on my own that I realized I wasn’t actually sure how “P(X)” translated into numbers in an equation.
Full what?
Why does P(Y) become P(H_j)/P(H_k)?
Does this imply Y = [H_j, H_k]?
So far P(Y) referred to P(sick) whereas now it refers to P(sick)/P(healthy).
This is confusing me.
Is this a probability or an odd? What’s a “chance”? In this list, “chance”s are expressed both as a fraction, and as a percentage, like some kind of hybrid probablodd. This feels like muddying the waters. When you’re introducing a new concept like “probability and odds, while synonyms in lay speak, are different things in statistics”, it’s probably not good to conflate both with another lay synonym like “chance”.
In this page, the terms “probability” and “odds” are used in the statistical sense of “In the classical and canonical representation of probability, 0 expresses absolute incredulity, and 1 expresses absolute credulity.” (from the linked definition) and “odds are a ratio of desired outcomes vs the field” (has no linked definition, I’m just wildly guessing based on context).
Explaining this distinction clearly at the outset for users without statistical training may be worthwhile.
Explaining what is meant by odds, on this page about them, may be worthwhile.
It may also be confusing to a new reader, who has just read a linked definition which explains that probabilities are expressed as a number from 0 to 1, to see it expressed in the very same paragraph (and elsewhere on the page) as a percentage instead. I feel that this is a lack in the definition, though, rather than a problematic inconsistency on the page: I find the expression both as 0-1 and 0%-100% meant I looked at the problem from both points of view, and so felt I had a firmer grasp on the concepts.
I’d really like to see links to problems or sums at each level. I feel that one or two worked examples are not enough, and that, say, ten problems that help one think this idea and connected ideas through would be great.
Would that mean that the strength of evidence is the TP/FP ratio? In that case, it would have the same definition as the relative likelihood. Wouldn’t there be a better definition for either one of the notions so that we can easily differentiate them?
It was unclear when reading this which test “this test” referred to. I ended up figuring out the false negatives and true positives of the 60⁄20 test instead of the 90⁄30 test and was subsequently confused because 1:2 != 1:7. This might be an issue with my reading comprehension, but I figured I should mention it anyway.
Just reiterating that it’s 18% of all students (sick and healthy). That’s because it’s a 90% (0.9) chance the blackened tongue depressor belonged to a sick student, out of all the sick students (20% of total student population).
Sorry if this is really obvious to others, it just took me a while.
This confused me at first because I didn’t realize it was sarcasm and I thought I was missing something. “Is there any reason why distinguishing between assumption and proposition is a bad idea?”
“got” would be clearer.
The following would be simpler and more consistent with the beginning of the sentence: “the fraction of sick patients that got a positive result”
I only wanted to emphasize the difference in notation between a horizontal line (“—”, as in relative probabilities) vs. a forward slash (”/”, as in probability that something will occur). I could find no misuse of the notation when I re-read the page, but it was a bit confusing for me jumping in at the stage I did (is there an earlier page briefly defining various notation?), since I am accustomed to both these symbols meaning “divided by”, which lead me to instinctively calculate a percentage or fraction whenever I see one or the other. This could be an idiosyncrasy of an engineering workplace.
This page asks me if I learnt the concept of “Odds ratio”—but nowhere in the page does it actually explicitly talk about odds ratios, only about odds.
Which calculation?