In daily life, the basic move is to ask,
What are the straightforward and the alternative explanations (hypotheses) for what I’m seeing?
How much more likely is one compared to the other a priori (when ignoring what I’m seeing)?
What probabilities do they assign to what I’m seeing?
and then get the ratio of the a posteriori probabilities of the hypotheses (the a posteriori odds) by multiplying the a priori odds by the likelihood ratio. Odds measure the relative strengths of hypotheses, so the move is to obtain the relative strengths of a pair of hypotheses after observing a piece of data, with the choice of hypotheses inspired by that same piece of data. This is a very easy computation that becomes a habit and routinely fixes intuitive misestimates. Usually the two explanations being compared are that the data/claim is correct vs. that it was constructed by a specific sloppy process that doesn’t ensure correctness.
That is, for D the data/claim you are observing, and x and y hypotheses chosen as possible explanations for D, $\frac{P(x \mid D)}{P(y \mid D)} = \frac{P(x)}{P(y)} \times \frac{P(D \mid x)}{P(D \mid y)}$. This holds for any choice of two hypotheses x and y, which don’t have to be mutually exclusive or exhaust all possibilities, and there may be many other plausible hypotheses.
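As a rough illustration of the arithmetic, here is a minimal Python sketch of the update. The function name `posterior_odds` and all the numbers are made up for the example; they are not taken from the post.

```python
from fractions import Fraction

def posterior_odds(prior_odds, p_d_given_x, p_d_given_y):
    """Odds form: P(x|D)/P(y|D) = P(x)/P(y) * P(D|x)/P(D|y)."""
    likelihood_ratio = Fraction(p_d_given_x) / Fraction(p_d_given_y)
    return Fraction(prior_odds) * likelihood_ratio

# Hypothetical numbers: x = "the claim is correct",
# y = "the claim was produced by a sloppy process".
prior = Fraction(1, 4)  # x judged 4 times less likely than y a priori
odds = posterior_odds(prior, Fraction(9, 10), Fraction(3, 10))
print(odds)  # 3/4
```

With these invented inputs the data favor x by a factor of 3, so the odds move from 1:4 to 3:4. Nothing here requires x and y to be the only possibilities.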
This formula is not Bayes’ Theorem, but it is a similar simple formula from probability theory, so I’m still interested in how you can use it in daily life.
Writing P(x|D) implies that x and D are the same kind of object (data about some physical process?) and there are probably a lot of subtle problems in defining a hypothesis as a “set of things that happen if it is true” (especially if you want to have hypotheses that involve probabilities).
Use of this formula allows you to update the probabilities you ascribe to hypotheses, but it is not obvious that the update will make them better. I mean, you obviously don’t know the real P(x)/P(y), so you’ll input an incorrect value and get an incorrect answer. But it will sometimes be less incorrect. If this algorithm has some nice property like “the sequence of P(x)/P(y) you get by repeating your experiment converges to the real P(x)/P(y), provided x and y are falsifiable by your experiment (or something like that)”, then by using this algorithm you’ll, with high probability, eventually update your algorithm. It would be nice to understand for what kinds of x, y and D you should be at least 90% sure that your P(x)/P(y) will be more correct after a million experiments.
I’m not implying that this algorithm doesn’t work. More like it seems that proving that it works is beyond me. Mostly because statistics is one of the more glaring holes in my mathematical education. I hope that somebody has proved that it works at least in the cases you are likely to encounter in your daily life. Maybe it is even a well-known result.
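For the simplest version of this worry, a small simulation can at least show what repeated updating does. The sketch below assumes independent coin flips and that one of the two hypotheses (x, a fair coin) really generates the data. The function name, the seed, and all numbers are invented for illustration. It is not a proof, and it says nothing about cases where neither hypothesis is true.

```python
import math
import random

def log_odds_after_flips(prior_odds, p_heads_x, p_heads_y, flips):
    """Accumulate the log posterior odds of x vs. y over independent coin flips."""
    log_odds = math.log(prior_odds)
    for heads in flips:
        if heads:
            log_odds += math.log(p_heads_x / p_heads_y)
        else:
            log_odds += math.log((1 - p_heads_x) / (1 - p_heads_y))
    return log_odds

rng = random.Random(0)  # fixed seed so the run is repeatable
# Assumption: the data really come from x (a fair coin); y says P(heads) = 0.7.
flips = [rng.random() < 0.5 for _ in range(1000)]

# Deliberately bad prior: x judged 100 times less likely than y.
final = log_odds_after_flips(1 / 100, 0.5, 0.7, flips)
print(final > 0)  # did the evidence outweigh the bad prior?
```

Each flip adds its log likelihood ratio to the running total, so a wrong starting value is a fixed offset that the accumulated evidence can eventually swamp in this idealized setting.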
Speaking of daily life, can you tell me how people (and you specifically) actually apply this algorithm? How do you decide in which situations it is worth using? How do you choose initial values of P(x) (e.g. it is hard for me to translate “x is probably true” into “I am 73% sure that x is true”)? Are there some other important questions I should be asking about it?
The above formula is usually called “odds form of Bayes formula”. We get the standard form $P(x \mid D) = P(x) \times \frac{P(D \mid x)}{P(D)}$ by letting y=D in the odds form, and we get the odds form from the standard form by dividing it by itself for two hypotheses (P(D) cancels out).
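Both directions can be spelled out as follows. This is only a sketch of the algebra, and the one fact it uses beyond the formulas above is that P(D|D) = 1.

```latex
\begin{align*}
\text{Odds form with } y = D:\quad
\frac{P(x \mid D)}{P(D \mid D)} &= \frac{P(x)}{P(D)} \times \frac{P(D \mid x)}{P(D \mid D)}
\;\Longrightarrow\;
P(x \mid D) = P(x) \times \frac{P(D \mid x)}{P(D)}, \\
\text{Standard form for } x \text{ over } y:\quad
\frac{P(x \mid D)}{P(y \mid D)} &= \frac{P(x)\,P(D \mid x)/P(D)}{P(y)\,P(D \mid y)/P(D)}
= \frac{P(x)}{P(y)} \times \frac{P(D \mid x)}{P(D \mid y)}.
\end{align*}
```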
The serious problem with the standard form of Bayes is the P(D) term, which is usually hard to estimate (as we don’t get to choose what D is). We can try to get rid of it by expanding P(D)=P(D|x)P(x)+P(D|¬x)P(¬x), but that’s also no good, because now we need to know P(D|¬x). One way to state the problem with this is to say that a hypothesis for given observations is a description of a situation that makes it possible to estimate the probability of those observations. That is, x is a hypothesis for D if it’s possible to get a good estimate of P(D|x). To evaluate an observation, we should look for hypotheses that let us estimate that conditional probability; we do get to choose what to use as hypotheses. So the problem here is that if x is a hypothesis for D, it doesn’t follow that ¬x is a hypothesis for D or for anything else of interest. The negation of a hypothesis is not necessarily a hypothesis. That is why it defeats some of the purpose of moving over to using the odds form of Bayes if we let y=¬x, as it’s sometimes written.
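To make the trouble with P(D|¬x) concrete, here is a small Python sketch. It assumes, purely for illustration, that ¬x splits into exactly two mutually exclusive alternatives y and z; the function name and all numbers are hypothetical.

```python
from fractions import Fraction as F

def p_d_given_not_x(alternatives):
    """P(D|not x) as a prior-weighted average over exclusive alternatives.

    alternatives: list of (prior, likelihood) pairs that together make up not-x.
    """
    total_prior = sum(prior for prior, _ in alternatives)
    return sum(prior * lik for prior, lik in alternatives) / total_prior

# Hypothetical likelihoods: P(D|y) = 3/10, P(D|z) = 1/100.
lik_y, lik_z = F(3, 10), F(1, 100)

# Same likelihoods, two different guesses at how not-x splits between y and z.
mostly_y = p_d_given_not_x([(F(4, 10), lik_y), (F(1, 10), lik_z)])
mostly_z = p_d_given_not_x([(F(1, 10), lik_y), (F(4, 10), lik_z)])
print(mostly_y, mostly_z)  # 121/500 17/250
```

In this toy setup P(D|¬x) changes by more than a factor of three depending on how the prior mass of ¬x is divided, even though P(D|y) and P(D|z) are unchanged. The ratio P(D|x)/P(D|y) needs no such bookkeeping.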
Here’s an example of applying the formula (to a puzzle).