What is an ‘arbitrary reference parameter’? This is not in my vocabulary.
(And why do we need it? Can’t we just take the log here?)
The nice thing about using odds (or log odds) is that the normalizer cancels out when using Bayes’ rule. For a boolean query X and data D, it looks like this:
Bayes’ rule, usual form: $P[X|D] = \frac{1}{Z}\,P[D|X]\,P[X]$, where $Z$ is the normalizer.
Bayes’ rule, odds form: $\frac{P[X|D]}{P[\bar{X}|D]} = \frac{Z}{Z}\,\frac{P[D|X]}{P[D|\bar{X}]}\,\frac{P[X]}{P[\bar{X}]} = \frac{P[D|X]}{P[D|\bar{X}]}\,\frac{P[X]}{P[\bar{X}]}$, where $\bar{X}$ is not-$X$.
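As a quick sanity check that the normalizer really drops out, here is a minimal numeric sketch (Python, with made-up values for the prior and likelihoods) comparing the usual form, which computes $Z$ explicitly, against the odds form, which never touches it:

```python
# Made-up numbers: prior P[X] = 0.3, likelihoods P[D|X] = 0.8 and P[D|not-X] = 0.2.
p_x = 0.3
p_d_given_x = 0.8
p_d_given_notx = 0.2

# Usual form: the normalizer Z = P[D] has to be computed explicitly.
z = p_d_given_x * p_x + p_d_given_notx * (1 - p_x)
posterior_usual = p_d_given_x * p_x / z

# Odds form: multiply prior odds by the likelihood ratio; Z never appears.
posterior_odds = (p_d_given_x / p_d_given_notx) * (p_x / (1 - p_x))
posterior_from_odds = posterior_odds / (1 + posterior_odds)

print(posterior_usual, posterior_from_odds)  # both ~0.632
```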
That’s the usual presentation of the odds form. But note that it assumes $X$ is boolean. How do we get the same benefit—i.e. a form in which the normalizer cancels out in Bayes’ rule—for non-boolean $X$?
The trick is to choose a reference value of $X$, and compute probabilities relative to the probability of that reference value. For instance, if $X$ is a six-sided die roll, I could choose $X=5$ as my reference value, and then I’d represent the distribution as $\left(x \mapsto \frac{P[X=x]}{P[X=5]}\right)$. You can check that, when I update this distribution to $P[X|D]$, and represent the updated distribution as $\left(x \mapsto \frac{P[X=x|D]}{P[X=5|D]}\right)$, the normalizer cancels out just like it does for the odds form on a boolean variable.
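Here is a sketch of that die-roll version (Python, with an arbitrary made-up prior and made-up likelihoods $P[D \mid X=x]$), checking that updating the relative representation directly, with no normalization step, agrees with the fully normalized posterior:

```python
import numpy as np

# Made-up prior over die faces 1..6 and made-up likelihoods P[D | X = x].
prior = np.array([0.1, 0.1, 0.2, 0.2, 0.3, 0.1])
likelihood = np.array([0.5, 0.1, 0.3, 0.2, 0.4, 0.6])

# Reference-value representation: probabilities relative to X = 5 (index 4).
rel_prior = prior / prior[4]

# Update the relative representation: multiply by likelihood ratios.
# No normalizer needed; it would cancel in the ratio anyway.
rel_posterior = rel_prior * likelihood / likelihood[4]

# Compare against the standard normalized posterior.
posterior = prior * likelihood
posterior /= posterior.sum()

print(np.allclose(rel_posterior, posterior / posterior[4]))  # True
```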
That’s the trick used for $\theta$ in the post. Applying the trick requires picking an arbitrary value of $\theta$ to use as the reference value (like $X=5$ above), and that’s $\theta_0$.
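For concreteness, here is one hypothetical way that might look for a parameter $\theta$ (this is not the post’s actual model, just an assumed coin-bias example on a grid, with $\theta_0 = 0.5$ as the reference value):

```python
import numpy as np

# Hypothetical setup: theta is a coin's bias on a grid, the data D is
# "7 heads out of 10 flips", and theta_0 = 0.5 is the arbitrary reference value.
theta = np.linspace(0.01, 0.99, 99)
theta_0 = 0.5
heads, flips = 7, 10

# Likelihood ratio P[D|theta] / P[D|theta_0]; the binomial coefficient cancels.
likelihood_ratio = (theta / theta_0) ** heads * ((1 - theta) / (1 - theta_0)) ** (flips - heads)

# With a uniform prior, the relative posterior P[theta|D] / P[theta_0|D]
# is just this likelihood ratio; no normalizer is ever computed.
rel_posterior = likelihood_ratio
```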