The nice thing about using odds (or log odds) is that the normalizer cancels out when using Bayes’ rule. For a boolean query X and data D, it looks like this:
Bayes’ rule, usual form: $P[X|D] = \frac{1}{Z} P[D|X] P[X]$, where $Z = P[D]$ is the normalizer.
Bayes’ rule, odds form: $\frac{P[X|D]}{P[\bar{X}|D]} = \frac{Z}{Z} \frac{P[D|X]}{P[D|\bar{X}]} \frac{P[X]}{P[\bar{X}]} = \frac{P[D|X]}{P[D|\bar{X}]} \frac{P[X]}{P[\bar{X}]}$, where $\bar{X}$ is not-X.
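As a quick numeric sanity check of the odds form, here is a small sketch with made-up numbers (the prior and both likelihoods are hypothetical). It computes the posterior odds once the usual way, via the normalizer, and once as likelihood ratio times prior odds, where Z never appears.

```python
# Hypothetical numbers: prior P[X] and likelihoods P[D|X], P[D|not-X].
p_x = 0.3
p_d_given_x = 0.8
p_d_given_not_x = 0.2

# Usual form: compute the normalizer Z = P[D] explicitly.
z = p_d_given_x * p_x + p_d_given_not_x * (1 - p_x)
posterior_x = p_d_given_x * p_x / z
posterior_not_x = p_d_given_not_x * (1 - p_x) / z
odds_usual = posterior_x / posterior_not_x

# Odds form: likelihood ratio times prior odds; Z is never computed.
odds_direct = (p_d_given_x / p_d_given_not_x) * (p_x / (1 - p_x))

print(odds_usual, odds_direct)
```

Both values equal 4 × (0.3/0.7) = 12/7 up to floating-point rounding.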
That’s the usual presentation. But note that it assumes X is boolean. How do we get the same benefit (a form in which the normalizer cancels out in Bayes’ rule) for non-boolean X?
The trick is to choose a reference value of X and compute probabilities relative to the probability of that reference value. For instance, if X is a six-sided die roll, I could choose X=5 as my reference value, and then I’d represent the distribution as $\left(x \mapsto \frac{P[X=x]}{P[X=5]}\right)$. You can check that when I update this distribution to P[X|D] and represent the updated distribution as $\left(x \mapsto \frac{P[X=x|D]}{P[X=5|D]}\right)$, the normalizer cancels out just as it does for the odds form on a boolean variable.
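The die example can be checked numerically with a short sketch. The fair-die prior and the likelihood values P[D|X=x] below are hypothetical, chosen only for illustration. The update in relative form just multiplies each entry by the likelihood ratio relative to the reference value; the result is then compared against the usual normalized posterior.

```python
# Hypothetical setup: a fair six-sided die as the prior, and made-up
# likelihoods P[D | X=x] for some observed data D.
prior = {x: 1 / 6 for x in range(1, 7)}
likelihood = {1: 0.1, 2: 0.2, 3: 0.3, 4: 0.4, 5: 0.5, 6: 0.6}
REF = 5  # reference value, as in the text


def relative(dist, ref):
    """Represent a distribution as x -> P[X=x] / P[X=ref]."""
    return {x: p / dist[ref] for x, p in dist.items()}


def update_relative(rel, likelihood, ref):
    """Bayes update in relative form: multiply by the likelihood ratio.

    No normalizer is computed anywhere in this function.
    """
    return {x: r * likelihood[x] / likelihood[ref] for x, r in rel.items()}


rel_posterior = update_relative(relative(prior, REF), likelihood, REF)

# For comparison: the usual form, which needs Z = sum_x P[D|X=x] P[X=x].
z = sum(likelihood[x] * prior[x] for x in prior)
posterior = {x: likelihood[x] * prior[x] / z for x in prior}
for x in prior:
    assert abs(rel_posterior[x] - posterior[x] / posterior[REF]) < 1e-12

# If absolute probabilities are needed, normalize once at the end.
total = sum(rel_posterior.values())
recovered = {x: r / total for x, r in rel_posterior.items()}
print(rel_posterior)
print(recovered)
```

The normalizer only shows up if you want absolute probabilities back, and then only once. The reference value needs nonzero prior probability and nonzero likelihood; otherwise the ratios are undefined.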
That’s the trick used for θ in the post. Applying it requires picking an arbitrary value of θ to use as the reference value (like X=5 above), and that’s $\theta_0$.
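Written out for θ with reference value $\theta_0$, the cancellation looks like this (if θ is continuous, read the P’s as densities; the argument is the same):

$$\frac{P[\theta|D]}{P[\theta_0|D]} = \frac{\frac{1}{Z} P[D|\theta]\, P[\theta]}{\frac{1}{Z} P[D|\theta_0]\, P[\theta_0]} = \frac{P[D|\theta]}{P[D|\theta_0]} \, \frac{P[\theta]}{P[\theta_0]}$$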