Basically, this shows that every term in a standard Bayesian inference, including the prior ratio, can be re-cast as a likelihood term in a setting where you start off unsure about what words mean, and have a flat prior over which set of words is true.
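For reference, the odds form of Bayes' rule that this re-casting targets (H_1, H_2, and E are generic labels here, not notation from the post):

$$
\frac{P(H_1 \mid E)}{P(H_2 \mid E)} \;=\; \frac{P(E \mid H_1)}{P(E \mid H_2)} \cdot \frac{P(H_1)}{P(H_2)},
$$

and the claim is that the last factor, the prior ratio $P(H_1)/P(H_2)$, can itself be rewritten as a likelihood ratio once "which set of words is true" is treated as the unknown being updated on, starting from 1:1 odds.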
If the possible meanings of your words are a continuous one-dimensional variable x, a flat prior over x will not remain flat when you change variables to y = f(x) for an arbitrary bijection f, so the construction would be sneaking in a specific choice of the function f.
Say the words are utterances about the probability of a coin landing heads: why should the flat prior be over the probability p rather than over the log-odds log(p/(1-p))?
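To make that concrete, here is the standard change-of-variables computation for the logistic reparameterization (a sketch, with $\sigma$ denoting the logistic function):

$$
p \sim \mathrm{Uniform}(0,1), \qquad \ell = \log\frac{p}{1-p} \;\Longrightarrow\; p = \sigma(\ell) = \frac{1}{1+e^{-\ell}},
$$
$$
f_\ell(\ell) \;=\; f_p(p)\,\left|\frac{dp}{d\ell}\right| \;=\; 1 \cdot \sigma(\ell)\bigl(1-\sigma(\ell)\bigr) \;=\; \frac{e^{-\ell}}{(1+e^{-\ell})^2},
$$

which is the logistic density peaked at $\ell = 0$, not a flat prior over log-odds; conversely, a flat (improper) prior over $\ell$ corresponds to the density $1/\bigl(p(1-p)\bigr)$ over $p$. So "flat" is not parameterization-independent.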
In my post, I didn’t require the distribution over meanings of words to be uniform. It could be any distribution you wanted—it just resulted in the prior ratio of “which utterance is true” being 1:1.
Why wouldn’t this construction work over a continuous space?