Well, you can still define information entropy for probability density functions, though I suppose if we ignore Jaynes's treatment we can probably get paradoxes if we try. In fact, I'm pretty sure just integrating -p*log(p) (the differential entropy) is right. There's also a problem if you want a maxent prior over the integers or over the real numbers; that takes us into the realm of improper priors.
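Just to make that first claim concrete, here is a quick numerical sketch (my own illustration, not something from the discussion above): the differential entropy -integral( p*log(p) ) of a Gaussian, checked against the known closed form 0.5*log(2*pi*e*sigma^2).

```python
import numpy as np

# Sketch: differential entropy -integral( p*log(p) ) of a Gaussian,
# estimated on a grid and compared to the closed form
# 0.5 * log(2 * pi * e * sigma^2).

sigma = 1.5
x = np.linspace(-10 * sigma, 10 * sigma, 200001)
p = np.exp(-x**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))

# Trapezoidal estimate of -integral( p*log(p) ) dx
# (integrand set to 0 wherever p underflows to zero).
integrand = np.where(p > 0, p * np.log(p), 0.0)
h_numeric = -np.trapz(integrand, x)

h_exact = 0.5 * np.log(2 * np.pi * np.e * sigma**2)
print(h_numeric, h_exact)   # both come out around 1.82 for sigma = 1.5
```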
I don’t know as much as I should about this topic, so you may have to illustrate using an example before I figure out what you mean.
Yeah, I think integral( -p*log(p) ) is it. The simplest problem is this: suppose I have some parameter x to which I want to assign a prior (perhaps not over the whole real line, so it can be proper as you say; the boundaries can be part of the set of maxent constraints). The maxent method will then give me a different prior depending on whether I happen to assign the distribution over x, or over x^2, or over log(x), etc. That is, the prior pdf obtained for one parameterisation is not related to the one obtained for another parameterisation by the usual change-of-variables rule for probability density functions, so the two priors contain logically different information. This is upsetting if you have no reason to prefer one parameterisation over another.
In the simplest case, where you have no constraints except the boundaries and perhaps expect to get a flat prior (I don't remember whether you do when there are boundaries... I think you do in 1D at least), it is most obvious that a prior flat in x contains very different information from one flat in x^2 or log(x).
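Here is a rough sketch of that complaint, with numbers I made up purely for illustration: a flat maxent prior on x over [1, 2] induces a non-flat density on y = x^2, while running maxent directly on y (with only the boundary constraint) would give a flat density on [1, 4].

```python
import numpy as np

# Illustration of the parameterisation problem: flat prior on x over [1, 2]
# versus the flat prior maxent would assign directly to y = x^2 over [1, 4].

a, b = 1.0, 2.0
y = np.linspace(a**2, b**2, 7)          # a few points in y-space, y in [1, 4]

# Density induced on y by the flat prior on x, via the change-of-variables
# rule: p_y(y) = p_x(sqrt(y)) * |dx/dy| = (1/(b-a)) * 1/(2*sqrt(y)).
p_induced = (1.0 / (b - a)) / (2.0 * np.sqrt(y))

# Density maxent would give if we had parameterised by y from the start:
# flat over [a^2, b^2].
p_direct = np.full_like(y, 1.0 / (b**2 - a**2))

for yi, pi, pd in zip(y, p_induced, p_direct):
    print(f"y = {yi:.2f}   induced from flat-in-x: {pi:.4f}   flat-in-y: {pd:.4f}")
# The two columns disagree, which is the "logically different information"
# complaint above: both densities are proper, but they say different things.
```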
According to Jaynes it's actually not (I don't have the page number on me, unfortunately). But the way he does it is by discretizing the space of possibilities and taking the limit as the number of discrete possibilities goes to infinity. It's not the limit of the entropy H itself, since that goes to infinity; it's the limit of H - log(n). It turns out to be a little different from just integrating -p*log(p): you pick up a term involving the limiting density of the discrete points, which is what keeps the answer invariant when you change variables.
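A rough numerical sketch of that limiting procedure, as I understand it (my own reconstruction, so take the details with a grain of salt): discretize [0, 1] into n equal cells, compute the discrete entropy H of the cell probabilities, and watch H - log(n) settle down as n grows. For the density p(x) = 2x the limit should be 1/2 - log(2), i.e. about -0.193.

```python
import numpy as np

# Discretize [0, 1] into n equal cells under the density p(x) = 2x and
# track H - log(n), which should approach the differential entropy
# 1/2 - log(2) as n grows.

def h_minus_log_n(n):
    edges = np.linspace(0.0, 1.0, n + 1)
    # Exact cell probabilities for p(x) = 2x: P(cell i) = edges[i+1]^2 - edges[i]^2.
    probs = edges[1:]**2 - edges[:-1]**2
    H = -np.sum(probs * np.log(probs))
    return H - np.log(n)

for n in (10, 100, 1000, 10000):
    print(n, h_minus_log_n(n))
print("target:", 0.5 - np.log(2.0))

# With equally spaced cells the limit reduces to the ordinary -integral of
# p*log(p); the extra measure term in Jaynes's version only shows up when
# the discrete points are not spread uniformly.
```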