“Simply knowing the fact that the entropy is concave down tells us that to maximize entropy we should split it up as evenly as possible—each side has a 1⁄4 chance of showing.”
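A quick numerical check of that claim, as a minimal sketch in Python (the four-outcome example matches the quoted "each side has a 1⁄4 chance"; the specific uneven splits are just illustrative assumptions):

```python
import numpy as np

def discrete_entropy(p):
    """Shannon entropy H = -sum(p_i * log p_i); zero-probability outcomes contribute nothing."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log(p))

# The even split over four outcomes beats any uneven split.
print(discrete_entropy([0.25, 0.25, 0.25, 0.25]))  # log(4) ~ 1.386, the maximum
print(discrete_entropy([0.40, 0.30, 0.20, 0.10]))  # ~ 1.28
print(discrete_entropy([0.70, 0.10, 0.10, 0.10]))  # ~ 0.94
```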
Ok, that’s fine for discrete events, but what about continuous ones? That is, how do I choose a prior for real-valued parameters that I want to know about? As far as I am aware, MAXENT doesn’t help me at all here, particularly as soon as I have several parameters, and no preferred parameterisation of the problem. I know Jaynes goes on about how continuous distributions make no sense unless you know the sequence whose limit you took to get there, in which case problem solved, but I have found this most unhelpful in solving real problems where I have no preference for any particular sequence, such as in models of fundamental physics.
Well, you can still define information entropy for probability density functions—though I suppose if we ignore Jaynes we can probably get paradoxes if we try. In fact, I’m pretty sure just integrating -p*log(p) is right. There’s also a problem if you want a maxent prior over the integers or over the whole real line; that takes us into the realm of improper priors.
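For concreteness, a minimal sketch of that integral for densities on a bounded interval (the interval [0, 1] and the example densities are just illustrative assumptions): among densities on [0, 1] with no further constraints, the flat one has the largest differential entropy.

```python
import numpy as np

def differential_entropy(pdf, a, b, n=100_000):
    """Approximate h = -integral p(x) log p(x) dx on [a, b] with a midpoint rule."""
    dx = (b - a) / n
    x = a + dx * (np.arange(n) + 0.5)
    p = pdf(x)
    safe_p = np.where(p > 0, p, 1.0)   # p log p -> 0 where p = 0, so log(0) never appears
    return -np.sum(p * np.log(safe_p)) * dx

flat = lambda x: np.ones_like(x)       # uniform density on [0, 1]
tilted = lambda x: 2.0 * x             # triangular density on [0, 1]

print(differential_entropy(flat, 0.0, 1.0))    # ~ 0.0, the maximum on [0, 1]
print(differential_entropy(tilted, 0.0, 1.0))  # ~ 0.5 - log(2) ~ -0.19
```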
I don’t know as much as I should about this topic, so you may have to illustrate using an example before I figure out what you mean.
Yeah, I think integral( -p*log(p) ) is it. The simplest problem is that if I have some parameter x to which I want to assign a prior (perhaps not over the whole real line, so it can be proper as you say—the boundaries can be part of the maxent constraint set), then via the maxent method I will get a different prior depending on whether I happen to assign the distribution over x, or x^2, or log(x), etc. That is, the prior pdf obtained for one parameterisation is not related to the one obtained for a different parameterisation by the correct transformation rule for probability density functions; they contain logically different information. This is upsetting if you have no reason to prefer one parameterisation over another.
In the simplest case, where you have no constraints except the boundaries and so expect to get a flat prior (I don’t remember if you do when there are boundaries… I think you do in 1D at least), it is most obvious that a prior flat in x contains very different information to one flat in x^2 or log(x).
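A small sketch of that mismatch (the interval [1, 10] and the choice u = log(x) are illustrative assumptions): push a flat-in-x prior through the change of variables and compare it with the prior you get by applying maxent directly in the new variable.

```python
import numpy as np

a, b = 1.0, 10.0
rng = np.random.default_rng(0)
n = 1_000_000

# MaxEnt with only boundary constraints, applied in x: flat on [1, 10].
x = rng.uniform(a, b, size=n)

# The same prior expressed in u = log(x).  By the change-of-variables rule,
# p_u(u) = p_x(exp(u)) * exp(u), which grows like exp(u) rather than being flat.
u_pushed = np.log(x)

# MaxEnt applied directly in u instead: flat on [log(1), log(10)].
u_direct = rng.uniform(np.log(a), np.log(b), size=n)

bins = np.linspace(np.log(a), np.log(b), 11)
hist_pushed, _ = np.histogram(u_pushed, bins=bins, density=True)
hist_direct, _ = np.histogram(u_direct, bins=bins, density=True)
print(np.round(hist_pushed, 2))  # rises across the range: not flat in u
print(np.round(hist_direct, 2))  # roughly constant ~ 1/log(10) ~ 0.43
```

The two "uninformative" priors assign genuinely different probabilities to the same events, which is the parameterisation dependence being complained about above.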
According to Jaynes, it’s actually not—I don’t have the page number on me, unfortunately. But the way he does it is by discretizing the space of possibilities and taking the limit as the number of discrete possibilities goes to infinity. It’s not the limit of the entropy H, since that goes to infinity; it’s the limit of H - log(n). It turns out to be a little different from integrating -p*log(p).
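A numerical sketch of that limiting procedure (the density on [0, 1] and the equal-width cells are illustrative assumptions): the limit of H - log(n) works out to -integral( p*log(p/m) ), where m(x) encodes how densely the discrete points are packed. With equal-width cells on [0, 1], m is uniform and the limit happens to coincide with plain -integral( p*log(p) ); the "little different" part only shows up when the limiting sequence of points is not equally spaced.

```python
import numpy as np

def entropy_minus_log_n(pdf, n, a=0.0, b=1.0):
    """Discrete entropy of n equal-width cells on [a, b], minus log(n)."""
    edges = np.linspace(a, b, n + 1)
    mids = 0.5 * (edges[:-1] + edges[1:])
    cell_p = pdf(mids) * (b - a) / n   # approximate probability mass of each cell
    cell_p = cell_p / cell_p.sum()     # absorb the small discretisation error
    H = -np.sum(cell_p * np.log(cell_p))
    return H - np.log(n)

tilted = lambda x: 2.0 * x             # density on [0, 1]; -integral p log p = 0.5 - log(2)

for n in (10, 100, 1_000, 10_000):
    print(n, entropy_minus_log_n(tilted, n))
# Converges to ~ -0.19, matching the differential entropy because the cells
# (and hence the implicit measure m) are uniform on [0, 1].
```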