That’s a fully general argument against statistical inference, isn’t it? Why bother running surveys if they inform us only about the opinions of the participants and give us no knowledge about the rest of the population...
Because we don’t have a maximum entropy prior. We have reason to believe that there is a correlation between people’s opinions. We also have reason to believe that there’s a correlation between correlations. For example, if we survey a bunch of people and their opinions strongly correlate, we can infer that the rest of the population also correlates with them.
For me, a maximal entropy prior is a probability distribution over some reasonable set of hypotheses, such as H(n) = “n percent of people prefer A to B”. In that case, the prior p(H(n)) is uniform over (0,100). If we know that, say, H(80) is true, we know that a randomly selected person is 80% likely to prefer A over B. A survey enables us to update the prior and eventually locate the correct hypothesis, whatever prior we are starting from. It doesn’t need to explicitly assume any correlation. That 80% of people share an opinion isn’t called correlation between their opinions, in the usual sense of what “correlation” means.
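To make the updating step concrete, here is a minimal sketch, assuming a discretized grid of hypotheses H(0), ..., H(100) and a hypothetical survey in which 80 of 100 respondents prefer A. The function name `update` and the survey numbers are illustrative only, and respondents are treated as independent draws given H(n).

```python
def update(prior, yes, no):
    """Posterior over H(n) after `yes` respondents prefer A and `no` prefer B."""
    # Likelihood of the survey under H(n), treating respondents as
    # independent draws that each prefer A with probability n/100.
    unnormalized = {
        n: p * (n / 100) ** yes * (1 - n / 100) ** no
        for n, p in prior.items()
    }
    total = sum(unnormalized.values())
    return {n: w / total for n, w in unnormalized.items()}


# Uniform prior over the grid of hypotheses H(0) ... H(100).
uniform_prior = {n: 1 / 101 for n in range(101)}

# Hypothetical survey result: 80 prefer A, 20 prefer B.
posterior = update(uniform_prior, yes=80, no=20)

# Most probable hypothesis, and the probability that the next person prefers A.
best = max(posterior, key=posterior.get)
p_next = sum((n / 100) * p for n, p in posterior.items())
print(best, round(p_next, 3))
```

The posterior concentrates around H(80) without any correlation between individual opinions being assumed. All that is assumed is that each person is a draw from the same unknown proportion.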
You seem to have a somewhat different notion of maximal entropy prior. Perhaps a maximal entropy distribution over all possible hypotheses? You seem to imply that with a maximum entropy prior, induction is impossible, or something along these lines. I don’t think this is the standard meaning of “maximum entropy prior”.
As I stated at the beginning, I don’t know the standard meaning of maximum entropy prior.
This time when I looked it up I found a simpler definition with finite cases. I’m not sure why I missed that before. I think I can figure out where the confusion is. I was thinking of every possible combination of opinions as a separate possibility. If this is the case, having them all be independent of each other is the maximum entropy distribution. If, on the other hand, you only look at correlation, and treat H(80) = 50 as one case, then maximum entropy would seem to be that H(n) is uniformly distributed.
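The difference between the two readings can be checked by brute force. The sketch below assumes a hypothetical population of 10 people, 8 of whom have been surveyed and all prefer A. It compares a prior that is uniform over every combination of opinions with one that is uniform over the number of A-supporters. The helper name `p_next_prefers_a` is made up for this example.

```python
from fractions import Fraction
from itertools import product
from math import comb

N = 10                                # hypothetical tiny population
surveyed = (1, 1, 1, 1, 1, 1, 1, 1)   # first 8 people surveyed, all prefer A
k = len(surveyed)

# Every assignment of opinions (1 = prefers A) consistent with the survey.
consistent = [c for c in product((0, 1), repeat=N) if c[:k] == surveyed]


def p_next_prefers_a(weight):
    """P(person k+1 prefers A | survey) under a prior proportional to `weight`."""
    total = sum(weight(c) for c in consistent)
    return sum(weight(c) for c in consistent if c[k] == 1) / total


# Prior 1: every combination of opinions is equally likely,
# which makes the individual opinions independent of each other.
independent = p_next_prefers_a(lambda c: Fraction(1))

# Prior 2: the number of A-supporters is uniform on 0..N, and combinations
# with the same number of A-supporters are equally likely.
by_proportion = p_next_prefers_a(lambda c: Fraction(1, comb(N, sum(c))))

print(independent, by_proportion)  # 1/2 9/10
```

Under the first prior the survey tells us nothing about the next person, and the probability stays at 1/2. Under the second prior it moves to 9/10. Both priors are "maximum entropy" over their own set of possibilities, which seems to be where the two readings diverged.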
I don’t think that’s quite right either. I suspect that has something to do with H(n) being continuous instead of discrete. I know the Jeffreys prior for that is beta(1/2,1/2), as opposed to beta(1,1), which is the uniform distribution.
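For a sense of how much the choice between these two priors matters in practice, here is a small sketch assuming the standard conjugate update, where a Beta(a, b) prior plus `yes` and `no` observations gives a Beta(a + yes, b + no) posterior. The sample counts are hypothetical.

```python
def posterior_mean(a, b, yes, no):
    """Mean of the Beta(a + yes, b + no) posterior for a Beta(a, b) prior."""
    return (a + yes) / (a + b + yes + no)


# Hypothetical survey: 8 people prefer A, 2 prefer B.
yes, no = 8, 2

uniform = posterior_mean(1, 1, yes, no)        # Beta(1, 1) prior
jeffreys = posterior_mean(0.5, 0.5, yes, no)   # Beta(1/2, 1/2) prior

print(round(uniform, 3), round(jeffreys, 3))  # 0.75 0.773
```

The two priors differ by half a pseudo-observation on each side, so their posteriors get closer together as the survey grows.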