I think perfect entropy assumes no correlation between what’s preferred. As such, it would always be impossible to predict.
We’d need some prior for how much correlation there is.
I could be wrong. I don’t know much about perfect entropy.
The posterior distribution is not assumed to be perfect entropy, of course.
If your prior is that there’s no correlation between Alice and Bob, your knowledge about Alice won’t change your knowledge about Bob. If there’s no prior correlation between a correlation between Alice and Bob and one between Charlie and the average of Alice and Bob, then finding out that there’s a correlation between Alice and Bob won’t tell you that Charlie is also correlated.
Basically, I think the maximum entropy prior is that all of them gave their answers at random.
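To put the first point in concrete terms, here is a minimal Python sketch (the fair-coin prior and the names are just illustrative assumptions, not anything from the survey): if the joint prior treats Alice’s and Bob’s answers as independent, conditioning on Alice’s answer leaves the distribution over Bob’s answer exactly where it was.

```python
from itertools import product

# Hypothetical prior: Alice's and Bob's answers are independent fair coins.
prior = {(alice, bob): 0.25 for alice, bob in product("AB", repeat=2)}

def p_bob_answers_A(dist):
    # Marginal probability that Bob answers "A" under a joint distribution.
    return sum(p for (alice, bob), p in dist.items() if bob == "A")

# Condition on observing that Alice answered "A".
kept = {pair: p for pair, p in prior.items() if pair[0] == "A"}
total = sum(kept.values())
posterior = {pair: p / total for pair, p in kept.items()}

print(p_bob_answers_A(prior))      # 0.5
print(p_bob_answers_A(posterior))  # still 0.5 -- learning about Alice says nothing about Bob
```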
I don’t understand what Alice et al. are analogous to.
The described experimental setting has 21 ordered pairs of objects, each of them having a preference strength S, defined as the proportion of people who prefer the first object to the second.
Either we begin with a maxent prior, that is, p(S) uniform on the interval (0,1) for each pair, and each observation updates only the corresponding pair; or, if we want to be fully general, we can work with a 21-dimensional distribution p(S1,S2,...,S21), again beginning with a uniform maxent prior and updating accordingly. Given the restricted set of observations (only pairwise preference expressions), both approaches are equivalent: the distribution p(S1,...,S21) always remains separable, equal to p1(S1)p2(S2)...p21(S21). In this sense there is no revealed correlation: knowing that a person prefers object O3 to O6 doesn’t tell us anything about the probability that she prefers O1 over O7. However, after the experiment there will still be a non-maxent posterior for the preference between O1 and O7. The sentences
“I think perfect entropy assumes no correlation between what’s preferred. As such, it would always be impossible to predict.”
as I have interpreted them (i.e. “with maxent prior, we never learn anything about the preferences”) are therefore false.
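To make that concrete, a rough Python sketch under my reading of the setup (the ten-person sample and the 8-to-2 split are invented for illustration): each of the 21 pairs gets its own uniform Beta(1,1) prior, a stated preference updates only its own pair, and yet the posterior for an observed pair is clearly no longer maxent.

```python
# posterior[pair] holds the (alpha, beta) parameters of a Beta posterior;
# Beta(1, 1) is the uniform (maxent) prior on (0, 1).
posterior = {pair: (1, 1) for pair in range(21)}

def update(pair, prefers_first):
    # Update one pair's Beta posterior after a single stated preference.
    a, b = posterior[pair]
    posterior[pair] = (a + 1, b) if prefers_first else (a, b + 1)

def predictive(pair):
    # Probability that the next person prefers the first object of the pair.
    a, b = posterior[pair]
    return a / (a + b)

# Ten people state a preference on pair 0; eight prefer the first object.
for prefers_first in [True] * 8 + [False] * 2:
    update(0, prefers_first)

print(predictive(0))  # 0.75: the posterior for pair 0 is no longer maxent
print(predictive(7))  # 0.5: a pair we never asked about stays uniform
```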
We will know with certainty the preferences of the people asked. We will have no knowledge of the preferences of the people we didn’t ask.
That’s a fully general argument against statistical inference, isn’t it? Why bother making surveys? They would inform us only about the opinions of the participants, giving us no knowledge about the rest of the population...
Because we don’t have a maximum entropy prior. We have reason to believe that there is a correlation between people’s opinions. We also have reason to believe that there’s a correlation between correlations. For example, if we survey a bunch of people and their opinions strongly correlate, we can infer that the rest of the population also correlates with them.
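A small illustration of that difference (my own construction, with made-up survey numbers): under a maxent prior over every combination of opinions, people are independent fair coins and the survey tells us nothing about anyone we didn’t ask; under a prior in which opinions share a latent rate, the same survey moves the prediction for the next person.

```python
from fractions import Fraction

def predictive_maxent(agree, disagree):
    # Maxent prior over every combination of opinions: each person is an
    # independent fair coin, so surveyed answers never move the prediction.
    return Fraction(1, 2)

def predictive_shared_rate(agree, disagree):
    # Opinions share a latent rate with a uniform prior (Laplace's rule of
    # succession): agreement among surveyed people raises the prediction.
    return Fraction(agree + 1, agree + disagree + 2)

# Hypothetical survey: 9 of 10 respondents agree.
print(predictive_maxent(9, 1))       # 1/2 -- the survey is uninformative
print(predictive_shared_rate(9, 1))  # 5/6 -- the survey is informative
```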
For me, a maximal entropy prior is a probability distribution over some reasonable set of hypotheses, such as H(n) = “n percent of people prefer A to B”. In that case, the prior p(H(n)) is uniform over (0,100). If we know that, say, H(80) is true, we know that a randomly selected person is 80% likely to prefer A over B. A survey enables us to update the prior and eventually locate the correct hypothesis, whatever prior we start from. It doesn’t need to explicitly assume any correlation. That 80% of people share an opinion isn’t called a correlation between their opinions, in the usual sense of what “correlation” means.
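Here is a small sketch of that updating (the 10-person survey with 8 preferring A is made up for illustration): a uniform prior over the hypotheses H(0), ..., H(100), a binomial likelihood for the survey, and a posterior that both concentrates near H(80) and gives the probability that a randomly selected person prefers A.

```python
from math import comb

ns = range(101)          # hypotheses H(0) ... H(100)
prior = [1 / 101] * 101  # uniform (maxent) prior over the hypotheses

def likelihood(n, k, size):
    # Probability of k out of `size` surveyed people preferring A if H(n) holds.
    p = n / 100
    return comb(size, k) * p**k * (1 - p)**(size - k)

k, size = 8, 10
unnormalized = [prior[n] * likelihood(n, k, size) for n in ns]
total = sum(unnormalized)
posterior = [u / total for u in unnormalized]

# Most probable hypothesis, and the probability that a randomly selected
# person prefers A, averaged over the posterior.
print(max(ns, key=lambda n: posterior[n]))      # 80, i.e. H(80)
print(sum(posterior[n] * n / 100 for n in ns))  # roughly 0.75
```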
You seem to have a somewhat different notion of a maximal entropy prior. Perhaps a maximal entropy distribution over all possible hypotheses? You seem to imply that with maximum entropy, induction is impossible, or something along these lines. I don’t think this is the standard meaning of “maximum entropy prior”.
As I stated at the beginning, I don’t know the standard meaning of maximum entropy prior.
This time when I looked it up I found a simpler definition for finite cases. I’m not sure why I missed that before. I think I can figure out where the confusion is. I was thinking of every possible combination of opinions as a separate possibility. If that’s the case, having them all be independent of each other is the maximum entropy. If, on the other hand, you only look at the correlation, and consider H(80) being one case, then maximum entropy would seem to be that H(n) is uniformly distributed.
I don’t think that’s quite right either. I suspect that has something to do with H(n) being continuous instead of discrete. I know the Jeffreys prior for that is beta(1/2,1/2), as opposed to beta(1,1), which is the uniform distribution.
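For what it’s worth, a tiny sketch of how much that choice matters (the 8-of-10 survey numbers are invented): with a Beta prior, the posterior-mean prediction for the next person is (alpha + k) / (alpha + beta + size), so the uniform Beta(1,1) and the Jeffreys Beta(1/2,1/2) priors give slightly different answers from the same data.

```python
def predictive(alpha, beta, k, size):
    # Posterior-mean probability that the next person prefers A, given a
    # Beta(alpha, beta) prior and k of `size` surveyed people preferring A.
    return (alpha + k) / (alpha + beta + size)

k, size = 8, 10
print(predictive(1.0, 1.0, k, size))  # uniform Beta(1,1):      9 / 12   = 0.75
print(predictive(0.5, 0.5, k, size))  # Jeffreys Beta(1/2,1/2): 8.5 / 11 ~ 0.773
```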