(Written before and edited where marked after reading the comments.)
1. I look at the jar and estimate how many jellybabies wide and how many tall the filled volume is, do some mental arithmetic, and reduce the answer according to an estimate of the packing fraction, which I would expect to be the greatest source of error.
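(A back-of-the-envelope sketch of that calculation, in Python; every number here is an illustrative guess, not a measurement of any actual jar.)

```python
import math

# Illustrative guesses, not measurements: jar dimensions in "jellybaby units".
width_in_babies = 10    # jellybabies across the jar's diameter
height_in_babies = 15   # jellybabies up the filled height
packing_fraction = 0.6  # assumed fraction of the volume actually occupied

# Treat the jar as a cylinder: (pi/4) * diameter^2 * height gives the filled
# volume in jellybaby-volumes; the packing fraction discounts the gaps.
raw_count = (math.pi / 4) * width_in_babies ** 2 * height_in_babies
estimate = raw_count * packing_fraction
print(round(estimate))  # about 707 with these numbers
```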
Anyone else can do the same thing, if they’re smart enough not to just pull a figure out of the air. If I think the other contestants are unlikely to be that smart, I ignore them. It’s like the thermometer example, except that I have reason to think my thermometer is better than most other people’s.
If I think that everyone else is smart enough to make an estimate along those lines, then it is more like the original thermometer problem. But I’m not going to just take an average. If my estimate is 1000, and someone else’s is 300, that’s too big a discrepancy to explain by minor variations. It casts doubt on the assumption of identical thermometers. Assuming that I only have the other people’s estimates, and there’s no opportunity for discussion, I’ll search for reasons why we might have come up with completely different answers, but if I find no error in my own, I’ll discard all such outliers.
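(A toy sketch of that discard-the-outliers policy; the factor-of-two tolerance is my own arbitrary stand-in for “explicable by minor variations”.)

```python
def aggregate(my_estimate, others, tolerance=2.0):
    """Average my estimate with others, discarding any that differ from
    mine by more than a factor of `tolerance` (assumed, not principled)."""
    kept = [x for x in others
            if my_estimate / tolerance <= x <= my_estimate * tolerance]
    pool = [my_estimate] + kept
    return sum(pool) / len(pool)

print(aggregate(1000, [300, 900, 1100]))  # 300 is discarded -> 1000.0
```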
If I work in the confectionery trade and know the packing fraction for jellybabies, that elevates my confidence in my own estimate and again I ignore the others... unless this competition is being held at a confectionery trade show.
In general, averaging the estimates is only valid if the estimates are believed to be of similar worth. If you know that the estimates are all unbiased but with differing variances, then you can work out an optimally weighted average that puts more weight on the more accurate estimates but does not discard the less accurate ones. However, if the estimates are wildly different, the assumption that they are all unbiased may be a bad one.
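(That weighted average is inverse-variance weighting; a minimal sketch, with made-up variances.)

```python
def combine(estimates, variances):
    # Weight each unbiased estimate by 1/variance: the minimum-variance
    # linear combination when the variances are known.
    weights = [1.0 / v for v in variances]
    return sum(w * x for w, x in zip(weights, estimates)) / sum(weights)

# Two sharp estimates and one very noisy one (illustrative numbers):
print(combine([1000, 950, 300], [100.0, 100.0, 10000.0]))  # ~971.6
```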
BTW, a real-world example is the assessment of conference papers by the programme committee. Each paper will have been refereed by, say, four members. The typical procedure is that if they all say it’s excellent, it’s accepted without discussion; likewise, if they all say it’s rubbish, it’s rejected without discussion. For uniformly middling assessments, the question is where to set the bar for acceptance. The only papers where a real discussion is required are the ones where the referees disagree. The disagreements are resolved by sharing evidence, not by an Aumann-like compromise based on sharing posteriors.
2. Recognise that whoever is at rational fault, agreement is not possible in the current state of things. Start recording who does what housework when, then return to the matter after some suitable time, with evidence to decide the issue.
3. Case 2 was set up to be symmetrical. Case 3 is different: rationality is right and religion is wrong. I continue in that belief. How I conduct my discussions thereafter with my religious friend is a separate matter.
4. I’m not sure how much rational perfection and common knowledge to assume of Alfred and Betty in this problem, but even if I assume that they are perfect reasoners with common priors, then I can’t see my way to proving anything about the ordering of their second estimates. (Added after reading the comments: Alfred or Betty’s estimate of the probable ordering of their second estimates is a different matter.) I suppose that some version of Aumann’s theorem says that on iterating the process they must eventually converge.
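(A toy Gaussian version of Alfred and Betty, my own construction rather than anything from the post: with a common normal prior and known observation noise, an announced posterior mean reveals the underlying observation, so a single exchange already forces agreement; the general theorem only promises eventual convergence.)

```python
prior_mean, prior_var = 500.0, 200.0 ** 2  # assumed common prior
noise_var = 50.0 ** 2                      # assumed known observation noise

def posterior(prior_m, prior_v, obs, obs_v):
    # Standard Gaussian conjugate update.
    post_v = 1.0 / (1.0 / prior_v + 1.0 / obs_v)
    return post_v * (prior_m / prior_v + obs / obs_v), post_v

obs_alfred, obs_betty = 430.0, 520.0  # made-up private observations

# First estimates: each updates on their own observation only.
m_a, v_a = posterior(prior_mean, prior_var, obs_alfred, noise_var)
m_b, v_b = posterior(prior_mean, prior_var, obs_betty, noise_var)

# Second estimates: each infers the other's observation from the announced
# estimate and updates on it, reaching the same pooled posterior mean.
m_a2, _ = posterior(m_a, v_a, obs_betty, noise_var)
m_b2, _ = posterior(m_b, v_b, obs_alfred, noise_var)
print(round(m_a, 2), round(m_b, 2))    # 434.12 518.82 -- they disagree
print(round(m_a2, 2), round(m_b2, 2))  # 475.76 475.76 -- they agree
```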
If my estimate is 1000, and someone else’s is 300, that’s too big a discrepancy to explain by minor variations. It casts doubt on the assumption of identical thermometers. Assuming that I only have the other people’s estimates, and there’s no opportunity for discussion, I’ll search for reasons why we might have come up with completely different answers, but if I find no error in my own, I’ll discard all such outliers.
What if everyone else’s estimate is between 280 and 320? Do you discard your own estimate if it’s an outlier? Does the answer depend on whether you can find an error in your reasoning?
Maybe I’ve made an error no-one else made. Maybe everyone else made an error I didn’t make. (I have personally experienced this. I knew what error everyone else was making and stuck to my answer, which in the end turned out to be right.) The thing to do is to find out why the discrepancy happened; then I will know what to do about it.
In some situations this will not be possible. Then I will have to just make an optimal Bayesian calculation based on limited information, i.e. guess. But “optimal” no more implies “accurate” than “statistically significant” implies “important”.