Re: problem 1: Jelly bean number estimates are just like thermometer readings, except that the reading is in someone’s head, rather than their hand. So the obvious answer is to average everyone’s initial, solitary impressions, absent reason to expect one individual or another is an above-average (or below-average) estimator.
If your friends use lopsided weighting schemes in their second answers, should you re-update? This depends a lot on your friends.
Don’t re-update from their answers if you think they don’t understand the merits of averaging; you want to weight each person’s raw impression evenly, not to overweight it based on how many others were randomly influenced by it (cf. information cascades: http://en.wikipedia.org/wiki/Information_cascade).
Do re-update if your friends understand the merits of averaging, such that their apparent over-weighting of a few people's data points suggests they know something you don't (e.g., perhaps your friend Julie has won past championships in jelly-bean estimation, and everyone but you knows it).
Since I know those people, I would weight their answers according to my best estimate of their skill at such tasks, and then average the whole group, including me.
Doing this correctly can get pretty complicated. Basically, the more people you have, the less you should weight the low-quality estimates compared to the high-quality estimates.
For example, suppose that “good” thermometers are unbiased and “bad” thermometers are all biased in the same direction, but you don’t know which direction.
If you have one thermometer which you know is good, and one which you’re 95% sure is good, then you should weight both measurements about the same.
But if you have 10^6 thermometers which you know are good, and 10^6 which you’re 95% sure are good, then you should pretty much ignore the possibly-bad ones.
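A minimal numerical sketch of why this happens, under assumptions I'm supplying for illustration (good thermometers have independent noise of spread sigma; with probability p the possibly-bad batch shares a common bias of size b in an unknown, symmetric direction), using inverse-variance weighting of the two batch means:

```python
# Averaging within a batch shrinks the independent noise, but cannot
# remove a shared bias: the possibly-bad batch's mean carries an extra
# p * b**2 of variance no matter how large n gets. (sigma, b, and p are
# illustrative choices of mine, not from the thread.)

def batch_weights(n_good, n_maybe, sigma=1.0, b=2.0, p=0.05):
    v_good = sigma**2 / n_good                 # variance of the good batch's mean
    v_maybe = sigma**2 / n_maybe + p * b**2    # noise shrinks; possible bias doesn't
    w_good, w_maybe = 1 / v_good, 1 / v_maybe  # inverse-variance (precision) weights
    total = w_good + w_maybe
    return w_good / total, w_maybe / total

print(batch_weights(1, 1))          # ~(0.55, 0.45): weight both about the same
print(batch_weights(10**6, 10**6))  # ~(1.0, 0.0): ignore the possibly-bad batch
```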
Not that it matters tremendously, but I was thinking of the jelly bean problem.
What kind of weighted average?
My math isn’t good enough to formalize it—I’d do it by feel.
Drat—likewise.
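For what it's worth, one minimal way such a weighted average could be formalized, where the mapping from judged skill to weight is an arbitrary choice of mine, purely for illustration:

```python
# One possible formalization (mine, not the commenters'): treat higher
# judged skill as deserving more weight, and take a weighted mean of
# everyone's estimates. The skill-to-weight mapping below is arbitrary.

def skill_weighted_mean(estimates, skills):
    # skills: subjective judgments in (0, 1]; higher skill -> more weight
    weights = [s**2 for s in skills]
    return sum(w * x for w, x in zip(weights, estimates)) / sum(weights)

# Hypothetical jelly-bean guesses, with my judgments of each guesser's skill:
print(skill_weighted_mean([800, 1200, 5000], [0.9, 0.8, 0.2]))  # ~1085
```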
Before reading your answer: Human beings are bad at estimating volumes, as opposed to lengths. I would form my estimate by observing the apparent density of jelly beans in the jar (e.g., by examining a square-centimeter cross-section), observing the jar's dimensions, and multiplying. Then, in the second stage, I would discard estimates radically different from mine (with a cutoff chosen from the observed distribution) and take the mean of the remaining ones. In choosing which data to include, I would allow myself to be influenced by the people whose data I was already inclined to include in my average. (A rough sketch of this two-stage procedure follows below.)
After reading your answer: Should I notice an apparent and popular upweighting of certain responses such as you suggest, I would increase the weight of those in my average.
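Here is a sketch of the two-stage procedure described above; the cylindrical jar, the volumetric density figure, and the 50% cutoff are all assumptions of mine, for illustration:

```python
import math

def my_estimate(beans_per_cm3, jar_radius_cm, jar_height_cm):
    """Stage 1: apparent packing density times jar volume (assumes a cylindrical jar)."""
    volume = math.pi * jar_radius_cm**2 * jar_height_cm
    return beans_per_cm3 * volume

def revised_estimate(mine, others, cutoff=0.5):
    """Stage 2: discard estimates more than `cutoff` (as a fraction of my own
    estimate) away from mine, then take the mean of the rest plus my own."""
    kept = [x for x in others if abs(x - mine) / mine <= cutoff] + [mine]
    return sum(kept) / len(kept)

mine = my_estimate(beans_per_cm3=0.5, jar_radius_cm=6, jar_height_cm=20)
print(round(mine))                                             # ~1131
print(round(revised_estimate(mine, [1000, 1400, 5000, 300])))  # 5000 and 300 dropped
```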
I would look for response clusters. Each participant could have used a different counting method, yielding different results (e.g., estimating volumes, counting radius and height, or allowing for an empty cone at the top which you don't see), and some methods could be common pitfalls. Some results, namely those obtained by a wrong way of counting, should therefore be discarded; otherwise even the median would be pulled away from the right answer. To decide which response cluster is the right one, it would be useful to try to figure out the method (or mistake) behind each cluster and determine which one is correct. Of course, your method is not necessarily the right one just because it's yours.
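A toy illustration of cluster-finding (my own construction, not the commenter's procedure): sort the estimates and split wherever the relative gap between neighbors is large; each resulting cluster plausibly corresponds to one counting method.

```python
# Toy 1-D clustering: split the sorted estimates at large relative gaps.
# The 30% gap threshold is an arbitrary choice for illustration.

def response_clusters(estimates, rel_gap=0.3):
    xs = sorted(estimates)
    groups, current = [], [xs[0]]
    for prev, x in zip(xs, xs[1:]):
        if (x - prev) / prev > rel_gap:  # big relative jump: start a new cluster
            groups.append(current)
            current = []
        current.append(x)
    groups.append(current)
    return groups

# Hypothetical guesses from two methods, e.g. one crowd missed the empty cone:
print(response_clusters([900, 950, 1020, 1100, 1850, 1900, 2050]))
# -> [[900, 950, 1020, 1100], [1850, 1900, 2050]]
```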