tgb comments on Wisdom of the Crowd: not always so wise

tgb 4 Jul 2012 15:21 UTC
0 points
This seems like a round-about way to describe a bell curve...

But suppose in your example that we’re only asking those silly Americans who, like myself, have only ever heard of the Battle of Bosworth as a name and really know nothing about it except maybe that some English people were involved or something. So let’s assume that people’s guesses follow a bell curve around 1600 with a large standard deviation of, say, 200 years or so. If the two options are 1600 and 1200, then about 15.9% of the people will pick 1200 (i.e. their guess is earlier than 1400, the midpoint) and the rest will pick 1600. This averages out to about 1536 in the limit of large numbers.

So I guess I still don’t understand your point: it’s not converging to 1600 or anything like that. It is high, but there was a systematic bias towards being high, so what else would you expect? In this example (which was chosen arbitrarily) the two options gave a more correct response than the free guess. Of course, we can come up with options that would make the free response better: choosing between, say, 2600 and 1200 gives an average of about 1293.
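For anyone who wants to check the arithmetic above, here is a rough sketch. It assumes guesses are normally distributed around 1600 with a standard deviation of 200 years, and that each person picks whichever option is closer to their private guess. The function name `forced_choice_mean` is made up for illustration.

```python
from statistics import NormalDist


def forced_choice_mean(low, high, guess_mean, guess_sd):
    """Average answer when everyone picks the option nearest their own guess."""
    guesses = NormalDist(mu=guess_mean, sigma=guess_sd)
    midpoint = (low + high) / 2
    # Fraction of people whose private guess lies below the midpoint,
    # i.e. who end up choosing the low option.
    share_low = guesses.cdf(midpoint)
    return share_low * low + (1 - share_low) * high


# Assumed crowd: guesses centred on 1600 with a 200-year standard deviation.
mean_1200_vs_1600 = forced_choice_mean(1200, 1600, guess_mean=1600, guess_sd=200)
mean_1200_vs_2600 = forced_choice_mean(1200, 2600, guess_mean=1600, guess_sd=200)

print(f"1200 vs 1600: {mean_1200_vs_1600:.1f}")
print(f"1200 vs 2600: {mean_1200_vs_2600:.1f}")
```

Under those assumptions the two averages come out at roughly 1536.5 and 1293.5, in line with the figures quoted above.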
- sixes_and_sevens 4 Jul 2012 16:12 UTC
  1 point
  Parent
  It doesn’t have to be a Gaussian distribution. We would expect it to look like one under reasonably assumed conditions, but systematic bias would skew it. A particularly large single source (say there was a Battle of Dosworth Field that happened 400 years later) could easily result in a bimodal distribution.
  
  In order for Wisdom of Crowds to work (as it’s expected to work), people aren’t guessing along a Gaussian distribution. They’re applying knowledge they have, and some of that knowledge is useful information, while some of that knowledge is noise. All the useful information pulls the mean towards the true value, while all the noise pulls it away. The difference is that the useful information converges on a single value (because it’s a convergent problem with a single correct answer), while all the noise pulls arbitrarily in all directions.
  
  Provided there isn’t some reason for the noise itself to converge on a single value, the noise should cancel itself out. (I think this is where my previous comments have not necessarily been clear: I’m talking about the noise converging, not the overall mean.)
  
  It should be obvious that if you give people a right answer and a wrong answer, the noise will be weighted in the direction of the wrong answer (because there’s no corresponding error on the other side of the true value). Even if you have two wrong answers on either side of a true value, and ask people to pick the one closest to the true value, you will still have a skew problem, because unless the two values are equidistant from the true value (which defeats the point of the question), your noise is not going to be equally distributed around the true value.
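The skew argument in the reply above can be sketched numerically. This is only an illustration under assumptions the thread does not state: the crowd's free guesses are taken to be unbiased (normally distributed around the true year, 1485, with a 200-year standard deviation), each person picks the option nearest their guess, and the option values and the name `forced_choice_bias` are made up.

```python
from statistics import NormalDist

TRUE_YEAR = 1485  # Battle of Bosworth Field
NOISE_SD = 200    # assumed spread of individual guesses, in years


def forced_choice_bias(low, high, true_value=TRUE_YEAR, noise_sd=NOISE_SD):
    """Crowd average minus the true value when guesses are forced onto two options."""
    crowd = NormalDist(mu=true_value, sigma=noise_sd)  # unbiased free guesses
    share_low = crowd.cdf((low + high) / 2)            # fraction picking the low option
    crowd_mean = share_low * low + (1 - share_low) * high
    return crowd_mean - true_value


# Options equidistant from the truth: the errors cancel.
equidistant = forced_choice_bias(1285, 1685)
# Two wrong options, not equidistant: the average is pushed above the truth.
lopsided = forced_choice_bias(1400, 1700)
# One right and one wrong option: all the error sits on the wrong side.
right_vs_wrong = forced_choice_bias(1200, 1485)

print(equidistant, lopsided, right_vs_wrong)
```

In this sketch the bias is zero only in the equidistant case. With lopsided options it is positive, and with a right answer paired against an earlier wrong one it is negative, even though the underlying free guesses are unbiased.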