bogdanb comments on The Weighted Majority Algorithm

bogdanb 14 Jul 2013 10:33 UTC
3 points

The number of samples required is independent of the population size. No deterministic algorithm can do this, not even if you try to simulate samples by using good PRNGs, because there are ``orderings″ of the world’s population that would fool your algorithm.

Wait a minute. There are also orderings of the world population that will randomly fool the random algorithm. (That’s where the 99% comes from.) Or, more precisely, there are samplings generated by your random algorithm that get fooled by the actual ordering of the world population.

If you write a deterministic algorithm that samples the exact number of samples as the random one does, but instead of sampling randomly you sample deterministically, spreading the samples in a way you believe to be appropriately uniform, then, assuming the human population isn’t conspiring to trick you, you should get exactly the same 99% confidence and 5% precision.

(Of course, if you screw up spreading the samples, it means you did worse than random, so there’s something wrong with your model of the world. And if the entire world conspires to mess up your estimate of red hair frequency, and succeeds to do so without you noticing and altering you sampling, you’ve got bigger problems.)

For example, once n is found (the number of tests needed), you list all humans in order of some suitable tuple of properties (e.g., full name name, date and time of birth, population of home city/town/etc, or maybe), and then pick n people equally distanced in this list. You’re still going to find the answer, with the same precision and confidence, unless the world population conspired to distribute itself over the world exactly the right way to mess up your selection. There are all sorts of other selection procedures that are deterministic (they depend only on the population), but are ridiculously unlikely to have any correlation with hair color.

Other example: order all cities/towns/etc by GDP; group them in groups of ~N/n; pick from each group the youngest person; I don’t think n is high enough, but if it happens that a group contains a single city with more than N/n population pick the two, three, etc youngest persons. If you’re concerned that people all over the world recently started planning their babies just to mess with you, then pick the median-by-age person.