IlyaShpitser comments on The Power of Noise

IlyaShpitser 29 Oct 2015 21:14 UTC
3 points
We know what property we want (the one randomization gives you): good balance on the relevant covariates between the two groups. I can target that with a deterministic algorithm, and in fact people do, e.g. with matching algorithms. Another thing people do is try all possible assignments (see: permutation tests for the null).
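A minimal sketch of both ideas follows. It assumes a single numeric covariate, equal-sized groups, and a small n. Exhaustive enumeration grows combinatorially, so practical matching methods use smarter searches. `most_balanced_split` deterministically picks the treated set whose covariate mean is closest to the control mean. `exact_permutation_test` computes a two-sided p-value by enumerating every equal-size assignment. Both function names and the example data are hypothetical, not from any library.

```python
from itertools import combinations


def group_means_gap(values, treated):
    """Absolute difference in means between the treated set and everyone else."""
    t = [values[i] for i in treated]
    c = [values[i] for i in range(len(values)) if i not in treated]
    return abs(sum(t) / len(t) - sum(c) / len(c))


def most_balanced_split(covariate):
    """Deterministic assignment: try every half-split and keep the first one
    whose covariate means are closest. Balances only the covariate passed in."""
    n = len(covariate)
    best, best_gap = None, float("inf")
    for treated in combinations(range(n), n // 2):
        gap = group_means_gap(covariate, set(treated))
        if gap < best_gap:
            best, best_gap = set(treated), gap
    return best, best_gap


def exact_permutation_test(outcomes, treated):
    """Two-sided exact p-value under the sharp null of no effect: the share of
    all equal-size assignments whose mean gap is at least the observed one."""
    n, k = len(outcomes), len(treated)
    observed = group_means_gap(outcomes, set(treated))
    gaps = [group_means_gap(outcomes, set(t)) for t in combinations(range(n), k)]
    # Small tolerance so floating-point noise doesn't drop ties with the observed gap.
    return sum(g >= observed - 1e-12 for g in gaps) / len(gaps)


ages = [23, 35, 41, 29, 52, 38]  # hypothetical covariate values
split, gap = most_balanced_split(ages)
print(sorted(split), round(gap, 3))  # [0, 1, 4] 0.667

outcomes = [5, 6, 7, 1, 2, 3]  # hypothetical outcomes
print(exact_permutation_test(outcomes, treated={0, 1, 2}))  # 0.1
```

With only six subjects the smallest attainable two-sided p-value is 2/20 = 0.1, so exhaustive permutation tests need more than a handful of units to say much.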

Discussion of AI and omniscience is a complete red herring; you don’t need it to show that randomness isn’t required here. We aren’t randomizing for the sake of randomizing; we do it because we want a property that we can also target directly and deterministically.

I don’t think EY can possibly know enough math to make his claim go through; I think this is an “intellectual marketing” claim. People do this a lot: if we are talking about your claim, you’ve won the game.
- PhilGoetz 31 Oct 2015 18:44 UTC
  1 point
  If you sort all the subjects on one criterion, the ordering may be correlated in an unexpected way with another criterion you’re unaware of. Suppose you want to study whether licorice causes left-handedness in a population from Tonawanda, NY. You get a list of addresses in Tonawanda, sort them by address, and go down the list assigning people alternately to the control and experimental groups. Then you mail the experimental group free licorice for ten years. Voila: after 10 years there are more left-handers in the experimental group.
  
  But even and odd addresses are on opposite sides of the street. And it so happens that in Tonawanda, NY, the screen doors on the front of every house are hinged on the west side, regardless of which way the house faces, because the west wind is so strong it would rip the door off its hinges otherwise. So people on the north side of the street, who are mostly in your experimental group, open the door with their left hand, getting a lot of exercise from this (the wind is very strong), while people on the south side open the screen door with their right hand.
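  The rigged example can be simulated directly. This is a minimal sketch that assumes a single street with house numbers 1 to 1000, with odd numbers on the north side. The street layout, the seed, and the helper names are all hypothetical. Alternating down the sorted list puts every odd (north-side) address in the experimental group. A seeded random split leaves the north-side share near one half in both groups.

```python
import random


def alternate_assign(addresses):
    """Sort by house number and alternate: experimental, control, experimental, ..."""
    ordered = sorted(addresses)
    return {a: ("experimental" if i % 2 == 0 else "control") for i, a in enumerate(ordered)}


def random_assign(addresses, seed=0):
    """Shuffle with a seeded RNG, then put the first half in the experimental group."""
    rng = random.Random(seed)
    shuffled = list(addresses)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {a: ("experimental" if i < half else "control") for i, a in enumerate(shuffled)}


def north_share(assignment, group):
    """Fraction of a group living on the (hypothetical) odd-numbered north side."""
    members = [a for a, g in assignment.items() if g == group]
    return sum(a % 2 == 1 for a in members) / len(members)


addresses = range(1, 1001)  # hypothetical house numbers on one long street
alt = alternate_assign(addresses)
rnd = random_assign(addresses)
print(north_share(alt, "experimental"), north_share(alt, "control"))  # 1.0 0.0
print(north_share(rnd, "experimental"), north_share(rnd, "control"))
```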
  
  It seems unlikely to me that many hidden correlations would survive alternating picks from a sorted list the way they do in this rigged example. But if the sample size is large enough, you’d still be better off randomizing than following any deterministic algorithm: “every other item from a list sorted on X” has low Kolmogorov complexity, so an unknown correlate of your observable variable can replicate it by chance.
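  Here is a rough way to see the sample-size point. It again assumes that odd house numbers mark a hidden north-side correlate. Under alternation, the north-side shares of the two groups differ by 1.0 at every n. Under random 50/50 splits, the average gap shrinks as n grows, roughly like 1/sqrt(n). `odd_share` and `mean_parity_gap` are hypothetical helpers, and the trial count and seed are arbitrary choices.

```python
import random


def odd_share(group):
    """Fraction of a group living at odd (hypothetically north-side) addresses."""
    return sum(a % 2 for a in group) / len(group)


def mean_parity_gap(n, trials=100, seed=1):
    """Average |odd share(experimental) - odd share(control)| over `trials`
    independent random 50/50 splits of addresses 1..n."""
    rng = random.Random(seed)
    addresses = list(range(1, n + 1))
    half = n // 2
    total = 0.0
    for _ in range(trials):
        rng.shuffle(addresses)  # a fresh random assignment each trial
        total += abs(odd_share(addresses[:half]) - odd_share(addresses[half:]))
    return total / trials


for n in (100, 1_000, 10_000):
    print(n, round(mean_parity_gap(n), 4))
```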