I only ran 300 generations, but I just redid them with 5000 generations (which took a few hours), and the results aren’t much different. See plots at http://www.weidai.com/fitness/plot3.png and http://www.weidai.com/fitness/plot4.png.
I also reran the simulations with Number=100. Fitness is lower at all values of Mutation (by about 1/3), but it’s still linear in 1/Mutation^2, not 1/Mutation. The relationship between Fitness and Number is not clear to me at this point. As Eliezer said, the combinatorial argument I gave isn’t really relevant.
Also, with Number=1000, Genome=1000, Mutate=0.005, Fitness stabilizes at around 947, i.e. about 95% of the genome is maintained. Here 1/Mutate^2 = 40,000, which is 40 times Genome. So, extrapolating from this, when 1/Mutate^2 is much larger than Genome, which is the case for human beings, almost the entire genome can be maintained against mutations. It doesn’t look like this line of inquiry gives us a reason to believe that most of human DNA is really junk.
Eliezer, I’ve noticed an apparent paradox in information theory that may or may not be related to your “disconnect between variance going as the square root of a randomized genome, and the obvious argument that eliminating half the population is only going to get you 1 bit of mathematical information.” It may be of interest in either case, so I’ll state it here.
Suppose Alice is taking a test consisting of M true/false questions. She has no clue how to answer them, so it seems that the best she can do is guess randomly and get an expected score of M/2. Fortunately her friend Bob has broken into the teacher’s office and stolen the answer key, but unfortunately he can’t send her more than 1 bit of information before the test ends. What can they do, assuming they planned ahead of time?
The naive answer would be to have Bob send Alice the answer to one of the questions, which raises the expected score to 1+(M-1)/2.
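A better solution is to have Bob tell Alice whether “true” answers outnumber “false” answers. If each answer is independently “true” or “false” with equal probability, the variance of the number of “true” answers is M/4, so its standard deviation is sqrt(M)/2, which means Alice can get an expected score of roughly M/2+sqrt(M)/2 if she answers all “true” or all “false” according to what Bob tells her. (More precisely, the expected gain is the mean absolute deviation of the number of “true” answers, about sqrt(M/(2*pi)) or 0.4*sqrt(M), which is still proportional to sqrt(M).) So here’s the paradox: how did Alice get on the order of sqrt(M) more answers correct when Bob only sent her 1 bit of information?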
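The expected scores of the two strategies can be checked exactly by summing over the binomial distribution. This is only a minimal sketch, assuming the answer key is M independent fair coin flips; the function names are made up for illustration. It prints the gain over M/2 for each strategy next to sqrt(M/(2*pi)) and sqrt(M)/2 for comparison.

```python
from math import comb, pi, sqrt


def expected_score_naive(m: int) -> float:
    """Bob reveals one answer; Alice guesses the other m - 1 at random."""
    return 1 + (m - 1) / 2


def expected_score_majority(m: int) -> float:
    """Bob says which answer is in the majority; Alice gives it m times.

    If the key has t "true" answers, Alice scores max(t, m - t).
    Assumption: the key is m independent fair coin flips, so t is
    Binomial(m, 1/2) and each value of t has weight comb(m, t) / 2**m.
    """
    total = sum(comb(m, t) * max(t, m - t) for t in range(m + 1))
    return total / 2**m


if __name__ == "__main__":
    for m in (10, 100, 1000):
        naive_gain = expected_score_naive(m) - m / 2
        majority_gain = expected_score_majority(m) - m / 2
        print(
            f"M={m}: naive gain={naive_gain:.3f}, "
            f"majority gain={majority_gain:.3f}, "
            f"sqrt(M/(2*pi))={sqrt(m / (2 * pi)):.3f}, "
            f"sqrt(M)/2={sqrt(m) / 2:.3f}"
        )
```

The naive gain stays at 1/2 for every M, while the majority gain grows like sqrt(M), which is the gap the paradox is about.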
(What if the teacher knew this might happen and made the number of “true” answers exactly equal to the number of “false” answers? Alice and Bob should have established a common random bit string R of length M ahead of time. Then Bob can send Alice a bit indicating whether she should answer according to R or the complement of R, with the same expected outcome.)
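The shared-string variant can be sketched as a small simulation. The reason it works is that a uniformly random R agrees with any fixed key on a Binomial(M, 1/2) number of questions, so taking the better of R and its complement gives the same distribution of scores as before. The helper names, the seed, and the choice to break ties in favor of R are assumptions made for illustration.

```python
import random


def bob_bit(key, r):
    """Return 1 if r matches the key on at least half the questions, else 0."""
    agree = sum(k == x for k, x in zip(key, r))
    return 1 if 2 * agree >= len(key) else 0


def alice_answers(r, bit):
    """Answer according to r if bit is 1, otherwise according to its complement."""
    return list(r) if bit else [not x for x in r]


def score(key, answers):
    """Number of questions on which the answers match the key."""
    return sum(k == a for k, a in zip(key, answers))


def average_score(m, trials, seed=0):
    """Average score over many tests with a balanced key and a fresh shared R."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        # Balanced key: exactly m // 2 "true" answers, in random order.
        key = [True] * (m // 2) + [False] * (m - m // 2)
        rng.shuffle(key)
        # Shared string R, fixed before the test and independent of the key.
        r = [rng.random() < 0.5 for _ in range(m)]
        total += score(key, alice_answers(r, bob_bit(key, r)))
    return total / trials


if __name__ == "__main__":
    m = 100
    print(f"M={m}: average score {average_score(m, 2000):.2f} vs. M/2 = {m / 2}")
```

Alice never scores below M/2 on any single test here, since Bob always picks whichever of R and its complement matches the key at least as well.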