OK, I posted the following update to my blog entry:
Rereading the last few paragraphs of Eliezer’s post, I see that he actually argues for his central claim—that the human genome can’t contain more than 25MB of “meaningful DNA”—on different (and much stronger) grounds than I thought! My apologies for not reading more carefully.
In particular, the argument has nothing to do with the number of generations since the dawn of time, and instead deals with the maximum number of DNA bases that can be simultaneously protected, in steady state, against copying errors. According to Eliezer, copying a DNA sequence involves a ~10^-8 probability of error per base pair, which, because natural selection can correct only O(1) errors per generation, yields an upper bound of ~10^8 on the number of “meaningful” base pairs in a given genome.
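For concreteness, here's the arithmetic as I understand it, as a quick sketch; the 2-bits-per-base conversion is my own assumption about how ~10^8 base pairs turns into the 25MB figure:

```python
# Rough sketch of the mutation-selection bound described above.
error_rate = 1e-8            # assumed copying-error probability per base pair
correctable_per_gen = 1      # natural selection assumed to remove only O(1) errors per generation

max_meaningful_bp = correctable_per_gen / error_rate    # ~1e8 base pairs

bits_per_base = 2            # A/C/G/T carries 2 bits per base (my assumption behind the 25MB figure)
max_megabytes = max_meaningful_bp * bits_per_base / 8 / 1e6

print(f"max meaningful base pairs ~ {max_meaningful_bp:.0e}")   # 1e+08
print(f"which is about {max_megabytes:.0f} MB")                 # 25 MB
```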
However, while this argument is much better than my straw man based on the number of generations, there’s still an interesting loophole. Even with a 10^-8 chance of copying errors per base pair, one could imagine a genome reliably encoding far more than 10^8 meaningful base pairs (in fact, arbitrarily many) by using an error-correcting code. I’m not talking about the “local” error-correction mechanisms that we know DNA has, but about something more global, by which, say, copying errors in any small set of genes could be completely compensated for by other genes. The interesting question is whether natural selection could read the syndrome of such a code, and then correct it, using O(1) randomly-chosen insertions, deletions, transpositions, and reversals. I admit that this seems unlikely, and that even if it’s possible in principle, it’s probably irrelevant to real biology. For apparently there are examples where changing even a single base pair has horrible consequences. And on top of that, we can’t have the error-correcting code be too good, since otherwise we’ll suppress beneficial mutations!
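To make the syndrome idea concrete, here's a toy illustration with a classical Hamming(7,4) code over bits. This is purely to show what “reading the syndrome and correcting” means for a code; it is of course not a claim about any actual biological mechanism:

```python
# Toy syndrome decoding with a Hamming(7,4) code: the decoder computes a
# 3-bit "syndrome" that points directly at a single flipped bit, which can
# then be corrected with one targeted flip. Purely illustrative.

# Parity-check matrix H; its columns are the numbers 1..7 written in binary.
H = [
    [1, 0, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]

def syndrome(word):
    """Return the 3-bit syndrome of a 7-bit word (all zeros means no detected error)."""
    return [sum(h * w for h, w in zip(row, word)) % 2 for row in H]

def correct_single_error(word):
    """If the syndrome is nonzero, flip the bit at the (1-indexed) position it encodes."""
    s = syndrome(word)
    pos = s[0] * 1 + s[1] * 2 + s[2] * 4    # syndrome read as a binary position 1..7
    if pos:
        word = word.copy()
        word[pos - 1] ^= 1
    return word

codeword = [1, 0, 1, 1, 0, 1, 0]            # a valid Hamming(7,4) codeword
assert syndrome(codeword) == [0, 0, 0]

corrupted = codeword.copy()
corrupted[4] ^= 1                           # a single "copying error" at position 5
print(syndrome(corrupted))                  # [1, 0, 1]: the syndrome points at position 5
print(correct_single_error(corrupted) == codeword)   # True: the error is fixed
```

The point of the toy example is just that the syndrome localizes the error, so a single targeted flip restores the codeword; the open question in the paragraph above is whether anything like that targeted step is available to random mutation plus selection.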
Incidentally, Eliezer’s argument makes the falsifiable prediction that we shouldn’t find any organism, anywhere in nature, with more than 25MB of functional DNA. Does anyone know of a candidate counterexample? (I know there are organisms with far more than humans’ 3 billion base pairs, but I have no idea how many of the base pairs are functional.)
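For what it's worth, here's the kind of back-of-the-envelope check I have in mind; the genome sizes and functional fractions below are placeholders, not real measurements:

```python
# Hypothetical check of a candidate counterexample (placeholder numbers only).
BOUND_BP = 1e8                      # the ~10^8 meaningful-base-pair bound from above

def exceeds_bound(total_bp, functional_fraction):
    """True if the estimated functional base pairs exceed the ~10^8 bound."""
    return total_bp * functional_fraction > BOUND_BP

# Humans: ~3e9 bp total, so under the bound at most ~3% could be "meaningful".
print(BOUND_BP / 3e9)               # ~0.033
# Made-up example: a 100-gigabase genome beats the bound only if more than
# ~0.1% of it turns out to be functional.
print(exceeds_bound(1e11, 0.002))   # True, with these placeholder numbers
```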
Lastly, in spite of everything above, I’d still like a solution to my “pseudorandom DNA sequence” problem. For if the answer were negative—if given any DNA sequence, one could efficiently reconstruct a nearly-optimal sequence of insertions, transpositions, etc. producing it—then even my original straw-man misconstrual of Eliezer’s argument could put up a decent fight! :-)
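Not a solution, but to pin down a baseline: the classical dynamic-programming algorithm below finds the optimal number of single-character insertions, deletions, and substitutions between two sequences. The whole difficulty of the pseudorandom-sequence question is that block transpositions and reversals are also allowed, which this sketch deliberately ignores:

```python
# Baseline for the reconstruction question: Levenshtein edit distance by
# dynamic programming. Handles only single-character insertions, deletions,
# and substitutions, NOT the block transpositions and reversals that make
# the question above hard.

def edit_distance(source, target):
    """Minimum number of single-character insertions, deletions, and
    substitutions needed to turn `source` into `target`."""
    m, n = len(source), len(target)
    # dp[i][j] = distance between source[:i] and target[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if source[i - 1] == target[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # delete from source
                           dp[i][j - 1] + 1,        # insert into source
                           dp[i - 1][j - 1] + cost) # substitute (or match)
    return dp[m][n]

print(edit_distance("ACGTACGT", "ACGACGGT"))   # 2 for these toy strings
```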