There are about 4 to 5 million positions in the genome where at least one percent of humans have a different letter, compared to about 3 billion letters overall.
Another way to look at genetic differences is to pick two random humans and ask how much they are likely to differ. The answer is about 3 million base pairs.
Ok. I guess that, for two random humans, you'd expect almost all 20,000 genes to differ by at least one letter, right?
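A rough back-of-envelope check of this intuition, using the figures above: about 3 million differing positions out of about 3 billion gives roughly one difference per 1,000 letters. The sketch below assumes those differences are scattered independently and uniformly along the genome (they are not, so treat it as an illustration only) and uses a Poisson approximation for the chance that a stretch of a given length is identical between two people. The stretch lengths are hypothetical values, not real gene statistics.

```python
import math

# Figures quoted above: about 3 million differing positions between two random
# people, out of about 3 billion positions in the genome.
PAIRWISE_DIFFS = 3e6
GENOME_LENGTH = 3e9
DIFF_RATE = PAIRWISE_DIFFS / GENOME_LENGTH  # about 0.001, i.e. 1 per 1,000 letters


def p_stretch_identical(length_bp: float, rate: float = DIFF_RATE) -> float:
    """Poisson approximation of P(no differing position in a stretch of DNA).

    Assumes differences are scattered independently and uniformly, which is a
    simplification: real variation clusters in some regions and is sparse in others.
    """
    return math.exp(-rate * length_bp)


for length in (1_000, 10_000, 50_000):  # hypothetical stretch lengths, in letters
    print(f"{length:>6} letters: P(identical) = {p_stretch_identical(length):.3g}")
```

Under this assumption the answer depends heavily on how long "the gene" is taken to be: a 1,000-letter stretch is identical about 37% of the time, while a 10,000-letter stretch is identical less than 1 time in 10,000.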
Given how little they expect to improve the predictor's power by increasing the sample size, one can infer that these interactions, if they exist (and they surely do to some extent), just don't explain much of the variance.
Ok, but this shows that your models don't see the non-additive effects, not that there aren't any. I don't know exactly how the analyses are done, but if they look at interactions with a model like $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{12} x_1 x_2$, then they would not pick up the α term in my example because of the hash (the “hash” stands for any very granular, nonlinear function).
But actually I think it would be very weird to have only such “steganographic” interactions, without simpler ones as well, so I'm satisfied with your answer.
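To make the point about hash-like interactions concrete, here is a small simulation sketch; it is not how any particular study analyses its data. As an assumption, the “hash” is stood in for by the parity of 8 binary SNPs (+1 if an even number are present, −1 otherwise). Parity is a pure 8-way interaction and, in the population, is uncorrelated with every main effect and every pairwise product. The sketch fits an additive model, then one that extends the single $\beta_{12} x_1 x_2$ term to all pairs, and compares both with a model that is handed the true hash term. Sample size, effect sizes, and noise level are made up.

```python
import itertools

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: n individuals, k binary SNPs, each present with probability 0.5.
n, k = 20_000, 8
x = rng.integers(0, 2, size=(n, k))

# Additive part (made-up equal effect sizes) plus the hash-like term:
# parity of all k SNPs, +1 for an even count, -1 for an odd count.
beta = np.full(k, 0.5)
alpha = 1.0
h = np.where(x.sum(axis=1) % 2 == 0, 1.0, -1.0)
y = x @ beta + alpha * h + rng.normal(scale=0.5, size=n)


def design(x, pairwise):
    """Intercept + main effects, optionally + every x_i * x_j product (i < j)."""
    cols = [np.ones(len(x))] + [x[:, j] for j in range(x.shape[1])]
    if pairwise:
        cols += [x[:, i] * x[:, j]
                 for i, j in itertools.combinations(range(x.shape[1]), 2)]
    return np.column_stack(cols).astype(float)


def r2(X, target):
    """In-sample R^2 of an ordinary least-squares fit."""
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ coef
    return 1.0 - resid.var() / target.var()


r2_additive = r2(design(x, pairwise=False), y)
r2_pairwise = r2(design(x, pairwise=True), y)
r2_hash_by_pairwise = r2(design(x, pairwise=True), h)
r2_with_hash = r2(np.column_stack([design(x, pairwise=False), h]), y)

print(f"additive only:            R^2 = {r2_additive:.3f}")
print(f"additive + all pairwise:  R^2 = {r2_pairwise:.3f}")
print(f"pairwise model on h only: R^2 = {r2_hash_by_pairwise:.3f}")
print(f"additive + true hash:     R^2 = {r2_with_hash:.3f}")
```

With these made-up settings the parity term carries more variance than the additive part, yet adding every pairwise product barely moves R². A model restricted to low-order interactions can therefore report little non-additive variance while such a term is present, which is the gap the comment points to.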
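Many of the differences between human genomes are actually in “promoter” regions. For a gene to be made into a protein, a little enzyme (RNA polymerase) has to come over, bind to a spot next to the gene, and transcribe the sequence into mRNA. A difference in that spot can change how readily the enzyme binds, and so how much of the protein gets made, without changing the protein itself.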
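A deliberately simplified toy of the mechanism described above: the enzyme only “binds” if a promoter motif appears upstream of the gene, and binding produces mRNA, which has the same sequence as the gene's coding strand with U in place of T. The motif is the commonly cited TATA-box consensus TATAAA; the sequences and the exact-match binding rule are hypothetical. Real promoter recognition involves transcription factors and changes how much mRNA is made rather than acting as an on/off switch.

```python
from typing import Optional

# Toy illustration only: exact string matching is a stand-in (our assumption)
# for the real, probabilistic binding process.
TATA_MOTIF = "TATAAA"  # commonly cited TATA-box consensus


def can_bind(upstream: str, motif: str = TATA_MOTIF) -> bool:
    """Hypothetical rule: the enzyme binds only if the motif appears upstream."""
    return motif in upstream


def transcribe(coding_strand: str) -> str:
    """mRNA has the same sequence as the coding strand, with U in place of T."""
    return coding_strand.replace("T", "U")


def express(upstream: str, gene: str) -> Optional[str]:
    """Return the mRNA if the enzyme binds, otherwise None (no transcription)."""
    return transcribe(gene) if can_bind(upstream) else None


gene = "ATGGCTTAA"                  # hypothetical tiny gene
reference_upstream = "GGCTATAAAGGC" # contains TATAAA
variant_upstream = "GGCTATGAAGGC"   # one-letter difference breaks the motif

print(express(reference_upstream, gene))
print(express(variant_upstream, gene))
```

The gene itself is identical in both cases; only the one-letter change upstream decides whether any mRNA is produced in this toy.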
Other differences are in regions that don't seem to affect traits at all. There's a lot of leftover DNA in our genomes from endogenous retroviruses, transposons and other events in our evolutionary history. Sometimes the DNA in those regions randomly mutates into something useful, and evolution starts acting on it.