Another crucial insight from these studies is that nearly all of the genetic differences between humans can be explained by additive effects: there are very few gene-gene interactions going on. If gene A makes you taller, it doesn’t depend on gene B being present to work its magic. It’s a strong, independent gene that don’t need no help.
This fact is extremely important because it makes both evolution and embryo selection possible. There is a common misconception that genes are tied together in a hopelessly complex web and that if we mess with one part of it the whole thing will come crashing down. While that may be true for genes that are universally present in the human population, it is very rarely true for genes that commonly vary between people.
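To make “additive” concrete, here is a minimal sketch with two genes coded as present (1) or absent (0). The effect sizes and function names are hypothetical, chosen only for illustration. In the additive model, carrying gene A adds the same 1.5 cm whether or not gene B is present. In the contrasting epistatic model, A does nothing unless B is there.

```python
# Hypothetical effect sizes in cm, for illustration only.
BASE_HEIGHT = 170.0
EFFECT_A = 1.5
EFFECT_B = 0.8


def additive_height(a: int, b: int) -> float:
    """Additive model: each gene adds its own effect, regardless of the other."""
    return BASE_HEIGHT + EFFECT_A * a + EFFECT_B * b


def epistatic_height(a: int, b: int) -> float:
    """Contrasting (hypothetical) epistatic model: gene A only helps if gene B is present."""
    return BASE_HEIGHT + EFFECT_A * a * b + EFFECT_B * b


def effect_of_a(model, b: int) -> float:
    """Change in predicted height from carrying gene A (a=1 vs a=0), holding b fixed."""
    return model(1, b) - model(0, b)


for b in (0, 1):
    print(f"b={b}: additive effect of A = {effect_of_a(additive_height, b):.2f}, "
          f"epistatic effect of A = {effect_of_a(epistatic_height, b):.2f}")
```

The additive effect of A is 1.5 for both values of b, while the epistatic one jumps from 0 to 1.5 depending on B.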
As a newbie to this intriguing topic, I have various questions:
How many genes out of 20000 commonly vary between humans?
Do more complex traits like intelligence have more gene-gene interactions?
Of the total variance, do you know what’s the maximum you could explain with genes?
Assuming the polygenic scores are not close to the maximum explainable variance: how do you know that there’s not a “complex web” on top of some additive effects? Consider the following toy model: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \alpha\,\mathrm{hash}(x_1 x_2)$. Given data generated from it, you could infer the $\beta$ coefficients, but the hash term would fall into the error variance. Even though evolution may enforce simple effects for currently varying genes, a complex web could sit on top of fixed genes; I’d then expect a continuum of interactivity from old, fixed genes (strong interactions, a “complex web”) to new variants (additive effects).
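The toy model can be simulated to check this intuition. This is a sketch under stated assumptions: genotypes are dosages 0/1/2 at allele frequency 0.5, the coefficients are made up, and `toy_hash` is a hypothetical stand-in for the hash (-1 when $x_1 x_2 = 1$, +1 otherwise), picked because it happens to be uncorrelated with $x_1$ and $x_2$ at that allele frequency. Fitting only the additive terms by least squares recovers $\beta_1$ and $\beta_2$, the intercept absorbs the hash’s mean, and the hash’s remaining variance ends up in the residual.

```python
import random


def solve(a, b):
    """Solve the small square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(rows, y):
    """Least-squares coefficients via the normal equations (X^T X) beta = X^T y."""
    k = len(rows[0])
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def residual_variance(rows, y, beta):
    """Mean squared residual of the fitted linear model."""
    resid = [yi - sum(b * v for b, v in zip(beta, row)) for row, yi in zip(rows, y)]
    return sum(r * r for r in resid) / len(resid)


def toy_hash(v):
    """Hypothetical stand-in for hash(x1*x2): -1 when x1*x2 == 1 (both loci heterozygous), else +1."""
    return -1.0 if v == 1 else 1.0


def simulate(n, b0=1.0, b1=0.5, b2=-0.3, alpha=1.0, noise_sd=1.0, seed=0):
    """Draw data from y = b0 + b1*x1 + b2*x2 + alpha*toy_hash(x1*x2) + Gaussian noise."""
    rng = random.Random(seed)
    x1s, x2s, y = [], [], []
    for _ in range(n):
        # Genotype dosages 0/1/2 at allele frequency 0.5 (an assumption).
        x1 = rng.randint(0, 1) + rng.randint(0, 1)
        x2 = rng.randint(0, 1) + rng.randint(0, 1)
        x1s.append(x1)
        x2s.append(x2)
        y.append(b0 + b1 * x1 + b2 * x2 + alpha * toy_hash(x1 * x2) + rng.gauss(0.0, noise_sd))
    return x1s, x2s, y


x1s, x2s, y = simulate(50_000)
rows = [[1.0, x1, x2] for x1, x2 in zip(x1s, x2s)]  # intercept, x1, x2 (additive terms only)
beta = ols(rows, y)
print("estimated beta:", [round(b, 3) for b in beta])
print("residual variance:", round(residual_variance(rows, y, beta), 3))
```

With these settings the estimates should land near [1.5, 0.5, -0.3] and the residual variance near 1.75 (1 from the noise plus 0.75 from the hash). A generic hash would usually be partly correlated with $x_1$ and $x_2$, so some of it would leak into the $\beta$ estimates instead.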
How many genes out of 20000 commonly vary between humans?
There are about 4-5 million letters in the genome where at least one percent of humans have a different letter at that location, compared with 3 billion letters overall.
Another way to look at genetic differences is to pick a random pair of humans and ask how much they are likely to differ. The answer: they differ at about 3 million base pairs.
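The arithmetic behind these figures, as a quick check using only the numbers quoted here: common variant sites make up roughly 0.13-0.17% of the genome, and two random people differ at about 0.1% of positions, or about one letter in a thousand.

```python
GENOME_LETTERS = 3_000_000_000                  # ~3 billion letters overall (figure from the text)
COMMON_VARIANT_SITES = (4_000_000, 5_000_000)   # sites where >=1% of people carry another letter
PAIRWISE_DIFFERENCES = 3_000_000                # typical differences between two random people

for sites in COMMON_VARIANT_SITES:
    print(f"{sites:,} common variant sites = {sites / GENOME_LETTERS:.3%} of the genome")

pairwise_fraction = PAIRWISE_DIFFERENCES / GENOME_LETTERS
print(f"two random people differ at {pairwise_fraction:.1%} of positions, "
      f"about 1 in {GENOME_LETTERS // PAIRWISE_DIFFERENCES:,} letters")
```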
Do more complex traits like intelligence have more gene-gene interactions?
That’s not my impression from reading the literature. There was some giant analysis of educational attainment done last year which found literally zero gene-gene interactions. But I’m not a deep expert on this subject.
Of the total variance, do you know what’s the maximum you could explain with genes?
For intelligence? You can probably get to a third of the variance explained just using SNP arrays like the ones 23andMe collects. With whole-genome sequencing and more samples you could probably get up to 45%, maybe higher.
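To translate “variance explained” into more intuitive terms, here is a back-of-the-envelope conversion. It assumes a linear predictor and a trait scaled like IQ (standard deviation 15); neither the scale nor the helper names come from the discussion. Explaining a third of the variance corresponds to a correlation of about 0.58 between score and trait, and 45% to about 0.67, while the typical prediction error shrinks from 15 points to roughly 12 and 11 points respectively.

```python
import math

IQ_SD = 15.0  # conventional standard deviation of IQ scores (assumed trait scale)


def correlation_from_r2(r2: float) -> float:
    """Correlation between predictor and trait implied by a fraction r2 of variance explained."""
    return math.sqrt(r2)


def residual_sd(r2: float, trait_sd: float = IQ_SD) -> float:
    """Typical prediction error (SD of the trait given the score) when the score explains r2."""
    return trait_sd * math.sqrt(1.0 - r2)


for r2 in (1 / 3, 0.45):
    print(f"R^2 = {r2:.2f}: r = {correlation_from_r2(r2):.3f}, "
          f"residual SD = {residual_sd(r2):.1f} IQ points")
```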
Assuming the polygenic scores are not close to the maximum explainable variance: how do you know that there’s not a “complex web” on top of some additive effects?
This is not a theoretical assertion but an empirical one. We now have studies on educational attainment with roughly 3 million participants that have found ZERO gene-gene interactions. They definitely exist (at least for other traits), but as I understand the authors, you need an even larger sample size to identify them. Given how little they expect larger samples to improve the predictor’s power, one can infer that these interactions, if they exist (and they surely do to some extent), just don’t explain very much of the variance. (Ctrl+F for “epistatic interactions” in this paper.)
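One way to see why interactions that each explain very little variance stay invisible even with millions of participants is a standard power approximation for a single 1-degree-of-freedom effect: $n \approx (z_{1-\alpha/2} + z_{\text{power}})^2 / f^2$ with $f^2 = R^2/(1-R^2)$. The sketch assumes the usual single-variant genome-wide threshold $\alpha = 5\times10^{-8}$ and 80% power; a scan over all pairs of variants would need an even stricter threshold, so these numbers are optimistic. Under those assumptions, a single interaction explaining $10^{-5}$ of the variance needs roughly 4 million people.

```python
from statistics import NormalDist


def required_sample_size(r2: float, alpha: float = 5e-8, power: float = 0.8) -> float:
    """Approximate n needed to detect one 1-df effect explaining a fraction r2 of trait variance.

    Large-sample approximation: n ~ (z_{1-alpha/2} + z_power)^2 / f^2, with f^2 = r2 / (1 - r2).
    """
    z = NormalDist()  # standard normal
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    f2 = r2 / (1 - r2)
    return (z_alpha + z_power) ** 2 / f2


for r2 in (1e-3, 1e-4, 1e-5):
    print(f"interaction explaining {r2:.0e} of variance: n ~ {required_sample_size(r2):,.0f}")
```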
There are about 4-5 million letters in the genome where at least one percent of humans have a different letter at that location, compared with 3 billion letters overall.
Another way to look at genetic differences is to pick a random pair of humans and ask how much they are likely to differ. The answer: they differ at about 3 million base pairs.
OK. So I guess that, for two random humans, you’d expect almost all 20000 genes to differ by at least one letter, right?
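A rough way to estimate this, assuming the ~3 million differences are scattered uniformly and independently across the 3-billion-letter genome (about one per 1,000 letters), is a Poisson approximation: a stretch of $L$ letters differs somewhere with probability $1 - e^{-L/1000}$. The gene lengths below are hypothetical round numbers. Under this assumption a full gene including introns almost surely differs, while a ~1.5 kb coding sequence differs about 78% of the time; coding sequence is under stronger selection than average DNA, so the true figure for it is likely lower.

```python
import math

GENOME_LETTERS = 3_000_000_000
PAIRWISE_DIFFERENCES = 3_000_000
RATE = PAIRWISE_DIFFERENCES / GENOME_LETTERS  # ~1 difference per 1,000 letters


def prob_gene_differs(length_bp: int, rate: float = RATE) -> float:
    """P(at least one differing letter in length_bp letters), assuming differences are
    scattered uniformly and independently (Poisson approximation)."""
    return 1.0 - math.exp(-rate * length_bp)


# Hypothetical round-number lengths, for illustration only.
for label, length in [("coding sequence, ~1.5 kb", 1_500),
                      ("whole gene incl. introns, ~25 kb", 25_000)]:
    print(f"{label}: P(differs) = {prob_gene_differs(length):.3f}")
```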
Given how little they expect larger samples to improve the predictor’s power, one can infer that these interactions, if they exist (and they surely do to some extent), just don’t explain very much of the variance.
OK, but this shows that your models do not see the non-additive effects, not that there aren’t any. I don’t know exactly how the analyses are done, but if they look for interactions with a model like $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{12} x_1 x_2$, they would not pick up the $\alpha$ term in my example because of the hash (“hash” here stands for any very granular, nonlinear function).
But actually, I think it would be very weird to have only such “steganographic” interactions without any simpler ones, so I’m satisfied with your answer.
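This can be checked exactly for one hypothetical stand-in hash, $h = -1$ when $x_1 x_2 = 1$ and $+1$ otherwise, with two independent loci in Hardy-Weinberg proportions (both assumptions, not from the discussion). Regression slopes are determined by the covariances between the regressors and the outcome, so a term with zero covariance with $x_1$, $x_2$, and $x_1 x_2$ gets no weight in the product-interaction model and stays entirely in the residual. The sketch computes those covariances with exact fractions.

```python
from fractions import Fraction
from itertools import product


def genotype_probs(p):
    """Hardy-Weinberg probabilities of dosage 0/1/2 at allele frequency p (an assumption)."""
    return {0: (1 - p) ** 2, 1: 2 * p * (1 - p), 2: p ** 2}


def toy_hash(v):
    """Hypothetical stand-in for hash(x1*x2): -1 when x1*x2 == 1, else +1."""
    return -1 if v == 1 else 1


def covariances_with_hash(p):
    """Exact Cov(toy_hash(x1*x2), z) for z in {x1, x2, x1*x2}, two independent loci at frequency p."""
    probs = genotype_probs(p)
    cells = [(a, b, probs[a] * probs[b]) for a, b in product(range(3), repeat=2)]

    def expect(f):
        # Expectation over the 9 genotype combinations, weighted by their probabilities.
        return sum(w * f(a, b) for a, b, w in cells)

    mean_h = expect(lambda a, b: toy_hash(a * b))
    regressors = {"x1": lambda a, b: a, "x2": lambda a, b: b, "x1*x2": lambda a, b: a * b}
    return {
        name: expect(lambda a, b: toy_hash(a * b) * z(a, b)) - mean_h * expect(z)
        for name, z in regressors.items()
    }


print(covariances_with_hash(Fraction(1, 2)))   # every covariance is exactly 0
print(covariances_with_hash(Fraction(3, 10)))  # nonzero, so part of the term would leak into the fit
```

At allele frequency 1/2 the product model misses this term completely, and its full variance (3/4) stays in the residual; at other frequencies the covariances are nonzero and part of the term would show up in the fitted coefficients.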
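Many of the differences between human genomes are actually in “promoter” regions. For a gene to be made into a protein, an enzyme (RNA polymerase) first has to bind to a spot next to the gene and transcribe its sequence into mRNA; the promoter is that spot. Differences there can change how much of a gene’s protein gets made without changing the protein itself.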
Other differences are in regions that don’t seem to affect traits at all. There’s a lot of leftover DNA in our genomes from endogenous retroviruses, transposons and other events in our evolutionary history. Sometimes the DNA in those regions randomly mutates into something useful and evolution will start acting on it.
Gwern has written quite extensively about this.