The remarkable thing about human genetics is that most of the variants ARE additive.
I think this is likely incorrect, at least where intelligence-affecting SNPs stacked in large numbers are concerned.
To make an analogy to ML, the effect of a brain-affecting gene will be to push a hyperparameter in one direction or the other. If that hyperparameter is (on average) not perfectly tuned, then one of the variants will be an enhancement, since it leads to a hyperparameter value that is (on average) closer to optimal.
If each hyperparameter is affected by many genes (or, almost equivalently, if the number of genes greatly exceeds the number of hyperparameters), then intelligence-affecting variants will look additive so long as you only look at pairs, because most pairs you look at will not affect the same hyperparameter, and when they do, the combined effect still won’t be large enough to overshoot the optimum. However, if you stack many gene edits, and this model of genes mapping to hyperparameters is correct, then the most likely outcome is that you move each hyperparameter in the correct direction but overshoot the optimum. Phrased slightly differently: intelligence-affecting genes may be additive on current margins, but they won’t remain additive when you stack edits in this way.
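A toy simulation can make this concrete (a minimal sketch; the quadratic fitness function, the effect sizes, and every other number here are illustrative assumptions rather than anything estimated from real genetics):

```python
import numpy as np

rng = np.random.default_rng(0)
n_params, n_genes = 100, 10_000                   # far more variants than hyperparameters
gene_param = rng.integers(0, n_params, n_genes)   # which hyperparameter each variant nudges
gene_size = rng.exponential(0.02, n_genes)        # how hard each variant nudges it
mu = rng.normal(0, 1, n_params)                   # population-average hyperparameter values
good_dir = -np.sign(mu)                           # the direction a GWAS would call "beneficial"

def trait(p):
    return -np.sum(p ** 2)                        # the trait peaks when every hyperparameter is 0

def edited(p, genes):
    out = p.copy()
    for g in genes:                               # each edit installs the "good" variant,
        out[gene_param[g]] += good_dir[gene_param[g]] * gene_size[g]  # ignoring where out already is
    return out

base = mu.copy()

# Pairwise, effects look essentially additive: two random variants rarely share a hyperparameter.
d1 = trait(edited(base, [0])) - trait(base)
d2 = trait(edited(base, [1])) - trait(base)
d12 = trait(edited(base, [0, 1])) - trait(base)
print("deviation from additivity for one pair:", round(d12 - (d1 + d2), 5))

# Stacking thousands of individually beneficial edits eventually overshoots the optimum.
for k in (100, 1_000, 5_000, 10_000):
    genes = rng.choice(n_genes, k, replace=False)
    print(f"{k:>6} edits: trait change {trait(edited(base, genes)) - trait(base):+.1f}")
```

In this toy model the pairwise deviation from additivity is negligible, and moderate numbers of edits help, but editing every known variant pushes the hyperparameters past their optima and the trait falls.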
To make another analogy: SNPs affecting height may be fully additive, but if the thing you actually care about is basketball-playing ability, there is an optimum amount of editing after which you should stop, because while people who are 2m tall are much better at basketball than people who are 1.7m tall, people who are 2.6m tall are cripples.
For this reason, even if all the gene-editing biology works out, you will not produce people in the upper end of the range you forecast.
You can probably somewhat improve this situation by varying the number of edits you do. I.e., you have some babies in which you edit a randomly selected 10% of known intelligence-affecting SNPs, some in which you’ve edited 20%, some 30%, and so on. But finding the real optimum will probably require understanding what the SNPs actually do, in terms of a model of brain biology, and understanding that biology well enough to make judgment calls about it.
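In the same kind of toy model as above, this titration strategy amounts to sweeping the edited fraction and looking for the peak of a non-monotone dose-response curve (again, every number here is an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(1)
n_params, n_genes = 100, 10_000
gene_param = rng.integers(0, n_params, n_genes)   # which hyperparameter each variant nudges
gene_size = rng.exponential(0.02, n_genes)        # how hard each variant nudges it
mu = rng.normal(0, 1, n_params)                   # population-average hyperparameter values
good_dir = -np.sign(mu)                           # per-variant direction a GWAS would call beneficial

def trait_change(frac):
    """Edit a random fraction of known variants; return the change in the trait (optimum at 0)."""
    genes = rng.choice(n_genes, int(frac * n_genes), replace=False)
    p = mu.copy()
    np.add.at(p, gene_param[genes], good_dir[gene_param[genes]] * gene_size[genes])
    return np.sum(mu ** 2) - np.sum(p ** 2)

for frac in np.arange(0.1, 1.01, 0.1):
    print(f"edited {frac:4.0%} of variants: trait change {trait_change(frac):+7.1f}")
```

With these made-up parameters the trait change rises with the edited fraction, peaks somewhere in the middle, and then falls as the hyperparameters overshoot, which is exactly the pattern the titration is meant to detect.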
I definitely don’t expect additivity holds out to like +20 SDs. We’d be aiming for more like +7 SDs.
From population mean or from parent mean?
Population mean
If you’re applying the concept of +7 SDs seriously here (let alone +20 SDs), I’m almost certain you’re grossly misusing the concept of a standard deviation.
And since you’re a coauthor of this post, that strongly suggests to me that the analysis done here is unreliable.
Care to explain how you think it’s being misused?
A standard deviation characterizes the spread around the mean of a normal distribution; it is not intended to characterize the tails. This is why discussion tends to focus on 1-2 SDs, where the bulk of the data is, and rarely on 3-4 SDs: it is rare to have data (of sufficient size or low enough noise) to support meaningful interpretation of even 4 SDs with real-world measurements.
So in practice, using precise figures like 5, 7, or 20 SDs is misleading, because the tails usually aren’t sufficiently characterized (and they certainly aren’t for intelligence); all you can really say is that the result is beyond the validated range of the test. It’s like taking seriously a measurement of 151.887 from an instrument that operates in integers up to 10: you’re implying a level of precision and range that you don’t realistically have. It comes across as incredibly careless with regard to statistical nuance and rigor.
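For a sense of the scale involved (a quick back-of-the-envelope check that takes the normal model literally, which is exactly what can’t be justified this far out):

```python
from scipy.stats import norm

for z in (2, 4, 7, 20):
    p = norm.sf(z)   # upper-tail probability of exceeding +z SD under a normal model
    print(f"+{z:>2} SD: about 1 in {1 / p:.2g}")
```

Under that literal model, +7 SD corresponds to roughly a 1-in-10^12 frequency, rarer than one person among everyone who has ever lived, and +20 SD to around 1 in 10^88; no intelligence test is normed against anything remotely like those frequencies.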
I suspect the analogy does not really work that well. Much of human genetic variation is just bad mutations that take a while to be selected out. For example, maybe a gene variant slightly decreases the efficiency of your neurons and makes everything in your brain slightly slower.