Your OP is completely misleading if you’re using plain GWAS!
GWAS finds associations; that’s what the A stands for. Association is not causation. Anything that correlates with IQ (e.g. melanin) can show up in a GWAS for IQ. You’re going to end up editing embryos to have lower melanin and claiming their IQ is 150.
The IQ GWAS we used was based only on individuals of European ancestry, and ancestry principal components were included as covariates, as is typical for GWAS. Non-causal associations from subtler stratification are still a potential concern, but I don’t believe it’s a terribly large concern. The largest educational attainment GWAS did a comparison of population and direct effects for a “cognitive performance” PGI and found that predicted direct (between-sibling) effects were only attenuated by a factor of 0.824 compared to predicted population-level effects. If anything I’d expect their PGI to be worse in this regard, since it included variants with less stringent statistical power cutoffs (so I’d guess it’s more likely that non-causal associations would sneak in, compared to the GWAS we used).
You should decide whether you’re using a GWAS on cognitive performance or on educational attainment (EA). The paper you linked uses a GWAS for EA, and finds that very little of the predictive power comes from direct effects. Exactly the opposite of your claim:
For predicting EA, the ratio of direct to population effect estimates is 0.556 (s.e. = 0.020), implying that 100% × 0.556^2 = 30.9% of the PGI’s R^2 is due to its direct effect.
Then they compare this to cognitive performance. For cognitive performance, the ratio was better, but it’s not 0.824, it’s 0.824^2 = 0.68. But actually, even this is possibly too high: the table in figure 4 has a ratio that looks much smaller than this, and refers to supplementary table 10 for numbers. I checked supplementary table 10, and it says that the “direct-population ratio” is 0.656, not 0.824. So quite possibly the right value is 0.656^2 = 0.43 even for cognitive performance.
Why is the cognitive performance number bigger? Well, it’s possibly because there’s less data on cognitive performance, so the estimates are based on more obvious or easy-to-find effects. The final predictive power of the direct effects for EA and for cognitive performance is similar, around 3% of the variance, if I’m reading it correctly (not sure about this). So the ratios are somewhat different, but the population GWAS predictive power is also somewhat different in the opposite direction, and these mostly cancel out.
For cognitive performance, the ratio was better, but it’s not 0.824, it’s 0.824^2 = 0.68.
That’s variance explained. I was talking about effect size attenuation, which is what we care about for editing.
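To make the distinction concrete, here’s a toy sketch (made-up effect sizes; only the 0.824 ratio comes from the comparison above):
```python
# Toy illustration: if each SNP's direct (within-family) effect is r times its
# population-GWAS effect, then editing gains are attenuated by r, while the PGI's
# variance explained (R^2) is attenuated by r^2.
r = 0.824
pop_effects = [0.05, 0.03, 0.02]                 # population effect sizes (made up, SD units)
direct_effects = [r * b for b in pop_effects]

gain_assuming_population_effects = sum(pop_effects)
gain_actually_realized = sum(direct_effects)
print(gain_actually_realized / gain_assuming_population_effects)  # 0.824: effect-size attenuation
print(r ** 2)                                                     # ~0.68: R^2 attenuation
```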
I checked supplementary table 10, and it says that the “direct-population ratio” is 0.656, not 0.824. So quite possibly the right value is 0.656^2 = 0.43 even for cognitive performance.
Supplementary table 10 is looking at direct and indirect effects of the EA PGI on other phenotypes. The results for the Cog Perf PGI are in supplementary table 13.
Thanks! I understand their numbers a bit better, then. Still, direct effects of cognitive performance explain 5% of variance. Can’t multiply the variance explained of EA by the attenuation of cognitive performance!
Do you have evidence for direct effects of either one of them being higher than 5% of variance?
I don’t quite understand your numbers in the OP but it feels like you’re inflating them substantially. Is the full calculation somewhere?
I don’t quite understand your numbers in the OP but it feels like you’re inflating them substantially. Is the full calculation somewhere?
Not quite sure which numbers you’re referring to, but if it’s the assumed SNP heritability, see the below quote of mine from another comment talking about missing heritability for IQ:
The SNP heritability estimates for IQ (h^2 = ~0.2) are primarily based on a low-quality test that has a test-retest reliability of 0.6, compared to ~0.9 for a gold-standard IQ test. So a simple calculation to adjust for this gets you a predicted SNP heritability of 0.2 * (0.9 / 0.6)^2 = 0.45 for a gold-standard IQ test, which matches the SNP heritability of height. As for the rest of the missing heritability: variants with frequency less than 1% aren’t accounted for by the SNP heritability estimate, and they might contribute a decent bit if there are lots of them and their effect sizes are larger.
The h^2 = 0.19 estimate from this GWAS should be fairly robust to stratification, because of how the LDSC estimator works. (To back this up: a recent study that actually ran a small GWAS on siblings, based on the same cognitive test, also found h^2 = 0.19 for direct effects.)
The paper you called the largest-ever GWAS gave a direct h^2 estimate of 0.05 for cognitive performance. How are these papers getting 0.2? I don’t understand what they’re doing. Some type of meta-analysis?
The test-retest reliability you linked has different reliabilities for different subtests. The correct adjustment depends on which subtests are being used. If cognitive performance is some kind of sumscore of the subtests, its reliability would be higher than for the individual subtests.
Also, I don’t think the calculation 0.2*(0.9/0.6)^2 is the correct adjustment. A test-retest correlation is already essentially the square of a correlation of the test with an underlying latent factor (both the test AND the retest have error). E.g. if a test T can be written as
T = aX + sqrt(1 - a^2)E
where X is ability and E is error (all with standard deviation 1 and the error independent of the ability), then a correlation of T with a resample of T (with new independent error but same ability) would be a^2. But the adjustment to h^2 should be proportional to a^2, so it should be proportional to the test-retest correlation, not the square of the test-retest correlation. Am I getting this wrong?
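Here’s a quick toy simulation of that model (my own sketch, with arbitrary parameters) as a sanity check:
```python
# Toy check of the claim above: with T = a*X + sqrt(1 - a^2)*E and unit-variance X and E,
# the test-retest correlation is ~a^2, and the heritability measured on T is attenuated
# by a^2 (which equals the reliability), so converting from a reliability-0.6 test to a
# reliability-0.9 test should multiply h^2 by 0.9/0.6, not by (0.9/0.6)^2.
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
a = 0.775                    # gives a test-retest reliability of a^2 ~= 0.6
h2_true = 0.45               # assumed heritability of the latent ability X (arbitrary)

G = rng.normal(size=n) * np.sqrt(h2_true)            # additive genetic value
X = G + rng.normal(size=n) * np.sqrt(1 - h2_true)    # latent ability, variance 1
T1 = a * X + np.sqrt(1 - a**2) * rng.normal(size=n)  # test
T2 = a * X + np.sqrt(1 - a**2) * rng.normal(size=n)  # retest: new error, same ability

reliability = np.corrcoef(T1, T2)[0, 1]
h2_measured = np.corrcoef(G, T1)[0, 1] ** 2          # variance in T1 explained by G

print(f"test-retest reliability  ~ {reliability:.3f}  (expect {a**2:.2f})")
print(f"h^2 on the measured test ~ {h2_measured:.3f}  (expect {h2_true * a**2:.3f})")
print(f"attenuation factor       ~ {h2_measured / h2_true:.3f}  (the reliability, not its square)")
```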
The paper you called the largest-ever GWAS gave a direct h^2 estimate of 0.05 for cognitive performance. How are these papers getting 0.2? I don’t understand what they’re doing. Some type of meta-analysis?
You’re mixing up h^2 estimates with predictor R^2 performance. It’s possible to get an estimate of h^2 with much less statistical power than it takes to build a predictor that good.
The test-retest reliability you linked has different reliabilities for different subtests. The correct adjustment depends on which subtests are being used. If cognitive performance is some kind of sumscore of the subtests, its reliability would be higher than for the individual subtests.
“Fluid IQ” was the only subtest used.
Also, I don’t think the calculation 0.2*(0.9/0.6)^2 is the correct adjustment. A test-retest correlation is already essentially the square of a correlation of the test with an underlying latent factor
Good catch, we’ll fix this when we revise the post.
You’re mixing up h^2 estimates with predictor R^2 performance. It’s possible to get an estimate of h^2 with much less statistical power than it takes to build a predictor that good.
Thanks. I understand now. But isn’t the R^2 the relevant measure? You don’t know which genes to edit to get the h^2 number (nor do you know what to select on). You’re doing the calculation 0.2*(0.9/0.6)^2 when the relevant calculation is something like 0.05*(0.9/0.6). Off by a factor of 6 for the power of selection, or sqrt(6) = 2.45 for the power of editing.
Not for this purpose! The simulation pipeline is as follows: the assumed h^2 and number of causal variants are used to generate the genetic effects → generate simulated GWASes for a range of sample sizes → infer causal effects from the observed GWASes → select top expected-effect variants for up to N (expected) edits.
I’m talking about this graph:
What are the calculations used for this graph? The text says to see the appendix, but the appendix does not actually explain how you got this graph.
This is based on inferring causal effects conditional on this GWAS. The assumed heritability affects the prior over SNP effect sizes.
I don’t understand. Can you explain how you’re inferring the SNP effect sizes?
With a method similar to this. You can easily compute the exact likelihood function P(GWAS results | SNP effects), which, when combined with a prior over SNP effects (informed by what we know about the genetic architecture of the trait), gives you a posterior probability of each SNP being causal (having a nonzero effect) and its expected effect size conditional on being causal. (You can’t actually calculate the full posterior, since there are 2^|SNPs| possible combinations of SNPs with nonzero effects, so you need to do some sort of MCMC or stochastic search.) We may make a post going into more detail on our methods at some point.
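To give a flavor of the calculation, here’s a stripped-down single-SNP version (an illustrative sketch I’m writing out here, not our actual code: it ignores LD entirely and applies a spike-and-slab prior to each SNP’s summary statistic independently, with the prior variance set by spreading the SNP h^2 evenly over the assumed causal SNPs):
```python
# Simplified per-SNP posterior under a spike-and-slab prior, given GWAS summary statistics.
# Real finemapping has to handle LD between SNPs (hence the MCMC / stochastic search over
# causal configurations mentioned above); this toy version treats every SNP independently.
import numpy as np
from scipy.stats import norm

def per_snp_posterior(beta_hat, se, pi=20_000 / 1e6, tau2=0.19 / 20_000):
    """Posterior for each SNP under beta ~ (1 - pi) * delta_0 + pi * N(0, tau2).

    pi   -- prior probability a SNP is causal (e.g. 20,000 causal SNPs out of ~1M tested)
    tau2 -- prior effect-size variance (SNP h^2 of 0.19 spread evenly over the causal SNPs)
    """
    # Marginal likelihood of the observed estimate under each mixture component.
    m_causal = norm.pdf(beta_hat, loc=0.0, scale=np.sqrt(se**2 + tau2))
    m_null = norm.pdf(beta_hat, loc=0.0, scale=se)
    post_causal = pi * m_causal / (pi * m_causal + (1 - pi) * m_null)
    # Normal-normal shrinkage gives the expected effect conditional on being causal.
    shrink = tau2 / (tau2 + se**2)
    expected_effect = post_causal * shrink * beta_hat
    return post_causal, expected_effect

# Toy usage: rank SNPs by expected effect and take the top N as edit candidates.
rng = np.random.default_rng(1)
beta_hat = rng.normal(0.0, 0.02, size=1_000)   # fake GWAS summary statistics
se = np.full(1_000, 0.01)                      # fake standard errors
pip, eff = per_snp_posterior(beta_hat, se)
top = np.argsort(-np.abs(eff))[:50]
# Expected gain if each of the 50 edits installs the trait-increasing allele and effects add.
print("expected gain from 50 edits (toy units):", np.abs(eff[top]).sum())
```
This is only meant to show the shape of the likelihood-plus-prior calculation; the real version works jointly over SNPs in LD.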
You should show your calculation or your code, including all the data and parameter choices. Otherwise I can’t evaluate this.
I assume you’re picking parameters to exaggerate the effects, because just from the exaggerations you’ve already conceded (0.9/0.6 shouldn’t be squared, and the attenuation to get direct effects should be 0.824), you’ve exaggerated the results by a factor of sqrt(0.9/0.6)/0.824 for editing, which is around a 50% overestimate.
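For reference, the arithmetic behind that factor (my reading of it):
```python
import math

# Dropping the erroneous squaring changes the h^2 multiplier from (0.9/0.6)^2 to (0.9/0.6);
# since per-edit effect sizes scale with sqrt(h^2), that shrinks each effect by sqrt(0.9/0.6).
squaring_fix = math.sqrt(0.9 / 0.6)
# Applying the conceded 0.824 direct-effect attenuation shrinks each effect by a further 0.824.
direct_fix = 1 / 0.824
print(squaring_fix * direct_fix)   # ~1.49, i.e. roughly a 50% overestimate of per-edit gains
```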
I don’t think that was deliberate on your part, but I think wishful thinking and the desire to paint a compelling story (and get funding) are causing you to be biased in what you adjust for and in which mistakes you catch. It’s natural in your position to scrutinize low estimates but not high ones. So to trust your numbers I’d need to understand how you got them.
There is one saving grace for us, which is that the predictor we used is significantly less powerful than ones we know to exist.
I think when you account for the squaring issue, the indirect-effect issue, and the more powerful predictors, they’re going to roughly cancel out.
Granted, the more powerful predictor itself isn’t published, so we can’t rigorously evaluate it either, which isn’t ideal. I think the way to deal with this is to show a few lines: one for the “current publicly available GWAS”, one showing a rough estimate of the gain using the privately developed predictor (which, with enough work, we could probably replicate), and then one or two more for different amounts of data.
All of this together WILL still reduce the “best case scenario” from editing relative to what we originally published (because with the better predictor we’re closer to “perfect knowledge” than we were with the previous predictor).
At some point we’re going to re-run the calculations and publish an actual proper writeup on our methodology (likely with our code).
Also I just want to say thank you for taking the time to dive deep into this with us. One of the main reasons I post on LessWrong is because there is such high quality feedback relative to other sites.
You should show your calculation or your code, including all the data and parameter choices. Otherwise I can’t evaluate this.
The code is pretty complicated and not something I’d expect a non-expert (even a very smart one) to be able to quickly check over; it’s not just a 100-line Python script. (Or even a very smart expert, for that matter; really anyone who isn’t already familiar with our particular codebase.) We’ll likely open-source it at some point in the future, possibly soon, but that’s not decided yet. Our finemapping (inferring causal effects) procedure produces ~identical results to the software from the paper I linked above when run on the same test data (though we handle some additional things that finemapper doesn’t, like variable per-SNP sample sizes and missing SNPs, which is why we didn’t just use it).
The parameter choices that determine the prior over SNP effects are the number of causal SNPs (which we set to 20,000) and the SNP heritability of the phenotype (which we set to 0.19, as per the GWAS we used). The erroneous effect-size adjustment was done at the end, to convert from the effect sizes of the GWAS phenotype (the low-reliability IQ test) to the effect sizes corresponding to the phenotype we care about (a high-reliability IQ test).
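As a back-of-envelope for what those two numbers imply, if the heritability is spread evenly across the causal SNPs (a simplifying assumption for illustration):
```python
# If h^2 = 0.19 is split evenly over 20,000 causal SNPs, the typical causal SNP accounts
# for ~1e-5 of phenotypic variance, i.e. an effect on the order of 0.003 SD per
# standardized genotype.
h2, n_causal = 0.19, 20_000
per_snp_var = h2 / n_causal
print(per_snp_var, per_snp_var ** 0.5)   # 9.5e-06, ~0.0031
```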
We want to publish a more detailed write-up of our methods soon(ish), but it’s going to be a fair bit of work, so don’t expect it overnight.
It’s natural in your position to scrutinize low estimates but not high ones.
Yep, fair enough. I’ve noticed myself doing this sometimes and I want to cut it out. That said, I don’t think small-ish predictable overestimates to the effect sizes are going to change the qualitative picture, since with good enough data and a few hundred to a thousand edits we can boost predicted IQ by >6 SD even with much more pessimistic assumptions, which probably isn’t even safe to do (I’m not sure I expect additivity to hold that far). I’m much more worried about basic problems with our modelling assumptions, e.g. the assumption of sparse causal SNPs with additive effects and no interactions (e.g. what if rare haplotypes are deleterious due to interactions that don’t show up in GWAS since those combinations are rare?).