Some points:
1. The 23andMe dataset is probably not as useful as you project. They are working from a fixed set of variants, not full genomes or even a complete set of SNPs known to vary. There are certainly many SNPs of interest that just aren’t in their data.
2. In projecting the gains from discovering further variants that affect intelligence, it’s not clear whether you’ve accounted for the low-hanging-fruit effect. With these statistical approaches, we obviously discover the variants of largest effect first. Adding millions of additional genomes or genotypes will let us resolve thousands of additional common variants, but they will be the ones with really tiny effect sizes.
3. On the other hand (contradicting point 2 somewhat), quite a substantial fraction of variation in intelligence and other traits is likely due to genetic load: rare mutations, some likely of substantial effect, all deleterious by definition. Identifying these and their effects is a thorny statistical problem due to their rarity, but if we can, they would actually be very promising edit targets. The advantages would be the likely lack of negative side effects and the fact that the top few for any person would likely be of large effect. Some of them are also probably wide-effect boosts, fixes to fundamental bits of cellular machinery! The downside is that this would be a custom targeting job per person.
4. The use of ‘800 IQ’ is a little grating. The tests only go to 200 or 210 (and are not convincingly normed at that level). Still, fully superhuman, entirely outside the normal human trait range… I guess it’s a fair way to gesture at that.
5. Our predictive models for IQ work significantly better for European (white) populations because they were trained on that population. This implies that obtaining a bunch of data for Asian and African populations would allow us to identify additional targets. It surprises me that we don’t have some huge dataset from China, but at least we recently developed a 100K+ genotype dataset of Han individuals, which should turn up some additional hits.
Overall, really promising direction. I appreciate the writeup on new and improved edit methods; I had not been following the field closely and was unaware we had advanced this far beyond the previous state of the art, CRISPR/Cas9.
The 23andMe dataset is probably not as useful as you project. They are working from a fixed set of variants, not full genomes or even a complete set of SNPs known to vary. There are certainly many SNPs of interest that just aren’t in their data.
It’s possible the source I read was misleading, but last I checked they use SNP arrays with 650k variants, which, combined with imputation, cover roughly all loci with minor allele frequency >1%. That’s enough to make quite a strong predictor, especially since they have a fair number of non-European participants with different linkage disequilibrium patterns (which helps pinpoint the causal variant in a cluster).
In projecting the gains from discovering further variants that affect intelligence, it’s not clear whether you’ve accounted for the low-hanging-fruit effect. With these statistical approaches, we obviously discover the variants of largest effect first. Adding millions of additional genomes or genotypes will let us resolve thousands of additional common variants, but they will be the ones with really tiny effect sizes.
The simulation accounts for that. That’s why each additional edit adds less than the one before, and the total gain grows roughly logarithmically with the number of edits.
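The low-hanging-fruit effect can be illustrated with a toy sketch. This is not the simulation from the post: it assumes, hypothetically, that each edit’s gain equals the absolute effect of the variant it targets, draws 10,000 effects from a standard normal, and applies edits largest-first. Because the effects are sorted, every extra edit adds no more than the previous one, so the cumulative gain curve is concave and the average gain per edit keeps falling.

```python
import random

def cumulative_gain(effects):
    """Running total of the (hypothetical) gain from applying edits in
    descending order of absolute effect size, largest effects first."""
    ordered = sorted((abs(b) for b in effects), reverse=True)
    gains, total = [], 0.0
    for beta in ordered:
        total += beta
        gains.append(total)
    return ordered, gains

random.seed(0)
# Hypothetical per-edit effects: 10,000 variants with normally distributed effects.
effects = [random.gauss(0.0, 1.0) for _ in range(10_000)]
ordered, gains = cumulative_gain(effects)

for k in (10, 100, 1_000, 10_000):
    # The average gain per edit falls as k grows: the later edits are the small ones.
    print(f"{k:>6} edits: cumulative gain {gains[k - 1]:9.1f}, "
          f"average per edit {gains[k - 1] / k:.3f}")
```

Real gains also depend on allele frequencies, estimation noise, and whether a given person already carries the favorable allele, all of which this sketch ignores.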
On the other hand (contradicting point 2 somewhat), quite a substantial fraction of variation in intelligence and other traits is likely due to genetic load: rare mutations, some likely of substantial effect, all deleterious by definition. Identifying these and their effects is a thorny statistical problem due to their rarity, but if we can, they would actually be very promising edit targets. The advantages would be the likely lack of negative side effects and the fact that the top few for any person would likely be of large effect. Some of them are also probably wide-effect boosts, fixes to fundamental bits of cellular machinery! The downside is that this would be a custom targeting job per person.
We’ll get better at identifying rare variants with large causal effects soon. UK Biobank just released 500k whole genomes in late November, so we should see the first studies on that data come out in the next few months.
The simulations we ran assume that the dataset only contains variants with minor allele frequency >1%. Including variants with lower frequency would increase the average marginal effect per edit, but they aren’t necessary for this tech to work in general.
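Why rarity makes estimation hard can be seen from a textbook approximation (an assumption here, not the method used in the post): with additive 0/1/2 genotype coding under Hardy-Weinberg equilibrium, the genotype variance is 2p(1−p), so the standard error of a single-variant regression estimate scales as 1/sqrt(2·n·p(1−p)). At a fixed sample size, a variant at 0.1% frequency gets a standard error more than ten times larger than one at 30%. The function names and the 0.005 SD target below are illustrative.

```python
import math

def approx_se(n, maf, resid_sd=1.0):
    """Approximate standard error of a per-allele effect estimate from a
    single-variant linear regression. Assumes additive 0/1/2 genotype coding
    and Hardy-Weinberg equilibrium, so Var(genotype) = 2p(1-p)."""
    return resid_sd / math.sqrt(2.0 * n * maf * (1.0 - maf))

def n_for_se(target_se, maf, resid_sd=1.0):
    """Smallest sample size whose approximate standard error is <= target_se."""
    return math.ceil(resid_sd ** 2 / (target_se ** 2 * 2.0 * maf * (1.0 - maf)))

# Phenotype measured in standard-deviation units; 500k is a biobank-scale sample.
for maf in (0.3, 0.05, 0.01, 0.001):
    print(f"MAF {maf:<6} SE at n=500k: {approx_se(500_000, maf):.4f} SD, "
          f"n needed for SE 0.005 SD: {n_for_se(0.005, maf):,}")
```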
The use of ‘800 IQ’ is a little grating. The tests only go to 200 or 210 (and are not convincingly normed at that level). Still, fully superhuman, entirely outside the normal human trait range… I guess it’s a fair way to gesture at that.
This is why I specifically used language in the post like “don’t take this too seriously” and “I don’t expect such an IQ to actually result from flipping all IQ-decreasing alleles to their IQ-increasing variants for the same reason I don’t expect to reach the moon by climbing a very tall ladder”.
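For scale, assuming the usual convention that IQ scores are normed to a mean of 100 and a standard deviation of 15, the ceiling of current tests and the ‘800 IQ’ figure correspond to these z-scores:

```latex
z = \frac{\mathrm{IQ} - 100}{15}, \qquad
z_{200} = \frac{200 - 100}{15} \approx 6.7, \qquad
z_{800} = \frac{800 - 100}{15} \approx 46.7
```

A value tens of standard deviations out is far beyond anything a test could be normed on, which is the sense in which the figure is a gesture rather than a prediction.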
Our predictive models for IQ work significantly better for European (white) populations because they were trained on that population. This implies that obtaining a bunch of data for Asian and African populations would allow us to identify additional targets. It surprises me that we don’t have some huge dataset from China, but at least we recently developed a 100K+ genotype dataset of Han individuals, which should turn up some additional hits.
There’s less difference between genetic ancestry groups when it comes to editing than there is for embryo selection. With embryo selection, you can rely on linkage disequilibrium patterns remaining relatively consistent among Europeans to compensate for your uncertainty about which variant in a cluster is causal. You can’t do that with editing, since an edit changes one specific base and only helps if that base is the causal variant.
So getting data from other ancestry groups (particularly Africans, whose genomes are the most diverse and generally have shorter LD blocks) will actually make editing more efficient for everyone, including Europeans.
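The LD argument can be made concrete with a toy simulation. It is a sketch under stated assumptions, not a real fine-mapping method: one causal SNP, one non-causal tag SNP, a made-up effect size and allele frequency, and LD controlled by a hypothetical `p_copy` parameter (the chance that the tag allele simply copies the causal one on a haplotype, which works out to the LD correlation r). When LD is strong, the tag is almost as associated with the trait as the causal variant, so marginal statistics cannot tell them apart; that is fine for embryo selection but useless for picking an edit. When LD is weaker, the causal variant stands out.

```python
import math
import random

def simulate(n, p_copy, beta=0.3, maf=0.3, seed=0):
    """Toy model: each of n diploid individuals carries a causal SNP and a
    nearby non-causal tag SNP. On each haplotype the tag allele copies the
    causal allele with probability p_copy (so p_copy is the LD correlation r);
    otherwise it is drawn independently at the same frequency. Only the causal
    SNP affects the phenotype."""
    rng = random.Random(seed)
    causal, tag, pheno = [], [], []
    for _ in range(n):
        c = t = 0
        for _hap in range(2):
            c_allele = 1 if rng.random() < maf else 0
            if rng.random() < p_copy:
                t_allele = c_allele
            else:
                t_allele = 1 if rng.random() < maf else 0
            c += c_allele
            t += t_allele
        causal.append(c)
        tag.append(t)
        pheno.append(beta * c + rng.gauss(0.0, 1.0))
    return causal, tag, pheno

def pearson_r(x, y):
    """Plain Pearson correlation, used here as a marginal association test."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

for label, p_copy in (("strong LD (r = 0.95)", 0.95), ("weak LD (r = 0.3)", 0.3)):
    causal, tag, pheno = simulate(20_000, p_copy)
    print(f"{label}: r(causal, trait) = {pearson_r(causal, pheno):.3f}, "
          f"r(tag, trait) = {pearson_r(tag, pheno):.3f}")
```

In expectation, r(tag, trait) is about `p_copy` times r(causal, trait), so the gap between the two SNPs is roughly 5% of the causal signal under strong LD and 70% under weak LD.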
The lack of non-European data is slowly being addressed, but at the moment I know of no non-European data source that has good IQ phenotype data. There are definitely biobanks and consumer genomics companies who have the data, so they could do it if they wanted to.
Thanks for the thoughtful comment. I’m glad you enjoyed the post!