One issue nobody has raised yet is the effect of structural racism.
The GWASes used to create the polygenic risk scores generally have a very pronounced sampling bias towards people of European ancestry. See for example the GWAS Diversity Monitor, which is a dashboard meant to monitor the sampling practices used by GWASes. In addition to selecting people to sample by ethnicity, an accepted practice is to look at the genomes after sampling and try to identify and exclude “ethnic outliers”.
If you or your partner don’t have ethnicities that would make your genomes look typical among the samples used to train the scoring algorithm, it’s an open question whether any particular score instrument is going to be usefully predictive for you or your potential child. See for example “Generalization and dilution of association results from European GWAS in populations of non-European ancestry: the PAGE study”, which found that, while many GWAS hits generalize from a very restricted sample, a substantial fraction don’t. See also “Current clinical use of polygenic scores will risk exacerbating health disparities”, which discusses polygenic risk scores in particular, and their accuracy falloff when used on people whom the score developers would have excluded from their training set.
Note also that even the papers complaining about this problem are still breaking down their results by very abstract discrete dimensions like “5 continental populations”, which sweep a lot of people under a very large rug. If you and your partner have different ethnicities, you get to be on the wrong end of fun lines like this one, from that last paper:
“Related to stratification, most PRS methods do not explicitly address recent admixture and none consider recently admixed individuals’ unique local mosaic of ancestry; further methods development is needed.”
How long do you think it’ll take for that to be fixed?
It could already be fixed under different regulatory and scientific regimes; enough data exists in general. The statistics isn’t the hard part. (This is much of why UKBB is so amazing.) The barrier is data sharing for TWASes. Hence, much like the question of ‘how well do covid vaccines protect healthy people against infection’ or ‘does this anti-covid drug actually work’ or ‘can we develop a useful covid rapid at-home test’, it’ll take exactly as long to fix as everyone wants it to take, which can be arbitrarily long.
Only relatively small amounts of data (i.e. roughly what already exists or is easily obtained) are necessary. From a statistical POV, a rising tide lifts all boats: the large GWASes have already done most of the work in prioritizing SNPs containing causal variants and providing highly informative priors on both the distribution of effects & specifically where to look. (Power-wise, it looks something like: if you have a GWAS n=200k in Europeans, you don’t need n=200k in East Asians to get an equivalent East Asian PGS, you only need like n<20k. Think of it as like layers of Swiss cheese: the European GWAS hits will be ambiguous within each LD block as to which SNP inside it is causal, but then the East Asian blocks slice it up differently and those 3-4 candidates will be split up across different blocks, so you only need to decide between a few candidates, as opposed to the original prior-less situation where you start with millions of candidates. And there are, across the world, many more than 20k genotyped East Asians etc.)
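To make the Swiss cheese picture concrete, here is a toy sketch in Python. Everything in it is an assumption for illustration: the SNP names, the block memberships, and the `narrow_candidates` helper are made up, and real cross-ancestry fine-mapping works with posterior probabilities rather than hard set intersections.

```python
# Toy sketch of the "Swiss cheese" intuition. All SNP names and LD-block
# memberships are hypothetical; real fine-mapping is probabilistic.

def narrow_candidates(candidates, other_ancestry_blocks, other_hit_block):
    """Keep only the candidates that fall inside the LD block where the
    second ancestry's GWAS also shows a signal."""
    return candidates & other_ancestry_blocks[other_hit_block]

# European GWAS hit: the signal is ambiguous among 4 SNPs in one LD block.
european_candidates = {"snpA", "snpB", "snpC", "snpD"}

# Hypothetical East Asian LD structure: the same SNPs fall in different blocks.
east_asian_blocks = {
    "block1": {"snpA", "snpE"},
    "block2": {"snpB", "snpC", "snpF"},
    "block3": {"snpD", "snpG"},
}

# Suppose the (smaller) East Asian GWAS shows its signal in block2.
remaining = narrow_candidates(european_candidates, east_asian_blocks, "block2")
print(sorted(remaining))  # ['snpB', 'snpC']
```

The point of the toy is only that the second sample has to discriminate between a handful of pre-selected candidates, not rediscover the hit from scratch, which is why it can be much smaller.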
Is it not already sort of fixed?
We know how well PRS perform in other ancestries, right? It just means that PRS are a little bit less good there, not that they don’t work today.