It could already be fixed under different regulatory and scientific regimes: enough data exists in general. The statistics isn’t the hard part. The barrier is data sharing for TWASes. (This is much of why UKBB is so amazing.) Hence, much like the question of ‘how well do covid vaccines protect healthy people against infection’ or ‘does this anti-covid drug actually work’ or ‘can we develop a useful covid rapid at-home test’, it’ll take exactly as long to fix as everyone wants it to take, which can be arbitrarily long.
Only relatively small amounts of data (ie roughly already existing or easily obtained) are necessary. From a statistical POV, a rising tide lifts all boats: the large GWASes have already done most of the work in prioritizing SNPs containing causal variants and providing highly informative priors on both the distribution of effects & specifically where to look. (Power-wise, it looks something like: if you have a GWAS of n=200k in Europeans, you don’t need n=200k in East Asians to get an equivalent East Asian PGS, you only need like n<20k. Think of it as like layers of Swiss cheese: each European GWAS hit will be ambiguous as to which of the, say, 3-4 candidate SNPs inside its block is causal, but then the East Asian blocks slice it up differently and those 3-4 candidates will be split up across different blocks, so you only need to decide between a few candidates, as opposed to the original prior-less situation where you start with millions of candidates. And there are, across the world, many more than 20k genotyped East Asians etc.)
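To make the Swiss cheese picture concrete, here is a toy sketch in Python. Everything in it is made up for illustration: the SNP indices, the block boundaries, and the helper `block_of` are hypothetical, and real fine-mapping works with LD matrices and posterior probabilities rather than hard block membership. It only shows the set logic: each ancestry narrows the causal SNP down to one block, and because the blocks are cut differently, the overlap is smaller than either block alone.

```python
def block_of(snp, boundaries):
    """Return the set of SNP indices that share an LD block with `snp`.

    `boundaries` is a sorted list of block start positions plus a final end
    position, so consecutive pairs (start, end) define half-open blocks.
    """
    for start, end in zip(boundaries, boundaries[1:]):
        if start <= snp < end:
            return set(range(start, end))
    raise ValueError(f"SNP {snp} is outside all blocks")


# Hypothetical LD block boundaries over 12 SNPs (indices 0..11).
# The two ancestries cut the same region in different places.
eur_boundaries = [0, 4, 8, 12]     # European blocks of 4 SNPs each
eas_boundaries = [0, 3, 6, 9, 12]  # East Asian blocks of 3 SNPs each

causal_snp = 5  # assumed true causal SNP, unknown to the analyst

# Each GWAS alone can only say "the causal SNP is somewhere in this block".
eur_candidates = block_of(causal_snp, eur_boundaries)
eas_candidates = block_of(causal_snp, eas_boundaries)

# Stacking the two slices: only SNPs consistent with both blocks survive.
joint_candidates = eur_candidates & eas_candidates

print("European block:  ", sorted(eur_candidates))
print("East Asian block:", sorted(eas_candidates))
print("Joint candidates:", sorted(joint_candidates))
```

In this toy setup the European block leaves 4 candidates, the East Asian block leaves 3, and only 2 survive both, so the second dataset only has to discriminate among a handful of SNPs rather than start from scratch.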
Is it not already sort of fixed?
We know how well PRS perform in other ancestries, right? It just means that PRS are a little bit less good, not that they don’t work today.