Another genomics PhD here. It’s a complex topic. We know from population genetics studies in model organisms that combinatorial effects (epistasis, in genetics lingo) matter. This is despite the fact that simple linear models perform well in the human population: provided the variants sit against a reasonably constant genetic background, low allele frequencies mean that the combinatorial effects are well captured by linear ones.
The problem is that even if you only care about pairwise combinations, there are far too many of them, given a uniform prior. Even if we sequenced everyone on earth we wouldn’t have anywhere near enough info. Sequencing additional individuals has diminishing returns, because there’s only so much genetic variation in the human population (and ~23000^2 possible pairwise combinations).
What we need are good priors over combinations of mutations. To do that we’ll need detailed info about which genes function together to produce which phenotypes. Such models exist already and are seeing moderate success, but we need new ideas and more data than any one startup could provide. Which is exactly what molecular biologists are working on.
OP here. Having learned more statistics since I last posted, I reckon it could be as simple as exploring various interactions (effect modifications) in the data with respect to additional SNPs. The issue is that interactions require greater sample sizes to avoid spurious results, and most genetics research has woefully low sample sizes. That would only get harder to overcome when inching towards more personalised medicine based on individual genomes.
Yes, that’s the case. To get enough data we probably need lots of in vitro experiments. Remember that data is not equal to information: even really big sample sizes wouldn’t be enough to resolve the combinatorial explosion. What I meant in that comment up there (I posted it before it was finished, I think) is that there are ~23k genes in the genome, so even under the absurdly simple assumption that there’s only one mutation possible per gene, you have about half a billion possible pairwise combinations of gene breakages, which you will never ever be able to get enough of a sample size to look at blindly.
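For anyone who wants to check the arithmetic behind that figure, here is a minimal Python sketch. The ~23,000 gene count is the number quoted above; the function names, the 0.05 significance level, and the use of a Bonferroni correction are my own assumptions for illustration, not something anyone in this thread proposed as the right analysis.

```python
from math import comb

# Approximate human gene count quoted in the thread.
N_GENES = 23_000


def ordered_pairs(n: int) -> int:
    """Count pairs as n^2, the way the thread's '~23000^2' does."""
    return n * n


def unordered_pairs(n: int) -> int:
    """Count distinct gene pairs (A,B) == (B,A), excluding self-pairs."""
    return comb(n, 2)


def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test p-value cutoff if every pair is tested blindly.

    Assumption: a plain Bonferroni correction at family-wise level alpha.
    """
    return alpha / n_tests


if __name__ == "__main__":
    n_sq = ordered_pairs(N_GENES)
    n_pairs = unordered_pairs(N_GENES)
    print(f"n^2 pairs:        {n_sq:,}")
    print(f"distinct pairs:   {n_pairs:,}")
    print(f"per-test cutoff:  {bonferroni_threshold(n_pairs):.2e}")
```

23,000^2 is 529 million, which is where "half a billion" comes from; counting each pair only once gives roughly 264 million. Either way, testing every pair blindly pushes the per-test cutoff down to around 2e-10 under these assumptions, and that is with only one hypothetical mutation per gene.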
Based on your more intimate knowledge and access to knowledge in the area, what kind of USD investment would we be looking at (even an order of magnitude estimate would suffice, if a precise figure is intractable) if an amount of resources, proportional to the potential humanitarian impact relative to mosquito-transmitted diseases, were to be spent to develop a gene drive ready for use in the tsetse fly? This is a species regarded as responsible for preventing an African ‘green revolution’ like the one seen in Asia, and thus part of the whole fable of African starvation. Is there any way to incorporate resource investment into mitigating relevant risks? It seems like an academic has independently started thinking along the same lines.
Hmmmm. I’m shamefully ignorant about prices, but I would estimate such an effort would be in the tens of millions, if you wanted it done quickly (and it will still take a while). As far as I’m aware we haven’t developed methods for transgenesis in tsetse flies, having only gotten the genome sequenced in 2014 (priorities, people?!), and setting it up in a new organism with an unusual life cycle can be surprisingly difficult. The link below describes techniques for manipulating gut microbes in the flies, which I don’t think would suffice.
In Drosophila you can’t go from cell culture to an embryo easily like in mammals. You have to inject stuff into embryos, then breed from those embryos and hope some of your vector got into the germ line. In tsetse flies, I am now aware, the mother keeps the embryo until it’s quite developed, meaning the techniques used in Drosophila wouldn’t work, and we certainly don’t have any tsetse cell lines, which I doubt would be of use anyway. So you’d be looking at developing a novel means of transgenesis (a viral vector targeting the germ line, maybe?), which is a task that, while no doubt solvable, inevitably has big uncertainties in it.
So yes, tens of millions, give or take an order of magnitude, plus years and years of work. Well worth doing though. In my opinion the potential gains far outweigh the risks.
P.S. The link to ‘relevant risks’ you posted is broken. I’d be interested in seeing it.
I really appreciate the explanations in this thread. I was wondering if anyone had an update regarding recent developments in this space. Specifically, using big data to solve for genetic / protein links to phenotypes. I have also been struggling to find more recent information regarding genosets.
Apologies if any of that is unclear, I am still relatively new to this.