Isn’t the derivative of the full variable in one of the multiplicands still noticeable? Maybe it would help if you make some quantitative statement?
Taking the logarithm (to linearize the association) scales the derivative down by the reciprocal of the magnitude. So if one of the terms in the sum is really big, all the derivatives get scaled down by a lot. If each of the terms is a product, then the derivative for the big term gets scaled up to cancel out the downscaling, but the small terms do not.
Under the condition I mentioned, polygenic scores will tend to focus on the traits that cause the most common kind of depression, while neglecting other kinds. The missing heritability will be due to missing those other kinds.
Can you please write down the expressions you’re talking about as math? If you’re trying to invoke standard genetics knowledge, I’m not a geneticist and I’m not picking it up from what you’re saying.
Not right now, I’m on my phone. Though also it’s not standard genetics math.
Ok.
Let’s start with the basics: If the outcome f is a linear function of the genes x, that is f(x) = β·x, then the effect of each gene is given by the gradient of f, i.e. ∇_x f(x) = β. (This is technically a bit sketchy since a genetic variant is discrete while gradients require continuity, but it works well enough as a conceptual approximation for our purposes.) Under this circumstance, we can think of genomic studies as finding β. (This is also technically a bit sketchy because of linkage disequilibrium and such, but it works well enough as a conceptual approximation for our purposes.)
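As a concrete sanity check (my own sketch, not from the thread, assuming standard 0/1/2 additive coding and a noiseless linear trait), a plain regression recovers β exactly in this case:

```python
# Minimal illustration: in the purely linear case f(x) = beta . x,
# regressing the outcome on the genotypes recovers beta.
# All parameters here are made up for the sketch.
import numpy as np

rng = np.random.default_rng(0)
n, p = 10_000, 5                       # individuals, variants
beta = rng.normal(size=p)              # true per-variant effects
x = rng.binomial(2, 0.5, size=(n, p))  # genotypes coded 0/1/2
f = x @ beta                           # noiseless linear outcome

beta_hat = np.linalg.lstsq(x, f, rcond=None)[0]
print(np.allclose(beta_hat, beta))     # True: the regression finds beta
```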
If f isn’t a linear function, then there is no constant β to find. However, the argument for genomic studies still mostly goes through that they can find E[∇_x f(x)], it’s just that this expression now denotes a weird mishmash effect size that’s not very interpretable.
As you observed, if f is almost-linear, for example if f(x) = e^{β·x}, then genomic studies still have good options. The best is probably to measure the genetic influence on log f, as then we get a pretty meaningful coefficient out of it. (If we measured the genetic influence on f without the logarithm, I think under commonly viable assumptions we would get β′_i ∝ e^{β_i} − 1, but don’t cite me on that.)
The trouble arises when you have deeply nonlinear forms such as f(x) = e^{β·x} + e^{γ·x}. If we take the gradient of this, then the chain rule gives us ∇ log f(x) = (e^{β·x} β + e^{γ·x} γ) / (e^{β·x} + e^{γ·x}). That is, the two different mechanisms “suppress” each other, so if e^{β·x} is usually high, then the γ term would usually be (implicitly!) excluded from the analysis.
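Numerically, the softmax-style weighting makes the suppression easy to see (a toy sketch with two variants and made-up effect vectors):

```python
# Toy illustration of grad log f for f(x) = exp(beta.x) + exp(gamma.x):
# the gradient is a softmax-weighted mix of beta and gamma, so whichever
# term dominates suppresses the other. Effect vectors are made up.
import numpy as np

beta = np.array([1.0, 0.0])   # mechanism 1 acts on variant 1
gamma = np.array([0.0, 1.0])  # mechanism 2 acts on variant 2

def grad_log_f(x):
    wb, wg = np.exp(beta @ x), np.exp(gamma @ x)
    return (wb * beta + wg * gamma) / (wb + wg)

print(grad_log_f(np.array([5.0, 0.0])))  # ~[0.993, 0.007]: gamma is suppressed
print(grad_log_f(np.array([0.0, 0.0])))  # [0.5, 0.5]: neither term dominates
```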
Ah. Thank you, this makes sense of what you said earlier. (I / someone could have gotten this from what you had written before, by thinking about it more, probably.)
I agree with your analysis as math.
However, I’m skeptical of the application to the genetics stuff, or at least I don’t see it yet. Specifically, you wrote:
If you’ve got a lot of terms in the sum and the distribution of the variables is correct, this can basically kill the bulk of common additive variance. Conceptually speaking, this can be thought of as “your system is a mixture of a bunch of qualitatively distinct things”. Like if you imagine divorce or depression can be caused by a bunch of qualitatively unrelated things.
And your argument here says that there’s “gradient interference” between the summed products specifically when one of the summed products is really big. But in the case of disease risk, IIUC the sum-of-products f(x) is something like logits. So translating your argument, it’s like:
Suppose one of the causes of X-disease contributes a ton of logits, to the point where it’s already overdetermined that you have X. Then you can’t notice the effects of another one of the causes of X. Even if there are such effects, most people have the disease anyway, so you get very little signal, which only comes from the lucky few who didn’t get X from the first cause.
In this case, yes the analysis is valid, but it’s not very relevant. For the diseases that people tend to talk about, if there are several substantial disjunctive causes (I mean, the risk is a sum of a few different sub-risks), then they all would show substantial signal in the data. None of them drowns out all the others.
Maybe you just meant to say “In theory this could happen”.
Or am I missing what you’re suggesting? E.g. is there a way for there to be a trait that:
has lots of variation (e.g. lots of sick people and lots of non-sick people), and
it’s genetic, and
it’s a fairly simple functional form like we’ve been discussing,
but you can’t optimize it much by changing a bunch of variants found by looking at some millions of genotype/phenotype pairs?
The original discussion was about how personality traits and social outcomes could behave fundamentally differently from biological traits when it comes to genetics. So this isn’t necessarily meant to apply to disease risks.
Well you brought up depression. But anyway, all my questions apply to personality traits as well.
… To rephrase / explain how confused I am about what you’re trying to tell me: it kinda sounds like you’re saying “If some trait is strongly determined by one big chunk of genes, then you won’t be able to see how some other chunk affects the trait.” But this can’t explain missing heritability! In this scenario, none of the heritability is even from the second chunk of genes in the first place! Or am I missing something?
Some of the heritability would be from the second chunk of genes.
To the extent that the heritability is from the second chunk, to that extent the gradient does flow, no?
Why?
Because if some of the heritability is from the second chunk, that means that for some pairs of people, they have roughly the same first chunk but somewhat different second chunks; and they have different traits, due to the difference in second chunks. If some amount of heritability is from the second chunk, then to that extent, there’s a bunch of pairs of people whose trait differences are explained by second chunk differences. If you made a PGS, you’d see these pairs of people and then you’d find out how specifically the second chunk affects the trait.
I could be confused about some really basic math here, but yeah, I don’t see it. Your example for how the gradient doesn’t flow seems to say “the gradient doesn’t flow because the second chunk doesn’t actually affect the trait”.
If some amount of heritability is from the second chunk, then to that extent, there’s a bunch of pairs of people whose trait differences are explained by second chunk differences. If you made a PGS, you’d see these pairs of people and then you’d find out how specifically the second chunk affects the trait.
This only applies if the people are low in the first chunk and differ in the second chunk. Among the people who are high in the first chunk but differ in the second chunk, the logarithm of their trait level will be basically the same regardless of the second chunk (because the logarithm suppresses things by the total), so these people will reduce the PGS coefficients rather than increasing the PGS coefficients. When you create the PGS, you include both groups, so the PGS coefficients will be downwards biased relative to γ.
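A quick simulation of this downward bias (my own sketch; x1 and x2 stand in for the two chunks, with arbitrary equal true effects b = g = 2):

```python
# Sketch of the downward bias: trait = exp(b*x1) + exp(g*x2), then regress
# log(trait) on both chunks. People high in the first chunk contribute
# ~zero signal for the second, shrinking the fitted slopes below the
# true values. All parameters are arbitrary.
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
x1 = rng.normal(size=n)   # "first chunk" polygenic burden
x2 = rng.normal(size=n)   # "second chunk" polygenic burden
b, g = 2.0, 2.0           # equal true effects
y = np.log(np.exp(b * x1) + np.exp(g * x2))

X = np.column_stack([np.ones(n), x1, x2])
coef = np.linalg.lstsq(X, y, rcond=None)[0]
print(coef[1], coef[2])   # each comes out well below the true value 2.0
```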
Among the people who are high in the first chunk but differ in the second chunk, the logarithm of their trait level will be basically the same regardless of the second chunk (because the logarithm suppresses things by the total), so these people will reduce the PGS coefficients rather than increasing the PGS coefficients
Wouldn’t this also decrease the heritability?
It would decrease the narrow-sense (or additive) heritability, which you can basically think of as the squared length of your coefficient vector, but it wouldn’t decrease the broad-sense heritability, which is basically the phenotypic variance in expected trait levels you’d get by shuffling around the genotypes. The missing heritability problem is that when we measure these two heritabilities, the former is lower than the latter.
Why not? Shuffling around the second chunk, while the first chunk is already high, doesn’t do anything, and therefore does not contribute phenotypic variance to broad-sense heritability.
Ok, more specifically, the decrease in the narrow-sense heritability gets “double-counted” (after you’ve computed the reduced coefficients, those coefficients also get applied to those who are low in the first chunk, not just those who are high, when you start making predictions), whereas the decrease in the broad-sense heritability is only single-counted. Since the single-counting represents a genuine reduction while the double-counting represents a bias, it only really makes sense to think of the double-counting as pathological.
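The narrow-sense/broad-sense gap can be simulated directly (my own sketch; the setup is made up, with the trait a deterministic function of genotype so broad-sense heritability is exactly 1):

```python
# Sketch: a trait that is a deterministic function of genotype
# (broad-sense heritability = 1) whose best *additive* predictor explains
# noticeably less variance (narrow-sense heritability < 1). The gap is
# the "missing" heritability. Setup is made up for illustration.
import numpy as np

rng = np.random.default_rng(2)
n, p = 50_000, 10
x = rng.binomial(2, 0.5, size=(n, p)).astype(float)
# two disjunctive mechanisms, each driven by half the variants
y = np.log(np.exp(x[:, :5].sum(axis=1)) + np.exp(x[:, 5:].sum(axis=1)))

X = np.column_stack([np.ones(n), x])    # purely additive model
coef = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ coef
h2_narrow = 1 - resid.var() / y.var()
print(h2_narrow)  # < 1, even though the trait is fully genetic here
```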
Ah… ok I think I see where that’s going. Thanks! (Presumably there exists some standard text about this that one can just link to lol.)
I’m still curious whether this actually happens… I guess you can have the “propensity” be near its ceiling… (I thought that didn’t make sense, but I guess you sometimes have the probability of disease for a near-ceiling propensity be some number like 20% rather than 100%?) I guess intuitively it seems a bit weird for a disease to have disjunctive causes like this, but then be able to max out the risk at 20% with just one of the disjunctive causes? IDK. Likewise personality...
(Presumably there exists some standard text about this that one can just link to lol.)
I don’t think so.
I’m still curious whether this actually happens… I guess you can have the “propensity” be near its ceiling… (I thought that didn’t make sense, but I guess you sometimes have the probability of disease for a near-ceiling propensity be some number like 20% rather than 100%?) I guess intuitively it seems a bit weird for a disease to have disjunctive causes like this, but then be able to max out the risk at 20% with just one of the disjunctive causes? IDK. Likewise personality...
For something like divorce, you could imagine the following causes:
Most common cause is you married someone who just sucks
… but maybe you married a closeted gay person
… or maybe your partner was good but then got cancer and you decided to abandon them rather than support them through the treatment
The genetic propensities for these three things are probably pretty different: If you’ve married someone who just sucks, then a counterfactually higher genetic propensity to marry people who suck might counterfactually lead to having married someone who sucks more, but a counterfactually higher genetic propensity to marry a closeted gay person probably wouldn’t lead to counterfactually having married someone who sucks more, nor have much counterfactual effect on them being gay (because it’s probably a nonlinear thing), so only the genetic propensity to marry someone who sucks matters.
In fact, probably the genetic propensity to marry someone who sucks is inversely related to the genetic propensity to divorce someone who encounters hardship, so the final cause of divorce is probably even more distinct from the first one.
(Presumably there exists some standard text about this that one can just link to lol.)
I don’t think so.
How confident are you / why do you think this? (It seems fairly plausible given what I’ve heard about the field of genomics, but still curious.) E.g. “I have a genomics PhD” or “I talk to geneticists and they don’t really know about this stuff” or “I follow some twitter stuff and haven’t heard anyone talk about this”.
In fact, probably the genetic propensity to marry someone who sucks is inversely related to the genetic propensity to divorce someone who encounters hardship, so the final cause of divorce is probably even more distinct from the first one.
Ok I’m too tired to follow this so I’ll tap out of the thread for now.
Thanks again!
I talk to geneticists (mostly on Twitter, or rather now BlueSky) and they don’t really know about this stuff.
Under the condition I mentioned, polygenic scores will tend to focus on the traits that cause the most common kind of depression, while neglecting other kinds. The missing heritability will be due to missing those other kinds.
I don’t get why you think this. It doesn’t seem to make any sense. You’d still notice the effect of variants that cause depression-rare, exactly like if depression-rare was the only kind of depression. How is your ability to detect depression-rare affected by the fact that there’s some genetic depression-common? Depression-common could just as well have been environmentally caused.
I might be being dumb, I just don’t get what you’re saying and don’t have a firm grounding myself.
It doesn’t matter if depression-common is genetic or environmental. Depression-common causes the genetic difference between your cases and controls to be small along the latent trait axis that causes depression-rare. So the effect gets estimated to be not-that-high. The exact details of how it fails depend on the mathematical method used to estimate the effect.
Ok I think I get what you’re trying to communicate, and it seems true, but I don’t think it’s very relevant to the missing heritability thing. The situation you’re describing applies to the fully linear case too. You’re just saying that if a trait is more polygenic / has more causes with smaller effects, it’s harder to detect relevant causes. Unless I still don’t get what you’re saying.
It kind-of applies to the Bernoulli-sigmoid-linear case that would usually be applied to binary diagnoses (but only because of sample size issues and because they usually perform the regression one variable at a time to reduce computational difficulty), but it doesn’t apply as strongly as it does to the polynomial case, and it doesn’t apply to the purely linear (or exponential-linear) case at all.
If you have a purely linear case, then the expected slope of a genetic variant onto an outcome of interest is proportional to the effect of the genetic variant.
The issue is that in the polynomial case, the effect size of one genetic variant depends on the status of other genetic variants within the same term in the sum. Statistics gives you a sort of average effect size, but that average effect size is only going to be accurate for the people with the most common kind of depression.
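To make the case/control dilution concrete (my own sketch: a disjunctive disease with one hypothetical variant driving the rare cause, and an unrelated common cause whose prevalence we vary):

```python
# Sketch: disease = (rare genetic cause) OR (unrelated common cause).
# As the common cause gets more prevalent, the case/control genotype
# difference for the rare-cause variant shrinks, even though that
# variant's causal effect is unchanged. Numbers are made up.
import numpy as np

rng = np.random.default_rng(3)
n = 200_000
geno = rng.binomial(2, 0.3, size=n)     # variant behind "depression-rare"

def case_control_diff(p_common):
    rare = rng.random(n) < 0.1 * geno   # risk from the variant
    common = rng.random(n) < p_common   # unrelated common cause
    disease = rare | common
    return geno[disease].mean() - geno[~disease].mean()

print(case_control_diff(0.01))  # sizable genotype gap between cases and controls
print(case_control_diff(0.50))  # much smaller gap: the signal is diluted
```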