First question: do you actually care about correlation (given that it’s a linear metric) or do you mean some tight dependency, not necessarily linear?
If g(y) is monotonic, then the degree to which there’s a tight dependency is independent of g(y), which is just a change of coordinates. I do want to choose g(y) to maximize the degree to which the dependency is a linear one.
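To make that concrete, here’s a minimal sketch (the data-generating process is invented purely for illustration): Spearman’s rank correlation is unchanged by any monotone g, while Pearson’s depends on whether g linearizes the relationship.

```python
# Minimal sketch (invented data-generating process): a latent trait x
# drives an outcome y through a monotone but nonlinear map. Spearman's
# rank correlation is invariant under any monotone g; Pearson's is not,
# and choosing g = log here makes the dependency linear again.
import numpy as np
from scipy.stats import pearsonr, spearmanr

rng = np.random.default_rng(0)
x = rng.normal(size=100_000)                    # latent trait
y = np.exp(x + 0.3 * rng.normal(size=x.size))   # fat-tailed outcome

print(pearsonr(x, y)[0])          # depressed by the nonlinearity
print(spearmanr(x, y)[0])         # unchanged by any monotone g(y)
print(pearsonr(x, np.log(y))[0])  # g = log restores linearity (~0.96)
```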
Second question: if that is the case, don’t you want your function g to produce a distribution shaped similarly to the “thing you care about”? If that thing-you-care-about is distributed normally, you would prefer g to generate a normal(-looking) distribution; if it’s distributed, say, as chi-squared, you would prefer g to give you a chi-squared shape; etc.?
Yes, this is true and a good point, though the distribution of “the thing we care about” will vary from thing to thing, and I think that if we have to use a fixed distribution for IQ that’s uniform over all of them, the log of a fat-tailed distribution is probably the best choice.
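As a quick illustration of why the log of a fat-tailed distribution is a reasonable default, here’s a sketch using lognormal income as a stand-in for the thing-we-care-about (all parameters invented):

```python
# Illustration (parameters made up): if the thing-we-care-about, say
# income, is fat-tailed (lognormal here), then a log-scaled score has
# the right shape, while the raw scale has enormous excess kurtosis.
import numpy as np
from scipy.stats import kurtosis

rng = np.random.default_rng(1)
income = rng.lognormal(mean=10.0, sigma=1.0, size=100_000)

print(kurtosis(income))          # large: heavy right tail
print(kurtosis(np.log(income)))  # ~0: normal after taking the log
```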
are you saying income is best modeled as a weighted average of many IQ-related genes?
Here I’m just adopting an Occamistic approach – I don’t have high confidence – I’m just using a linear model because it’s the simplest possible function from genes to outcomes that are correlated with IQ. Feel free to suggest an alternative.
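For concreteness, this is roughly the kind of linear model I have in mind; it’s a sketch, and the gene counts, weights, and noise scale are placeholders rather than estimates:

```python
# A sketch of the linear model in question: the outcome is a weighted
# sum of many small, independent gene effects plus noise. Gene counts,
# weights, and noise scale are placeholders, not empirical estimates.
import numpy as np

rng = np.random.default_rng(2)
n_people, n_genes = 10_000, 1_000
genes = rng.binomial(1, 0.5, size=(n_people, n_genes))  # variant present?
weights = rng.normal(scale=1.0 / np.sqrt(n_genes), size=n_genes)

outcome = genes @ weights + 0.5 * rng.normal(size=n_people)
# By the central limit theorem the genetic component is close to normal,
# which is the usual justification for a Gaussian latent trait.
print(outcome.mean(), outcome.std())
```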
I agree that assortative mating would lead to a fat-tailed distribution, but your original goal was to make IQ correlate with “things you care about,” and for that purpose the fat tails are not particularly relevant.
Suppose, hypothetically, that human brains were such that IQ was capped at 145 by present-day standards (e.g. because, unbeknownst to us, babies with IQ above that threshold died in childbirth for some reason having to do with IQ genes). Then if we were to choose g(y) to get a normal distribution, it would look like the correlation between IQ and real-world outcomes vanishes after 145, whereas the actual situation would be that the people who scored above 144 have essentially the same genetic composition (with respect to IQ) as the people who scored 144, so that “IQ doesn’t yield returns past 145” would be connotatively misleading.
I’m saying that defining IQ so that it’s normally distributed has a similar (though much smaller) connotatively distortionary effect to this one.
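A simulation of the hypothetical makes the distortion visible; everything here is invented, with a rank-based inverse-normal transform standing in for “choose g(y) to get a normal distribution”:

```python
# Simulation of the capped-IQ hypothetical (all parameters invented).
# Latent ability is hard-capped at 145; g is a rank-based inverse-normal
# transform, i.e. "choose g(y) to get a normal distribution". Reported
# IQs above 145 then exist but carry almost no signal.
import numpy as np
from scipy.stats import norm, pearsonr, rankdata

rng = np.random.default_rng(3)
ability = np.minimum(rng.normal(100, 15, size=200_000), 145)  # hard cap
raw = ability + rng.normal(0, 5, size=ability.size)           # noisy test
outcome = ability + rng.normal(0, 10, size=ability.size)      # life result

# g: rank-normalize the raw scores into an exact normal(100, 15) shape
iq = 100 + 15 * norm.ppf(rankdata(raw) / (raw.size + 1))

top = iq > 145
print(pearsonr(iq[~top], outcome[~top])[0])  # substantial below the cap
print(pearsonr(iq[top], outcome[top])[0])    # collapses above it
```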
Suppose, hypothetically, that human brains were such that IQ was capped at 145 … it would look like the correlation between IQ and real-world outcomes vanishes after 145
In your hypothetical there would be a lot of warning signs—for example all IQs above 145 would be random, that is, re-testing IQs above 145 would produce a random draw from the appropriate distribution tail.
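A sketch of that warning sign, reusing the same invented setup (the cap, the noise scale, and the sample size are all arbitrary):

```python
# The warning sign, in the same invented setup: two sittings share the
# latent ability but not the measurement noise, so among people who
# scored above 145 the first time, a retest looks like a fresh draw
# from the tail (retest correlation far below the overall figure).
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(4)
ability = np.minimum(rng.normal(100, 15, size=200_000), 145)
test1 = ability + rng.normal(0, 5, size=ability.size)
test2 = ability + rng.normal(0, 5, size=ability.size)

high = test1 > 145
print(pearsonr(test1, test2)[0])              # ~0.9 overall
print(pearsonr(test1[high], test2[high])[0])  # near zero in the top group
```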
And I suspect that it should be possible to figure out real-world distributions (the fatness of the tails, in particular) by looking at raw, non-normalized test scores.
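For instance, a check along these lines (with simulated stand-ins for the raw scores) could compare excess kurtosis or a tail-quantile ratio against a Gaussian baseline:

```python
# What that check might look like (simulated stand-ins for raw scores):
# excess kurtosis and a tail-quantile ratio both separate a Gaussian
# trait from a fat-tailed one without any normalizing transform.
import numpy as np
from scipy.stats import kurtosis

rng = np.random.default_rng(5)
gaussian_raw = rng.normal(100, 15, size=100_000)
fat_raw = 100 + 15 * rng.standard_t(df=3, size=100_000)  # heavy tails

for raw in (gaussian_raw, fat_raw):
    q50, q99, q999 = np.quantile(raw, [0.5, 0.99, 0.999])
    print(kurtosis(raw), (q999 - q50) / (q99 - q50))  # tail-stretch ratio
```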
Yes, you and I are on the same page, I was just saying that IQ shouldn’t be defined to be normally distributed.