The problem with using a measure like an IQ score is that if the measure happens to work poorly for one particular person, the consequences can become very unbalanced.
If IQ tests are more effective than other tests, but employers are banned from using IQ tests and have to use the less effective measures instead, their decisions will be more inaccurate. They will hire more poor workers, and more good workers will be unable to get jobs.
But because the measures they do use vary from employer to employer, the effect on the workers will be distributed. If, say, an extra 10% of the good workers can’t get jobs, that will manifest itself as different people being unfairly unable to get jobs at different times—overall, the 10% will be distributed among the good applicants such that each one finds it somewhat harder to get a job, but eventually gets one after a longer spell of joblessness.
If the employers instead use IQ tests and can reduce this to 5%, that’s great for them. The problem for the workers is that if IQ tests are poor indicators of performance for 5% of the people, that won’t just be 5%, it’ll be the same 5% over and over again. The total number of good-worker-man-years lost to the inaccuracy will be less with IQ tests (since IQ tests are more accurate), but the variance in the effect will be greater; instead of many workers finding it somewhat harder to get jobs, there’ll be a few workers finding it a lot harder to get jobs.
Having such a variance is a really bad thing.
(Of course I made some simplifying assumptions. If IQ tests were permitted, probably not 100% of the employers would use them, but that would reduce the effect, not eliminate it. Also, note that this is a per-industry problem; if all insurance salesmen got IQ tested and nobody else, any prospective insurance salesman who doesn’t do well at IQ tests relative to his intelligence would still find himself chronically unemployed.)
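A minimal simulation sketch of the trade-off described above, using the illustrative 10% and 5% error rates from this comment; the number of workers and of hiring rounds are arbitrary placeholders. It shows the total loss being smaller under the shared accurate test while the loss falls on the same few people every time:

```python
import numpy as np

rng = np.random.default_rng(0)
n_workers = 1000   # good workers (hypothetical number)
n_rounds = 50      # job applications per worker (hypothetical number)

# Regime 1: noisy, employer-specific measures.
# Each application independently fails 10% of the time.
noisy_rejections = rng.random((n_workers, n_rounds)) < 0.10

# Regime 2: a single accurate test that is systematically wrong for a
# fixed 5% of good workers; those workers are rejected every time.
mismeasured = rng.random(n_workers) < 0.05
iq_rejections = np.tile(mismeasured[:, None], (1, n_rounds))

for name, rej in [("noisy measures", noisy_rejections), ("shared IQ test", iq_rejections)]:
    per_worker = rej.mean(axis=1)  # fraction of applications each worker loses
    print(name,
          "| total applications lost:", int(rej.sum()),
          "| workers never rejected:", round(float((per_worker == 0).mean()), 2),
          "| workers always rejected:", round(float((per_worker == 1).mean()), 2))
```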
The same, of course, applies to refusing to hire someone based on race, gender, religion, etc.: you can reduce the number of people who steal from you by never hiring blacks, but any black person who isn’t a thief would find himself rejected over and over again, rather than a lot more people getting such rejections but each one only getting them occasionally.
(Before you ask, this does also apply to hiring someone based on college education, but there’s not much we can do about that, and at least you can decide to go get a college education. It’s hard to decide to do better on IQ tests or to not be black.)
I agree that it’s a bad thing that some people are mismeasured, because that’s inefficient. I don’t buy the argument that the concentration makes it worse on anywhere near the same scale.
It’s also worth pointing out that this is a continuum. Dull for a systems analyst is sharp for an accountant, and dull for an accountant is sharp for a salesperson, and dull for a salesperson is sharp for a machinist, and so on. And so if someone with salesperson intelligence doesn’t test well, and so only has machinist scores, then they can get a job as a machinist and outperform their peers, and eventually someone may notice they should be in the office instead of on the shop floor.
Perhaps it’s a deliberate simplification for clarity, but that last paragraph seems to me to assume a one-dimensional oversimplification of how things are.
Suppose Frieda would be a great salesperson: she is enthusiastic and upbeat, she has a good memory for names and faces, etc. But her test scores aren’t good, and she gets hired as a machinist. How much are those good-salesperson characteristics going to help her impress her colleagues on the shop floor? Suppose Fred has similar test scores and also gets hired as a machinist. He is conscientious, has a lot of tolerance for repetitive work, is dextrous and not very prone to repetitive strain injuries. He turns out to be a first-rate machinist. Do you want to send him off to Sales?
Now, it could be that there are people watching the employees on the shop floor and looking out for ones who (even though they may not be great machinists) would do well in sales, accounting, or whatever. But I rather doubt it, and I suspect that a machinist’s work-life doesn’t give a lot of opportunities to be noticed as a good candidate for a job in anything far removed from the shop floor.
The argument works just as well with “the person who is bad at IQ tests gets repeatedly hired for lower-paying jobs” as it does with “the person who is bad at IQ tests gets repeatedly not hired at all”. Back when you were permitted to discriminate in hiring based on race, black people didn’t have absolutely no jobs—they were just hired for jobs that were generally worse. (And people didn’t notice the good blacks should be in the office and promote them at a higher rate to make up for it, either. Rather, they allowed their bias to affect their assessment of how good blacks were at their jobs.)
Edit: There’s also a problem that’s related to the first but where accuracy isn’t involved. Imagine that IQ tests were always accurate for job purposes, 100% of the time: if one person has a higher IQ than another, he has higher performance.
Employers would then start hiring people from the high IQs down. In a limited job market, employers would stop hiring before they reached the bottom. Someone could find himself having 95% of the productivity of someone with a higher IQ score but hired 0% of the time. Again, it’s bad to have people who are hired 0% of the time.
You could solve that by introducing noise into the IQ scores, but of course that is equivalent to not allowing IQ testing and forcing employers to use noisy measures of IQ.
(You could also solve that by allowing employers to hire one person at X% of the salary of another, but employers tend not to do that even for the measures they are allowed to use now.)
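A toy model of the cutoff problem this comment describes: with a perfectly accurate score and fewer jobs than applicants, the people just below the cutoff are hired 0% of the time, while a noisy score spreads some hiring to them. All numbers (1000 applicants, 800 jobs, the noise standard deviation) are assumptions for illustration only:

```python
import numpy as np

rng = np.random.default_rng(0)
n_applicants, n_jobs, n_rounds = 1000, 800, 200  # hypothetical numbers

# In this toy model the "perfectly accurate" score IS true productivity.
productivity = rng.normal(100, 15, n_applicants)

def hire_rates(score_fn):
    hired = np.zeros(n_applicants)
    for _ in range(n_rounds):
        top = np.argsort(score_fn())[-n_jobs:]  # employers take the top scorers
        hired[top] += 1
    return hired / n_rounds

exact = hire_rates(lambda: productivity)
noisy = hire_rates(lambda: productivity + rng.normal(0, 10, n_applicants))

# The 20 applicants ranked just below the hiring cutoff.
just_below = np.argsort(productivity)[n_applicants - n_jobs - 20 : n_applicants - n_jobs]
print("hire rate just below cutoff, exact score:", exact[just_below].mean())  # 0.0
print("hire rate just below cutoff, noisy score:", noisy[just_below].mean())  # well above 0
```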
I feel like the argument is slicing the problem up and presenting just the worst bits, when we need to consider the net effect on everything. This reminds me of a bioethics debate about testing error and base rate of rare lethal diseases: if five times as many people have disease A as have disease B, but they look similar and the tests only offer 80% accuracy,* what should we do if the treatment for A cures those with A but kills those with B, and vice versa?
The ‘shut up and multiply’ answer is “don’t give the tests, just treat everyone for A,” as that spares the cost of the tests and 5/6ths of the population lives. But this is inequitable, since everyone with disease B dies. Another approach is to treat everyone for the disease that they test positive for, but now only 4/5ths of the population lives, and we had to pay for the tests! Is it really worth committing 3% of the population to the graveyard to be more equitable? If one focuses on the poor neglected patients with B, then perhaps, but if one considers patients without regard to group membership, definitely not.
*Obviously, the tests need to be dependent for 80% to be the maximal possible accuracy.
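A quick check of the arithmetic in the comment above (5:1 prevalence, an 80%-accurate test): treating everyone for A saves 5/6 of the patients, testing first saves 4/5, and the difference is 1/30, i.e. roughly the 3% figure quoted:

```python
from fractions import Fraction

p_A = Fraction(5, 6)       # five times as many patients have A as have B
p_B = Fraction(1, 6)
accuracy = Fraction(4, 5)  # the test is 80% accurate

treat_everyone_for_A = p_A                 # all A patients live, all B patients die
test_then_treat = accuracy * (p_A + p_B)   # only correctly diagnosed patients live

print(treat_everyone_for_A)                     # 5/6
print(test_then_treat)                          # 4/5
print(treat_everyone_for_A - test_then_treat)   # 1/30, about 3% more deaths
```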
And people didn’t notice the good blacks should be in the office and promote them at a higher rate to make up for it, either.
I don’t know if it’s possible to test this, and specifically it’s not obvious to me that we need racial bias to explain this effect. That is, widespread cognitive stratification in the economic sphere is relatively new (it started taking off in a big way only around ~1950 in the US), and if promotions were generally inefficient, it’s hard to determine how much additional inefficiency race caused.
These comparisons become even harder when there are actually underlying differences in distributions. For example, the difference in mean male and female mathematical ability isn’t very large, but the overwhelming majority of Harvard math professors are male. One might make the case that this is sexism at work, but for people with extreme math talent, what matters much more than the difference in mean is the difference in standard deviation, which is significantly higher for men. If you take math test scores from high schoolers and use them as a measure of the population’s underlying mathematical ability distribution and run the numbers, you predict basically the male-female split that Harvard has, which leaves nothing left for sexism to explain.
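A minimal sketch of the tail-ratio calculation being referred to, under assumed parameters: equal male and female means and a male standard deviation about 15% larger than the female one (an illustrative figure, not one taken from this thread). The further out the cutoff, the more lopsided the predicted ratio becomes:

```python
from scipy.stats import norm

# Illustrative assumptions (not figures from this thread): equal means,
# male standard deviation ~15% larger than the female one.
mean_m, sd_m = 0.0, 1.15
mean_f, sd_f = 0.0, 1.00

for cutoff in (3, 4):  # standard deviations above the common mean
    tail_m = norm.sf(cutoff, loc=mean_m, scale=sd_m)  # fraction of males above the cutoff
    tail_f = norm.sf(cutoff, loc=mean_f, scale=sd_f)  # fraction of females above the cutoff
    print(f"{cutoff} SD cutoff: predicted male:female ratio = {tail_m / tail_f:.1f} : 1")
```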
That is, widespread cognitive stratification in the economic sphere is relatively new (it started taking off in a big way only around ~1950 in the US),
I’m sceptical. Strenze’s meta-analysis of correlations between IQ and socioeconomic status (operationalized as education, occupational level, or individual income) found no substantial increase in those correlations between 1929 & 2003.
That does reduce my confidence, but only slightly. I think the stratification claim is more specific than what they’re testing, but their coarse measure gives an upper bound on how strong the stratification effect could be. (Unfortunately, I don’t have the time to delve into this issue.)
The analogy is poor because the point is that temporary unemployment of the kind you get with a noisy IQ measure is much less harmful than long-term unemployment of the kind you might get with a better measure. Whereas with diseases A and B people die either way and it’s just a question of who/how many.
The analogy is poor because the point is that temporary unemployment of the kind you get with a noisy IQ measure is much less harmful than long-term unemployment of the kind you might get with a better measure.
The analogy is intended to be about reasoning processes, not the decision itself. Complaining that some readily identifiable people are hurt by measure X is a distraction if what you care about is total social welfare: if we can reduce harm by concentrating it, then let us do so!
I also think that, on the object level, replacing “long-term unemployment” with “long-term underemployment” significantly decreases the emotional weight of the argument. I also think that it’s not quite right to claim that the current method is equally inefficient everywhere: the people who test well but don’t school well, for example, are the readily identifiable class who suffer under the current regime.
Long-term underemployment still tends to erode, or at least not build up, one’s skills, reducing that individual’s lifetime productivity.
Complaining that some readily identifiable people are hurt by measure X is a distraction if what you care about is total social welfare:
While it makes sense to care about total social welfare, the calculation showing that the IQ test is better shows that it is better in terms of job-productivity-years. Job-productivity-years is not social welfare, and you can’t just assume that it is.
Furthermore, my complaint is not that the people harmed are readily identifiable, but that it’s the same people being constantly harmed. Having one person out of 100 never have a job is worse than having all 100 people not have jobs 1% of the time. Even if I knew who the 100 people were and didn’t know who the one person is, that wouldn’t change it.
Sure, but I don’t see why you think the current setup is much better on that metric. Someone who consistently flubs interviews is going to be unemployed or underemployed, even though interviews don’t seem to communicate much information about job productivity. If it were an actual lottery, I think the argument that the unemployment is spread evenly across the population would hold some weight, but I think employers have errors that are significantly correlated already, and I’m willing to accept an increase in that correlation in exchange for a decrease in the mean error.
If you take math test scores from high schoolers and use them as a measure of the population’s underlying mathematical ability distribution and run the numbers, you predict basically the male-female split that Harvard has, which leaves nothing left for sexism to explain.
I’ve seen this said before (notably, Larry Summers took a lot of heat for saying it) and it seems like the kind of thing that might well be true, but I’ve never seen the actual numbers. Have you actually done the calculations?
If you just want some calculations, look at La Griffe: http://www.lagriffedulion.f2s.com/women_and_minorities_in_science.htm and http://www.lagriffedulion.f2s.com/math.htm / http://www.lagriffedulion.f2s.com/math2.htm
(I haven’t checked his numbers or looked for more mainstream authors, but then again, would you expect to find many papers by prominent authors doing the exact calculation you want, especially post-Summers?)
You say that as if I’m asking for something specific and unusual, but all I’m actually doing is responding to “If you do the calculations you find X” with “That’s interesting; have you done those calculations or seen someone else do them, then?”.
The problem is, I want to see someone other than La Griffe do the numbers and I’m not happy relying on him.
I don’t know who he is, I haven’t gone through his derivations or math, I don’t know how accurate his models are, he uses a lot of old sources of data like Project Talent (which may or may not be fine, but I don’t have the domain expertise to know), and the one piece of writing of his I’ve really gone through, his ‘smart fraction’, doesn’t seem to hold up too well using updated national IQ data from Lynn (Vaniver and I tried to reproduce his result & update it in some comments on LW).
But the problem is, given the conclusion, I am unlikely ever to see someone from across the ideological spectrum verify that his work is right. (Whatever the accuracy of his own arguments, La Griffe does a good job tearing apart one attempt to prove there is no variance difference, where the woman’s arguments show she either doesn’t understand the issue or is being dishonest.)
Your third link begins with La Griffe taking numbers from Janet Hyde, who is on the opposite end of the spectrum. The difference is that she downplays the magnitude of the standard deviation difference. Isn’t the main concern the source of the numbers, not the calculation? It’s just a normal distribution calculation.
(I don’t actually believe that intelligence is normally distributed, so I don’t believe the argument.)
It’s just a normal distribution calculation. (I don’t actually believe that intelligence is normally distributed, so I don’t believe the argument.)
If you don’t think intelligence is normally distributed, isn’t that a problem for how true his results are and why one might want third-parties’ opinion? And I’m not sure that affects his rank-ordering argument very much; that seems like it might be reasonably insensitive to the exact distribution one might choose.
OK, I understand. (I share your frustration, would count as “from across the ideological spectrum”, and have at least a good subset of the necessary skills, but probably lack the time to try to rectify the deficit myself.)
I got the calculations from La Griffe, linked by gwern in a sibling comment. (For completeness, [1], [2], [3].) I have a vague recollection of checking them myself at some point.
OK. Thanks.
In the IQ example, you can’t shut up and multiply because you’re supposed to multiply utilons, but the calculation showing that the IQ test is better measures job-productivity-years, not utilons. Most people don’t think that utilons are linear with job-productivity-years; for instance, having one person out of 100 permanently unemployed is worse than having every person in the 100 lose 1% of the years they would otherwise have worked. That difference is what makes the IQ test scenario bad.
In the disease example, either you die or you don’t, so as long as you assign more utilons to not dying than to dying, your utilon assignment doesn’t affect how the two scenarios compare.
(You are of course correct about male professors at Harvard.)
Most people don’t think that utilons are linear with job-productivity-years
Are you talking descriptive or normative?
If descriptive, most people don’t think in terms of utilons at all, and if normative I would like to see some arguments for the assertion that differences in wealth/income generate negative utility.
Most people have beliefs which imply a comparison in which utilons are not linear with job-productivity-years.
Please demonstrate that the beliefs of “most people” involve utilons at all.
Huh?
People can have beliefs which imply a comparison of utilons, without those people believing in utilons.
Not to mention that under standard interpretation of utility, it’s NOT summable across different people.
I didn’t invoke “most people” to suggest that utility can be compared among people. I invoked it because you are presumably coming up with these utilon calculations as a way to formalize preexisting beliefs, in which case we need to figure out what those preexisting beliefs are and what they imply.
People can have beliefs which imply a comparison of utilons, without those people believing in utilons.
Utilons are not a feature of reality. They are a concept that some people use to think about comparative usefulness of things.
What you are saying is that people who think in terms of utilons can reinterpret other people’s value judgments in these terms. But that’s just a map which redraws another map.
Utilon-less maps do not “imply” utilons.
because you are presumably coming up with these utilon calculations
I am not coming up with utilon calculations. I am explicitly rejecting the idea that the desirability of complete equality somehow falls out of utilon calculations—primarily because I don’t think you can calculate with utilons in this way.
In that case, your argument is with Vaniver, who thinks we can “shut up and multiply” in deciding what is good for a population, which implicitly means that we will be multiplying utilons across members of a population, and that job-productivity-years are linear with utilons. If you cannot aggregate utilons across people, then nothing said here matters.
While that may or may not be so, what are your opinions on whether you can calculate with utilons in this way?
I think that if you can’t compare utilons among states of aggregations of people, you can’t make very basic comparisons of a type that pretty much everyone makes. You have to at least have a partial order which allows at least some comparisons.
That sounds like a very… lukewarm assertion. So maybe you can’t make very basic comparisons of a type that pretty much everyone makes?
The basic issue is that you need to have a single metric applied to everything you’re trying to aggregate and I don’t think it works this way with estimates of individual utility. You need to convert utilons into something more universal and that typically ends up being dollars :-/
Most people don’t think that utilons are linear with job-productivity-years; for instance, having one person out of 100 permanently unemployed is worse than having every person in the 100 lose 1% of the years they would otherwise have worked.
This calculation completely neglects the utility generating function of productivity.
No, it doesn’t. That function affects the calculation by increasing the total utilons we attribute to productivity. Unless the increase is infinite, it is still possible for the loss in utility from high variance to outweigh the gain in utility from increased productivity.
Unless the increase is infinite, it is still possible for the loss in utility from high variance to outweigh the gain in utility from increased productivity.
This only works if the main contribution to utility from working consists of the personal fulfillment of the worker rather than the benefits generated by the work.
Only in the sense that any measure of utility that involves the condition of a person consists of their personal fulfillment.
Your argument essentially amounts to arguing that we should give people with low skills make-work jobs in order to increase utility.
“Make-work” carries the connotation that the productivity of the worker is less valuable than his pay. “Less valuable than optimum” is not the same as “less valuable than his pay”. Furthermore, “low skills” carries the inapt connotation “very low” (and low-testing doesn’t necessarily imply low skills anyway.)
The problem is that someone who is either marginally less productive, or marginally worse at testing, can find his ability to get a job decreased by an amount all out of proportion to how much worse he is, if all employers use the same measure. Ensuring that such people can get jobs isn’t make-work.
Is there some reason why most of my posts in this thread are modded down, other than disagreement?
Kind of offtopic but regarding male-female intelligence differences—in Britain at least, girls seem to consistently outperform boys in school math exams, which would imply there is a mean difference, in the opposite direction.
It might, but there are subtleties you have to take into account. For example, ceiling effects will hide the claimed effect, and if there’s not enough floor, can even produce a lower mean.
Imagine you have a test of 10 four-choice multiple-choice questions, male mean = female mean but males have higher variance, and the average student’s score on the test would be 8, so lots of students score a perfect 10 but you would have to be retarded to score <=2. What will the mean by gender look like under this scenario? Since the male variance is higher, there will be several times more near-retarded boys than girls scoring in the lower ranks like 3-4; there will be nearly as many normal boys as normal girls with normal scores like 7-9; and the rest will score 10 - but the many more boys than girls who are far out on the tail (are geniuses at maths) will also score 10 and look like fairly ordinary types. So the dim boys drag down the mean of all boys, the ordinary boys by definition match their girl counterparts, while the geniuses can’t show their stuff and might as well have not been tested at all; and so on net, it looks like the boys perform worse than the girls even though they actually are the same on average and have a higher variance. This is because I invented a test which is able to pick up on the differences among the low-performers (by devoting 7 questions to them) but not among the high-performers (just 2 questions), and this favors the group with the least representation among both tails (females).
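A minimal Monte Carlo sketch of the ceiling effect described above, using a simple clipped-score model rather than the exact question allocation in the example; the 1.0 vs 1.2 standard deviations are assumed illustrative values. Despite identical latent means, the higher-variance group ends up with the lower mean test score:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Latent maths ability: identical means, boys assumed to have higher variance.
# (The 1.0 vs 1.2 standard deviations are placeholders, not measured values.)
girls = rng.normal(0.0, 1.0, n)
boys = rng.normal(0.0, 1.2, n)

def easy_test(ability):
    # A 10-question test pitched so the average student scores about 8:
    # plenty of room to distinguish weak students, almost none at the top.
    return np.clip(np.round(8 + 3 * ability), 0, 10)

print("girls' mean score:", easy_test(girls).mean())  # ~7.5
print("boys' mean score: ", easy_test(boys).mean())   # lower, despite equal latent means
```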
And most real-world exams are uninterested in making very fine gradations among the top 1% of students like you need to if you want to answer questions about ‘how many female Fields Medalists—top mathematician in the entire world—should there be?’ because with non-adaptive tests you would have to force the 99% of ordinary people to slog through endless reams of questions they have no idea about. (American schools have no incentive to look because they are not evaluated under No Child Left Behind based on how many world-class students pass through their halls; they’re evaluated on the average student and especially the minorities.)
Other issues include to what extent those exams are based on class grades (the usual situation is boys do worse on grades, better on exams, because grades measure how much you can ingratiate yourself to your teacher by things like sitting still and doing even the most tedious moronic homework each and every time) and whether the exams are being administered after puberty, where the increased variance is expected to manifest itself.
Thanks for the explanation. The skill ceiling/floor argument makes sense for GCSEs, but I’m not sure how well it works for A-Levels. Boys only outperform girls at the very very top end, and despite the complaints that the ceiling isn’t high enough, I don’t think it can account for all the discrepancy (he said, remembering his bad stats intuition).
Maybe it’s higher male variance and higher female mean?
Class grades also count for zilch in both, it was all exams last time I checked.
Percent passing is not very informative because those sitting the test have been preselected. According to this spreadsheet, 50% more boys take Maths and more than twice as many boys take further maths. Also, it claims that the A* rate is twice as high for boys, at both levels, though the A rate is the same (which is weird).
(the spreadsheet has several sheets, but the link should go to the correct one—gender)
Boys only outperform girls at the very very top end,
I’m not sure I understand your link. If 43.7% of people score an A and that’s the highest score, then it’s definitely not ‘very very top end’ because that means it has almost zero information about anyone who is above-average (much less the extremes like 1 in 10k). And the Criticism section seems to accuse A-levels of a severe ceiling effect:
It has been suggested by The Department for Education that the high proportion of candidates who obtain grade A makes it difficult for universities to distinguish between the most able candidates.
Incidentally, notice the lowest grade: almost twice as many males as females.
I’m talking about Further Maths. The A grade for that is the only one with more boys than girls. It’s much harder, and only 8,000 people take it compared to 60,000 for the standard Mathematics exam.
Then again, the ceiling still only looks to be the top 6-7% of the people taking math A-Levels. I think you’re right.
Before you ask, this does also apply to hiring someone based on college education, but there’s not much we can do about that,
Yes there is, we can pass laws making it illegal to hire on the basis of college degrees (possibly with an exemption for degrees directly relevant to the job).
and at least you can decide to go get a college education.
You can’t decide to get accepted by an elite college.
It’s hard to decide to do better on IQ tests or to not be black.
Another way to phrase this statement is that there is less motivation to engage in costly signaling. Thus there is less deadweight signaling loss and hence more resources available to utility production.
You can’t decide to get accepted by an elite college.
I was referring to discrimination based on whether you have a college education, not discrimination based on which college education you have.
Discrimination based on eliteness of college doesn’t raise the same sort of problems because employers can’t hire just elite college graduates and nobody else—there aren’t enough of them. After the employers hire all the elite college graduates, the remaining ones go to colleges which are hard to rank against each other (unlike IQ scores, which are numbers and are easy to compare). The employers will in effect select randomly from that remaining pool, so it won’t lead to people in that pool becoming permanently unemployed, or even to just becoming permanently underemployed by large degrees.
Another way to phrase this statement is that there is less motivation to engage in costly signaling.
If I had to choose between black people getting the kind of jobs they got when discrimination against them was permitted, and signalling, I’d decide the signalling is less costly, and so would pretty much everyone else.
If I had to choose between black people getting the kind of jobs they got when discrimination against them was permitted, and signalling, I’d decide the signalling is less costly,
You do realize the signaling, at least in the US, currently involves taking out student loans under terms that border on debt peonage.
There was a long period of time between when discrimination against blacks in employment was forbidden, and college prices rose to excessive levels. I doubt that signalling alone can explain the increase in college costs, or that letting employers discriminate based on race or IQ would reduce them. I’d blame it more on other government interference (such as subsidizing loans and making it essentially impossible to discharge loans in bankruptcy).
Furthermore, the situation of black people before the civil rights movement was bad enough that I’d be hard pressed to decide that even being massively in debt for a college loan is worse.