This will happen anyway; in fact, it will happen more often if relevant information is discarded. The difference is that the victims will no longer be correlated with race. Thus we are still left with the question of why adding race makes it worse.
Edit: Another way to express what I’m trying to say is that your argument, if it works, shows we should avoid using any data that’s correlated but not perfectly correlated with intelligence, e.g., test scores, grades, job performance, pretty much anything really.
Yes, that is a fatal flaw in the above argument. It “proves” way too much, namely that we should be disturbed by the idea that there is any observable property of people whatsoever whose correlation with intelligence is neither zero nor one.
I agree. My footnote was trying to get at the same problem. (Though I’m not sure that Eugine_Nier was making the same point.)
I was.
If humans were perfect reasoners, your objection would be valid. But people are irrational, and will tend to over-discriminate based on race if minor discrimination (based on the amount of info race does provide) is socially allowed.
Here’s a thought experiment: have a bunch of typical humans guess the intelligence of a group of subjects, knowing their test scores, grades, and job performance, but not race. Then have the same judges guess the intelligence of a second group of subjects, using all the above info plus race, and the fact that “on average, whites are smarter than blacks” without any quantitative data on by how much*. I consider it likely that the second group, despite having more information, will be less accurate than the first. Therefore, if race-based IQ differences exist, we should try to ignore them unless we know their magnitude and are confident in our own rationality.
* “On average, whites are smarter than blacks” is all most people will remember from an article on race and IQ, and they won’t think to look up by how much.
What will happen at various degrees of minor discrimination not being socially allowed?
I don’t have any data to hand on this issue, but there is some optimal amount of disseminated info+set of social norms that maximizes people’s ability to correctly judge others’ merit. Ceteris paribus, let’s go with that.
I agree, but that means the optimal norms are a balance, rather than maximal vigilance in hunting down minor discrimination. Since the maximum is wrong, complaints that something is the type of thing that should be socially disallowed are highly suspect.
Yes. I don’t mean that it should be totally disallowed. The optimal thing would be to weight it appropriately. However, it may be that there can’t be any other states in society other than “ban it/make things race-blind” and “allow it/don’t make things race-blind”.
Just to be clear, the argument I outlined (but did not endorse) is about why it would be worse if race-based IQ differences existed in fact. Note that the hypothesis in point 3 was “If race-based differences in intelligence exist...”, not, “If we explore the possibility of race-based differences in intelligence...”. The argument doesn’t conclude that we should ignore relevant information. The argument’s conclusion is that, in a “juster world”, racial information wouldn’t be relevant.
(It’s not clear to me whether you meant to imply otherwise, but I thought that I should clarify that point.)
Just to be clear, my objection applies to that version as well.
Could you elaborate on how you see your objection applying to that version? To be honest, I don’t yet see that the hypothesis in point 4 is coherent enough to judge whether your claim would be true of it.
I think that you’re saying that, in a world where fewer observable properties correlate with dumbness, there will be more false positives — i.e. more smart people falsely identified as dumb. Is that right?
That is correct. Notice that we could “simulate” such a world by simply ignoring some of the correlates.
But that isn’t true in general. It might be true under some additional plausible assumptions, but I haven’t worked out what those assumptions would be.
The following toy model is a counterexample. Suppose that intelligence is measured by a quantity between 0 and 1. People are paid according to their employer’s best guess of their intelligence. (We assume universal employment.) More precisely, the employer computes an expected intelligence E (between 0 and 1) for the employee and then pays that employee at a rate of E utilons-per-hour.
Define “dumb” to mean “intelligence less than 0.5”. Define “smart” to mean “intelligence greater than or equal to 0.5”. Define “treating a smart person as dumb” to mean “paying an employee at a rate less than 0.5 when that employee’s intelligence is greater than or equal to 0.5”.
Now consider the following two possible worlds. In both worlds, intelligence is distributed uniformly, in the sense that the proportion of individuals with intelligence between a and b is b − a. World 1 is a world with no observable correlate for intelligence. World 2 is a world that does have an observable correlate for intelligence. I claim that, in both worlds, half the people are paid below their intelligence, but, in World 2 alone, some smart people are treated as dumb.
In World 1, the employer has no information about the employee’s intelligence, beyond the uniform prior distribution. This yields an expected intelligence of E = 0.5 for each employee, so everyone is paid exactly 0.5 utilons-per-hour. Thus, in World 1, half the people are paid below their intelligence, but no smart people are treated as dumb.
In World 2, the population is split half-and-half into f-people and g-people. Employers know the actual distribution of intelligence among both sub-populations. An employer can identify an employee as an f-person or a g-person with perfect reliability, but the employer knows nothing else about that employee’s intelligence.
The f-people’s intelligence satisfies the distribution f, where
f(x) = 4⁄3 for 0 ≤ x < 1⁄2, and f(x) = 2⁄3 for 1⁄2 ≤ x ≤ 1.
Hence, f-people are dumber on average. If I computed correctly, the f-people have expected intelligence E = 5⁄12. Thus, the f-people are all paid 5⁄12 by their employers. In particular, some smart f-people are treated as dumb.
Meanwhile, the g-people’s intelligence is distributed according to the distribution g, where
g(x) = 2⁄3 for 0 ≤ x < 1⁄2, and g(x) = 4⁄3 for 1⁄2 ≤ x ≤ 1.
Hence, g-people are smarter on average. I compute an expected intelligence of E = 7⁄12 for the g-people.
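For readers who want to check the two expected values, the integrals can be written out as follows. This is only the arithmetic implied by the densities f and g stated above; the symbols E_f and E_g are labels introduced here for the two expectations.

```latex
\begin{align*}
E_f &= \int_0^{1/2} \tfrac{4}{3}\,x\,dx + \int_{1/2}^{1} \tfrac{2}{3}\,x\,dx
     = \tfrac{4}{3}\cdot\tfrac{1}{8} + \tfrac{2}{3}\cdot\tfrac{3}{8}
     = \tfrac{1}{6} + \tfrac{1}{4} = \tfrac{5}{12}, \\
E_g &= \int_0^{1/2} \tfrac{2}{3}\,x\,dx + \int_{1/2}^{1} \tfrac{4}{3}\,x\,dx
     = \tfrac{2}{3}\cdot\tfrac{1}{8} + \tfrac{4}{3}\cdot\tfrac{3}{8}
     = \tfrac{1}{12} + \tfrac{1}{2} = \tfrac{7}{12}.
\end{align*}
```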
If N is the total population size, our assumptions say that there are N/2 f-people and N/2 g-people. I compute that the number of f-people paid below their intelligence is 4N/18. I get that the number of g-people paid below their intelligence is 5N/18. Thus, in World 2, half the people are paid below their intelligence, but some smart people are treated as dumb.
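The World 2 numbers can be checked with exact rational arithmetic. The following is a minimal sketch, assuming only the piecewise-constant densities f and g given above; the helper names (mean, share_above, world2_summary) are made up for this illustration and are not part of the original comment.

```python
from fractions import Fraction as Fr

HALF = Fr(1, 2)

# Each density is piecewise constant: (value on [0, 1/2), value on [1/2, 1]).
f = (Fr(4, 3), Fr(2, 3))
g = (Fr(2, 3), Fr(4, 3))


def mean(density):
    """Expected intelligence: the integral of x * density(x) over [0, 1]."""
    low, high = density
    # The integral of c*x over [a, b] is c * (b^2 - a^2) / 2.
    return low * HALF**2 / 2 + high * (1 - HALF**2) / 2


def share_above(density, t):
    """Proportion of the sub-population whose intelligence exceeds t (0 <= t <= 1)."""
    low, high = density
    low_part = low * max(Fr(0), HALF - t)   # mass between t and 1/2, if any
    high_part = high * (1 - max(t, HALF))   # mass between max(t, 1/2) and 1
    return low_part + high_part


def world2_summary():
    """Key quantities for World 2, as fractions of the total population N."""
    e_f, e_g = mean(f), mean(g)
    return {
        "E_f": e_f,
        "E_g": e_g,
        # Each sub-population is half of N.
        "f_paid_below": share_above(f, e_f) / 2,
        "g_paid_below": share_above(g, e_g) / 2,
        # Smart f-people count as "treated as dumb" only if their pay E_f is below 1/2.
        "smart_treated_as_dumb": share_above(f, HALF) / 2 if e_f < HALF else Fr(0),
    }


if __name__ == "__main__":
    for name, value in world2_summary().items():
        print(name, value)
```

Under these assumptions the sketch reproduces E = 5⁄12 and 7⁄12, the counts 4N/18 and 5N/18 (which sum to N/2), and it puts the smart f-people who are paid below 0.5 at N/6.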
Your scenarios implicitly assume that anyone whose expected intelligence is below the median will get treated as dumb, and that this is somehow much, much worse than what happens to people whose expected intelligence is exactly the median. Furthermore, even under this assumption, you will find that your example falls apart if there is any way besides race to obtain information correlated with intelligence.
Well, actually, I thought that I made this assumption generously explicit. Evidently, you had implicit assumptions behind your claim that taking correlates into account would always lead to fewer false positives. What were these additional assumptions?
I did not make any assumption quantifying how much worse it is. It need only be marginally worse.
No. I can construct similar counterexamples where there are two observable properties (which you can think of as black/white, male/female), corresponding to four populations (black males, …, white females). You will need to make your assumptions more explicit if you want to rule out these kinds of counterexamples.
Sorry, I was assuming a utility function that summed over the amount of suffering each person experienced. You seem to be using a Rawls-style utility function based on minimizing the suffering of the worst-off individual. (BTW, that’s a very stupid function to use in anything outside a very simple toy model.)
Only by fiddling with the parameters very precisely.
If your assumption is that people whose expected intelligence is below the median (or really the nth percentile for any n) will be treated as dumb, the only way a counter-example like yours can work is by having lots of people exactly tied for the nth percentile. And the more other information is available, the more the numbers in the scenario must be jiggered for that to happen.
You will need to make your assumptions more explicit if you want to rule out these kinds of counterexamples.
Your claim was that more correlates means fewer false positives. This is an abstract mathematical claim about epistemic probability. Utility functions don’t enter into it, at least not explicitly. It’s a claim about some class of probability distributions and criteria for categorization (“positives”). I’m just trying to figure out what class of distributions and criteria you’re talking about.
My counterexamples show that your claim doesn’t apply in full generality. You now claim that such counterexamples require “fiddling with the parameters very precisely.” I take this to be the claim that all scenarios satisfy your claim, except for some measure-zero subset (with respect to some natural measure). Can you prove this?
I’m not sure how to make sense of this. It doesn’t seem to reflect an understanding of my example.
I argued in the continuous limit. A measure-zero subset of people are tied for exactly the nth percentile. Recall that I said that “the proportion of individuals with intelligence between a and b is b − a.” So, the proportion of people whose intelligence is exactly tied for any value x is x − x = 0.
Of course, the continuous limit is only an approximation of the discrete reality. But I can find discrete examples where this proportion is arbitrarily small. It’s never “lots” relative to the size of the entire population, if that population is of any significant size.
I meant lots of people tied for the nth percentile in terms of your estimate of their intelligence, which was happening in your scenarios because the amount of information available was discrete and very small.
Okay, good. That makes a lot more sense.
What you say is true of the counterexamples I’ve described explicitly so far. But it is just an artifact of their being the simplest representatives of their family. I can construct similar counterexamples where the number of subpopulations in World 2 is arbitrarily large, and each subpopulation has a different expected intelligence. The proportion of people tied for any given expected intelligence can be arbitrarily small.
ETA: Also, these counterexamples work even if we redefine “treating smart people as dumb” to mean, “treating someone in the top 1% as if they were in the bottom 1%”. We still have a World 1 where no one smart is treated as dumb, and a World 2 where some smart people are treated as dumb.
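One way to make the many-subpopulation claim concrete is sketched below. It is an illustrative construction, not necessarily the family the commenter had in mind. It assumes k equal-sized subpopulations with the hypothetical linear densities 1 + a(2x − 1) on [0, 1], where the tilt parameters a are distinct, lie strictly between −1 and 1, and sum to zero, so that the overall mixture is still uniform. Each such density has mean 1/2 + a/6 and puts mass 1/2 + a/4 on the smart half.

```python
from fractions import Fraction as Fr

HALF = Fr(1, 2)


def tilts(k):
    """k distinct tilt parameters in (-1, 1), symmetric about 0 (hypothetical choice)."""
    return [Fr(2 * j - k - 1, k + 1) for j in range(1, k + 1)]


def expected(a):
    """Mean of the density 1 + a*(2x - 1) on [0, 1]."""
    return HALF + a / 6


def smart_share(a):
    """Mass that the density 1 + a*(2x - 1) puts on [1/2, 1]."""
    return HALF + a / 4


def summarize(k):
    """Summary of a World 2 with k equal-sized subpopulations."""
    params = tilts(k)
    means = [expected(a) for a in params]
    return {
        # The average of the k densities is 1 exactly when the tilts sum to zero.
        "mixture_is_uniform": sum(params, Fr(0)) == 0,
        "distinct_means": len(set(means)) == k,
        # With distinct means, each expected value is shared by one subpopulation.
        "largest_tie": Fr(1, k),
        # Smart people in any subpopulation whose pay (its mean) is below 1/2.
        "smart_treated_as_dumb": sum(
            (smart_share(a) for a in params if expected(a) < HALF), Fr(0)
        ) / k,
    }


if __name__ == "__main__":
    for k in (4, 100):
        print(k, summarize(k))
```

In this sketch the proportion of people tied for any one expected intelligence is 1/k, which shrinks as k grows, while a positive fraction of smart people are still paid below 0.5.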
I believe that the argument in my previous comment applies to any case satisfying the following.
Assume again that intelligence is measured by a quantity between 0 and 1. Assume there are two worlds, both with a prior distribution p for intelligence applying to the entire population. Furthermore, in the second world, the total population is divided into two equal sub-populations, f-people and g-people, with respective posterior distributions f and g for their intelligence. Assume the following about these distributions:
The support of each distribution p, f, and g is the entire interval [0,1]. That is, these distributions are all nonzero over the entire interval.
The prior distribution p is symmetric about 0.5. That is, p(x) = p(1 − x) for 0 ≤ x ≤ 1.
The distributions f and g are mirror images of each other about x=0.5. That is, f(x) = g(1 − x) for 0 ≤ x ≤ 1.
The expected intelligence for the f-people is below 0.5.
I believe that these assumptions suffice for the conclusions in my previous comment to follow. That is, in both worlds, exactly half the people are paid below their intelligence. But, in World 2 alone, some smart people are treated as dumb.
(Here I use the definitions from my previous comment, which I repeat here for convenience: Each employer computes an expected intelligence E for an employee and then pays that employee at a rate of E utilons-per-hour. “Dumb” means “intelligence less than 0.5”. “Smart” means “intelligence greater than or equal to 0.5”. Finally, “treating a smart person as dumb” means “paying an employee at a rate less than 0.5 when that employee’s intelligence is greater than or equal to 0.5”.)
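A sketch of why the listed assumptions appear to suffice is given below. It additionally assumes that the prior is the equal mixture p = (f + g)/2, which the setup seems to imply but does not state. F and G denote the cumulative distribution functions of f and g, and E_f, E_g the two expected intelligences; these symbols are introduced here for the derivation.

```latex
\begin{align*}
&\text{World 1: } p \text{ symmetric about } \tfrac12 \;\Rightarrow\; E = \tfrac12,
  \text{ so everyone is paid } \tfrac12 \text{ and half the people lie above it.} \\[4pt]
&\text{World 2: } f(x) = g(1 - x) \;\Rightarrow\; G(t) = 1 - F(1 - t)
  \quad\text{and}\quad E_g = 1 - E_f . \\[4pt]
&\text{Share of f-people paid below their intelligence: } 1 - F(E_f). \\
&\text{Share of g-people paid below their intelligence: }
  1 - G(E_g) = 1 - G(1 - E_f) = F(E_f). \\[4pt]
&\text{Overall: } \tfrac12\bigl(1 - F(E_f)\bigr) + \tfrac12\,F(E_f) = \tfrac12 . \\[4pt]
&\text{Smart people treated as dumb: since } E_f < \tfrac12
  \text{ and } f > 0 \text{ on } [\tfrac12, 1], \quad
  \tfrac12 \int_{1/2}^{1} f(x)\,dx > 0 .
\end{align*}
```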