Just to be clear, my objection applies to that version as well.
Could you elaborate on how you see your objection applying to that version? To be honest, I don’t yet see that the hypothesis in point 4 is coherent enough to judge whether your claim would be true of it.
I think that you’re saying that, in a world where fewer observable properties correlate with dumbness, there will be more false positives — i.e. more smart people falsely identified as dumb. Is that right?
That is correct. Notice that we could “simulate” such a world by simply ignoring some of the correlates.
But that isn’t true in general. It might be true under some additional plausible assumptions, but I haven’t worked out what those assumptions would be.
The following toy model is a counterexample. Suppose that intelligence is measured by a quantity between 0 and 1. People are paid according to their employer’s best guess of their intelligence. (We assume universal employment.) More precisely, the employer computes an expected intelligence E (between 0 and 1) for the employee and then pays that employee at a rate of E utilons-per-hour.
Define “dumb” to mean “intelligence less than 0.5”. Define “smart” to mean “intelligence greater than or equal to 0.5”. Define “treating a smart person as dumb” to mean “paying an employee at a rate less than 0.5 when that employee’s intelligence is greater than or equal to 0.5”.
Now consider the following two possible worlds. In both worlds, intelligence is distributed uniformly, in the sense that the proportion of individuals with intelligence between a and b is b − a. World 1 is a world with no observable correlate for intelligence. World 2 is a world that does have an observable correlate for intelligence. I claim that, in both worlds, half the people are paid below their intelligence, but, in World 2 alone, some smart people are treated as dumb.
In World 1, the employer has no information about the employee’s intelligence, beyond the uniform prior distribution. This yields an expected intelligence of E = 0.5 for each employee, so everyone is paid exactly 0.5 utilons-per-hour. Thus, in World 1, half the people are paid below their intelligence, but no smart people are treated as dumb.
In World 2, the population is split half-and-half into f-people and g-people. Employers know the actual distribution of intelligence among both sub-populations. An employer can identify an employee as an f-person or a g-person with perfect reliability, but the employer knows nothing else about that employee’s intelligence.
The f-people’s intelligence satisfies the distribution f, where
f(x) = 4⁄3 for 0 ≤ x < 1⁄2, and f(x) = 2⁄3 for 1⁄2 ≤ x ≤ 1.
Hence, f-people are dumber on average. If I computed correctly, the f-people have expected intelligence E = 5⁄12. Thus, the f-people are all paid 5⁄12 by their employers. In particular, some smart f-people are treated as dumb.
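For anyone who wants to check the arithmetic, the expected intelligence of the f-people can be written out directly from the definition of f. This is only the standard expectation integral applied to the piecewise-constant density given above; no assumptions beyond that density are used.

```latex
\begin{aligned}
E_f &= \int_0^1 x\, f(x)\, dx
     = \int_0^{1/2} \tfrac{4}{3}\, x\, dx + \int_{1/2}^{1} \tfrac{2}{3}\, x\, dx \\
    &= \tfrac{4}{3}\cdot\tfrac{1}{8} + \tfrac{2}{3}\cdot\tfrac{3}{8}
     = \tfrac{1}{6} + \tfrac{1}{4}
     = \tfrac{5}{12}.
\end{aligned}
```

Since 5⁄12 is less than 0.5, every f-person is paid below 0.5, including those whose intelligence is at least 0.5.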
Meanwhile, the g-people’s intelligence is distributed according to the distribution g, where
g(x) = 2⁄3 for 0 ≤ x < 1⁄2, and g(x) = 4⁄3 for 1⁄2 ≤ x ≤ 1.
Hence, g-people are smarter on average. I compute an expected intelligence of E = 7⁄12 for the g-people.
If N is the total population size, our assumptions say that there are N/2 f-people and N/2 g-people. I compute that the number of f-people paid below their intelligence is 4N/18. I get that the number of g-people paid below their intelligence is 5N/18. Thus, in World 2, half the people are paid below their intelligence, but some smart people are treated as dumb.
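The figures for World 2 can be checked with exact rational arithmetic. The sketch below is an illustration only: the helper names (`mean`, `mass_above`) are made up for this check, and it works with proportions of the total population rather than head counts, so 4N/18 appears as 2/9 and 5N/18 as 5/18.

```python
from fractions import Fraction as F

HALF = F(1, 2)

def mean(lo, hi):
    """Expected intelligence for density `lo` on [0, 1/2) and `hi` on [1/2, 1]."""
    # Integral of c*x over [a, b] is c*(b^2 - a^2)/2.
    return lo * (HALF**2) / 2 + hi * (1 - HALF**2) / 2

def mass_above(t, lo, hi):
    """Proportion of a sub-population whose intelligence exceeds t."""
    lo_part = lo * (HALF - t) if t < HALF else F(0)
    hi_part = hi * (1 - max(t, HALF))
    return lo_part + hi_part

f = (F(4, 3), F(2, 3))   # density of the f-people, as defined above
g = (F(2, 3), F(4, 3))   # density of the g-people, as defined above

E_f, E_g = mean(*f), mean(*g)

# Each sub-population is half of the total population.
underpaid_f = HALF * mass_above(E_f, *f)
underpaid_g = HALF * mass_above(E_g, *g)

# "Smart treated as dumb": intelligence >= 0.5 but pay < 0.5.
smart_as_dumb = sum(
    HALF * mass_above(HALF, *d) for d in (f, g) if mean(*d) < HALF
)

print("E_f =", E_f, " E_g =", E_g)
print("underpaid share:", underpaid_f, "+", underpaid_g, "=", underpaid_f + underpaid_g)
print("smart treated as dumb:", smart_as_dumb)
```

Under these definitions the underpaid shares add up to one half of the population, while a nonzero share of the population (the smart f-people) is paid below 0.5.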
Your scenarios implicitly assume that anyone whose expected intelligence is below the median will get treated as dumb and that this is somehow much, much worse than what happens to people whose expected intelligence is exactly the median. Furthermore, even under this assumption you will find that your example falls apart if there is any way besides race to obtain information correlated with intelligence.
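“Your scenarios implicitly assume that anyone whose expected intelligence is below the median will get treated as dumb”
Well, actually, I thought that I made this assumption generously explicit. Evidently, you had implicit assumptions behind your claim that taking correlates into account would always lead to fewer false positives. What were these additional assumptions?
“and that this is somehow much, much worse than what happens to people whose expected intelligence is exactly the median.”
I did not make any assumption quantifying how much worse it is. It need only be marginally worse.
“Furthermore, even under this assumption you will find that your example falls apart if there is any way besides race to obtain information correlated with intelligence.”
No. I can construct similar counterexamples where there are two observable properties (which you can think of as black/white, male/female), corresponding to four populations (black males, …, white females). You will need to make your assumptions more explicit if you want to rule out these kinds of counterexamples.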
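“I did not make any assumption quantifying how much worse it is. It need only be marginally worse.”
Sorry, I was assuming a utility function that summed over the amount of suffering each person experienced. You seem to be using a Rawls-style utility function based on minimizing the suffering of the worst-off individual. (BTW, that’s a very stupid function to use in anything outside a very simple toy model.)
“No. I can construct similar counterexamples where there are two observable properties (which you can think of as black/white, male/female), corresponding to four populations (black males, …, white females).”
Only by fiddling with the parameters very precisely.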
If your assumption is that people whose expected intelligence is below the median (or really the nth percentile for any n) will be treated as dumb, the only way a counter-example like yours can work is by having lots of people exactly tied for the nth percentile. And the more other information is available the more the numbers in the scenario must be jiggered for that to happen.
“You will need to make your assumptions more explicit if you want to rule out these kinds of counterexamples.”
“Sorry, I was assuming a utility function that summed over the amount of suffering each person experienced.”
Your claim was that more correlates means fewer false positives. This is an abstract mathematical claim about epistemic probability. Utility functions don’t enter into it, at least not explicitly. It’s a claim about some class of probability distributions and criteria for categorization (“positives”). I’m just trying to figure out what class of distributions and criteria you’re talking about.
My counterexamples show that your claim doesn’t apply in full generality. You now claim that such counterexamples require “fiddling with the parameters very precisely.” I take this to be the claim that all scenarios satisfy your claim, except for some measure-zero subset (with respect to some natural measure). Can you prove this?
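“the only way a counter-example like yours can work is by having lots of people exactly tied for the nth percentile.”
I’m not sure how to make sense of this. It doesn’t seem to reflect an understanding of my example.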
I argued in the continuous limit. A measure-zero subset of people are tied for exactly the nth percentile. Recall that I said that “the proportion of individuals with intelligence between a and b is b − a.” So, the proportion of people whose intelligence is exactly tied for any value x is x − x = 0.
Of course, the continuous limit is only an approximation of the discrete reality. But I can find discrete examples where this proportion is arbitrarily small. It’s never “lots” relative to the size of the entire population, if that population is of any significant size.
I meant lots of people tied for the nth percentile in terms of your estimate of their intelligence, which was happening in your scenarios because the amount of information available was discrete and very small.
Okay, good. That makes a lot more sense.
What you say is true of the counterexamples I’ve described explicitly so far. But it is just an artifact of their being the simplest representatives of their family. I can construct similar counterexamples where the number of subpopulations in World 2 is arbitrarily large, and each subpopulation has a different expected intelligence. The proportion of people tied for any given expected intelligence can be arbitrarily small.
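To make this concrete, here is one hypothetical family of such counterexamples. It is my own illustrative choice, not necessarily the construction the comment has in mind: World 2 is split into 2k equal sub-populations arranged as k mirror-image pairs, where pair i has density 1 + a_i on [0, 1/2) and 1 − a_i on [1/2, 1] (and the reverse for its mirror), with distinct tilts a_i = i/(k + 1). Each pair averages out to the uniform distribution, so the overall population is still uniform. The names `tilted_family` and `summarize` are invented for this sketch.

```python
from fractions import Fraction as F

HALF = F(1, 2)

def mean(lo, hi):
    """Expected intelligence for density `lo` on [0, 1/2) and `hi` on [1/2, 1]."""
    return lo * F(1, 8) + hi * F(3, 8)

def mass_above(t, lo, hi):
    """Proportion of a sub-population whose intelligence exceeds t."""
    lo_part = lo * (HALF - t) if t < HALF else F(0)
    hi_part = hi * (1 - max(t, HALF))
    return lo_part + hi_part

def tilted_family(k):
    """Hypothetical World 2: 2k sub-populations as (weight, lo, hi) triples."""
    groups = []
    for i in range(1, k + 1):
        a = F(i, k + 1)                    # tilt, strictly between 0 and 1
        w = F(1, 2 * k)                    # each group's share of the population
        groups.append((w, 1 + a, 1 - a))   # dumber-on-average group
        groups.append((w, 1 - a, 1 + a))   # its mirror image
    return groups

def summarize(groups):
    """Return (underpaid share, smart-treated-as-dumb share, largest tie share)."""
    pays = [mean(lo, hi) for _, lo, hi in groups]
    underpaid = sum(w * mass_above(mean(lo, hi), lo, hi) for w, lo, hi in groups)
    smart_as_dumb = sum(
        w * mass_above(HALF, lo, hi) for w, lo, hi in groups if mean(lo, hi) < HALF
    )
    # Largest share of the population paid exactly the same rate.
    largest_tie = max(
        sum(w for w, lo, hi in groups if mean(lo, hi) == p) for p in set(pays)
    )
    return underpaid, smart_as_dumb, largest_tie

for k in (1, 10, 100):
    print(k, summarize(tilted_family(k)))
```

In this family the share of people tied at any one pay rate is 1/(2k), which shrinks as k grows, while the share of smart people paid below 0.5 stays positive and half the population is still paid below its intelligence.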
ETA: Also, these counterexamples work even if we redefine “treating smart people as dumb” to mean, “treating someone in the top 1% as if they were in the bottom 1%”. We still have a World 1 where no one smart is treated as dumb, and a World 2 where some smart people are treated as dumb.
I believe that the argument in my previous comment applies to any case satisfying the following.
Assume again that intelligence is measured by a quantity between 0 and 1. Assume there are two worlds, both with a prior distribution p for intelligence applying to the entire population. Furthermore, in the second world, the total population is divided into two equal sub-populations, f-people and g-people, with respective posterior distributions f and g for their intelligence. Assume the following about these distributions:
1. The support of each distribution p, f, and g is the entire interval [0,1]. That is, these distributions are all nonzero over the entire interval.
2. The prior distribution p is symmetric about 0.5. That is, p(x) = p(1 − x) for 0 ≤ x ≤ 1.
3. The distributions f and g are mirror images of each other about x=0.5. That is, f(x) = g(1 − x) for 0 ≤ x ≤ 1.
4. The expected intelligence for the f-people is below 0.5.
I believe that these assumptions suffice for the conclusions in my previous comment to follow. That is, in both worlds, exactly half the people are paid below their intelligence. But, in World 2 alone, some smart people are treated as dumb.
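A sketch of why the listed assumptions seem to be enough, written out as a short derivation. Two things here are my own additions rather than part of the stated assumptions: that f and g are densities with no point masses, and the notation F, G for their cumulative distribution functions and μ for the f-people's expected intelligence.

```latex
% Notation (assumed for this sketch): F, G are the CDFs of f, g; \mu = E_f < 1/2.
\begin{aligned}
&\text{Mirror symmetry } f(x) = g(1-x) \text{ gives } G(t) = 1 - F(1-t)
  \text{ and } E_g = 1 - \mu. \\[4pt]
&\text{Share of the population paid below their intelligence in World 2:} \\
&\qquad \tfrac12\bigl[1 - F(\mu)\bigr] + \tfrac12\bigl[1 - G(1-\mu)\bigr]
  = \tfrac12\bigl[1 - F(\mu)\bigr] + \tfrac12 F(\mu) = \tfrac12. \\[4pt]
&\text{Share of smart people treated as dumb in World 2:} \\
&\qquad \tfrac12\bigl[1 - F(\tfrac12)\bigr] > 0
  \quad\text{since } f > 0 \text{ on } [\tfrac12, 1] \text{ and } \mu < \tfrac12. \\[4pt]
&\text{World 1: } p(x) = p(1-x) \text{ gives } E = \tfrac12,
  \text{ so nobody is paid below } 0.5 \text{ and half are paid below their intelligence.}
\end{aligned}
```

The g-people contribute nothing to the second quantity because their pay, 1 − μ, is above 0.5.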
(Here I use the definitions from my previous comment, which I repeat here for convenience: Each employer computes an expected intelligence E for an employee and then pays that employee at a rate of E utilons-per-hour. “Dumb” means “intelligence less than 0.5”. “Smart” means “intelligence greater than or equal to 0.5”. Finally, “treating a smart person as dumb” means “paying an employee at a rate less than 0.5 when that employee’s intelligence is greater than or equal to 0.5”.)