I like the idea of using a graphical approach to show the core of the problem clearly, namely how the different base rates keep us from having both kinds of fairness.
There is one glitch, I’m afraid: it seems you got the notion of calibration wrong. In your use of the word, ideal calibration would be a perfect score, i.e. a score that outputs 1 for all the actual positives and 0 for all the actual negatives. While perfect scores do play a role in Kleinberg et al.’s paper, as an unrealistic corner case of their theorem, the standard notion of calibration is a different one. It demands that within each score bracket (the set of all people having approximately the same score), the actual fraction of positive instances should (approximately) coincide with the score value of that bracket. To avoid discrimination, one also checks that this holds separately for white and for black defendants.
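To make the bracket-based definition concrete, here is a minimal Python sketch of a per-group calibration check. It is only an illustration under my own assumptions: the function names `calibration_table` and `is_calibrated` are hypothetical, brackets are taken to be equal-width bins of width 1/`n_bins`, and the tolerance `tol` for "approximately coincide" is an arbitrary choice.

```python
from collections import defaultdict


def calibration_table(scores, outcomes, groups, n_bins=10):
    """Group people by (group, score bracket) and compare mean score with positive rate.

    Hypothetical helper: brackets are assumed to be equal-width bins on [0, 1].
    Returns {group: {bin_index: (mean_score, positive_fraction, count)}}.
    """
    buckets = defaultdict(lambda: defaultdict(list))
    for s, y, g in zip(scores, outcomes, groups):
        b = min(int(s * n_bins), n_bins - 1)  # a score of exactly 1.0 goes into the top bin
        buckets[g][b].append((s, y))

    table = {}
    for g, bins in buckets.items():
        table[g] = {}
        for b, items in sorted(bins.items()):
            mean_score = sum(s for s, _ in items) / len(items)
            pos_frac = sum(y for _, y in items) / len(items)  # actual fraction of positives
            table[g][b] = (mean_score, pos_frac, len(items))
    return table


def is_calibrated(table, tol=0.05):
    """True if, in every group and every bracket, mean score and positive rate agree within tol."""
    return all(
        abs(mean_score - pos_frac) <= tol
        for bins in table.values()
        for (mean_score, pos_frac, _) in bins.values()
    )


# Toy data (made up for illustration): two groups with different base rates,
# both calibrated in every bracket.
scores = [0.25] * 4 + [0.75] * 4 + [0.25] * 4 + [0.75] * 8
outcomes = [1, 0, 0, 0] + [1, 1, 1, 0] + [1, 0, 0, 0] + [1] * 6 + [0] * 2
groups = ["A"] * 8 + ["B"] * 12

table = calibration_table(scores, outcomes, groups)
print(is_calibrated(table))  # True
```

In the toy data, group A has base rate 4/8 and group B has 7/12, yet both are calibrated. Note also that when every bracket is exactly calibrated, the scores in a group sum to that group's number of actual positives (here 4 for A and 7 for B).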
Fortunately, your approach still works with this definition. In your drawing, it translates into the demand that, in each of the two squares, the yellow area must be as large as the left column (the actual positives). Assume that this is the case in the upper drawing. When we go from the upper to the lower drawing, the boundary between the left and right columns moves to the right, as the base rate is higher among black defendants. This is nicely indicated by the red arrows in the lower drawing. So the area of the left column increases. But of this newly acquired territory of the left column, only one part is also new yellow area. Another part was yellow and stays yellow, and a third part now belongs to the left column without being part of the yellow area. Hence, in the lower drawing, the left column is larger than the yellow area.