Goodhart’s Law starts some other way. It’s not quite right to say:
Superiors want an undefined goal G.
Mathematically speaking, the problem can’t be that G is undefined. If G were really undefined in any absolute sense, then superiors would be indifferent to all possible outcomes, or would choose their utility function literally at random. That rarely happens.
Instead, the problem could be that G is difficult to articulate. It is “undefined” only in the sense that people have had trouble coming up with an explicit verbal definition for it. I know what I want and how to get it, but I don’t know how to communicate that want to you ex ante. For example, maybe I want you (the night shift manager) to page me (the owner) whenever there’s a decision to make that could affect whether our business keeps a client, but I’ve never taken any business classes and don’t quite have the vocab to say that, so instead I say to only page me if it’s “important.” “Important” is vague, but “important” is just a map, and the map is not the territory.
Alternatively, the problem could be that G is difficult to commit to. I can define my goal in words just fine today, but I know (or you suspect) that later I will be tempted to evaluate you by some other criterion. For example, I would like to give a raise to whichever police officer does the most to keep his beat safe, and, as a thoughtful and experienced police chief, I know exactly what the difference is between a safe neighborhood and an unsafe neighborhood, and I’m happy to explain it to anyone who’s interested. As one of my employees, though, you can’t verify that I’m actually rewarding people for making neighborhoods safe, and not, say, giving raises to people who bring in the most money for drug busts, or who artificially lower their crime statistics, or who give me a kickback. It might make more sense for me to just announce that I’ll pay people based on hours worked and complaints lodged, because that announcement is more verifiable, and thus more credible, so at least I’ll be viewed as evenhanded.
Finally, as you’ve already pointed out, the problem could be that G is difficult or expensive to measure. Alternative measures of GDP that take into account factors like health, leisure, and environmental quality have gotten pretty good about specifying what health is, and it’s easy enough to pass laws that commit agencies to valuing health in a particular way, but it’s expensive to measure health, especially in any broad sense. A physical is $60; an exercise fitness exam is another $45; an STD test runs about $20; a battery of prophylactic tests for cancer and heart disease and so on is another $100 or so; a mental health exam is another $80, and then you multiply all that by the size of a valid random sample and we’re talking real money. In my opinion, it would be money very, very well spent, but one can understand why GDP—which can be measured just by asking the IRS for a copy of its tax receipts—is such a popular metric. It’s cheap to use.
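To put a rough number on "real money," here is a minimal sketch that adds up the per-person prices quoted above and multiplies by a sample size. The prices are the ones given in the comment; the sample sizes and the names `EXAM_COSTS` and `survey_cost` are illustrative assumptions, not figures from any actual survey design.

```python
# Per-person prices quoted in the comment above (US dollars).
EXAM_COSTS = {
    "physical": 60,
    "exercise fitness exam": 45,
    "STD test": 20,
    "cancer and heart disease screening": 100,
    "mental health exam": 80,
}

# Cost of running every exam on one person.
per_person = sum(EXAM_COSTS.values())


def survey_cost(sample_size: int) -> int:
    """Total cost of giving every exam to each person in the sample."""
    return per_person * sample_size


# Hypothetical sample sizes, chosen only to show how the total scales.
for n in (1_000, 10_000, 100_000):
    print(f"n={n:>7,}: ${survey_cost(n):,}")
```

The per-person total comes to $305, so the bill grows by that amount for every person added to the sample, before counting staff, travel, or analysis.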
G is a variable. It must be undefined by definition, or it is not a variable. A variable’s definition changes by context, therefore outside of context it is always undefined.
That’s why we use X instead of the number 2 in algebraic formulas. You wouldn’t say “2 − 3 = 8, solve for 2”; that’s clearly stupid. You must use the undefined variable X (or any other mathematically irrelevant symbol), and then define it in the context of the rest of the formula. Move X to a different formula, and it has a different definition. Isolate X without the context of a formula, and it is always undefined (X = ?).
In this instance, G is a variable without context. We aren’t making nails of a certain size, we are just talking about G and the ways G can be used to create a metric once G is known.
I partly disagree. Simple metrics are used in place of complex goals, for good reason: https://www.ribbonfarm.com/2016/06/09/goodharts-law-and-why-measurement-is-hard/
Then the fact that the goal is too simply defined allows flexibility to be abused: https://www.ribbonfarm.com/2016/09/29/soft-bias-of-underspecified-goals/