In education, this is one of the criticisms of high-stakes testing: you’ll just get schools teaching to the test, in ways that aren’t correlated with real learning (the test is G*, real knowledge/learning is G). People say the same thing about the SAT and test prep: kids get into better colleges because they paid to learn tricks for answering multiple-choice questions. The Wire does a great job of showing the police force’s efforts to “juke the stats” (e.g. counting robberies as larcenies) so that crime statistics (G*) look better even while crime (G) is getting worse. Athletes get criticized for playing for their stats (G*), or trying to pad their stats, instead of playing to win, when the stats are supposed to be a measure of how much a player has contributed to his team’s chances of winning (G). I’m not sure if it’s historically accurate, but I’ve heard that body count (G*) was used by the US as one of the main metrics of success (G) in the Vietnam war, and as a result we ended up with a lot of dead bodies and a misguided war.
In general, any time you measure something you care about in order to incentivize people, or to hold people accountable, or to keep track of what’s going on, and the thing you measure isn’t exactly the same as the thing that you care about, there’s a risk that people will figure out ways to improve the measurement that don’t translate into improvements on the thing that you care about.