I expect that the main problem with Goodhart’s law is this: you want an indicator to accurately reflect the state of the world, but once the indicator becomes decoupled from the state of the world, it stops reflecting changes in the world. This is how I interpret the term ‘good,’ which I dislike. People want a thermometer to accurately reflect the patterns they call temperature so that they can better predict the future; if the thermometer doesn’t reflect the temperature, future predictions suffer.
A problem I have with this reinterpretation is that “state of the world” is too broad. In looking at a thermometer, I am not trying to understand the entire world-state (and the thermometer also couldn’t be decoupled from the entire world-state, since it is a part of the world).
A more accurate way to remove “good” would be as follows:
In everyday life, if a human is asked to make a (common, everyday) judgement based on appearances, then the judgement is probably accurate. But if we start optimizing really hard based on their judgement, Goodhart’s Law kicks in.
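To make the "optimizing really hard" step concrete, here is a minimal sketch of one way this can play out. It rests on assumptions that are mine, not part of the argument above: each option has a hidden true value, the human judgement equals that value plus independent Gaussian noise, and "optimizing" means picking the option judged best out of `n_candidates`. The function names and the noise model are hypothetical, and this covers only the noisy-selection mechanism, not every form Goodhart's Law can take.

```python
import random


def pick_best_by_judgement(n_candidates, rng, noise_sd=1.0):
    """Return (judged_score, true_value) for the candidate judged best.

    Assumption: judgement = true value + independent Gaussian noise.
    """
    best = None
    for _ in range(n_candidates):
        true_value = rng.gauss(0.0, 1.0)
        judged = true_value + rng.gauss(0.0, noise_sd)
        if best is None or judged > best[0]:
            best = (judged, true_value)
    return best


def average_gap(n_candidates, trials=500, seed=0):
    """Average of (judged score - true value) for the selected candidate."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        judged, true_value = pick_best_by_judgement(n_candidates, rng)
        total += judged - true_value
    return total / trials


# More candidates means more optimization pressure on the judgement.
for n in (1, 10, 100, 1000):
    print(n, round(average_gap(n), 2))
```

With a single candidate there is no selection, so the judgement is on average about as accurate as its noise allows. As the number of candidates grows, the winner is increasingly the option whose judgement was most inflated by noise, so the gap between how good it looks and how good it is widens.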