I just wanted to add that, technically speaking, there are two levels of Goodhart’s Law worth discussing here:
1. Goodhart’s Law as traditionally defined: “When a measure becomes a target, it ceases to be a good measure.” AKA, proxies of utility functions, when optimized too aggressively, stop being good proxies for those utility functions.
2. Goodhart’s Law as we deal with it in AI Safety: When a measure becomes a target, it actively causes you to miss the target. AKA, proxies of utility functions, when optimized too aggressively, actively reduce the value of those utility functions below where it was originally.
The traditional Goodhart’s Law strikes me as holding pretty generally, across a broad range of agents trying to optimize things.
The AI Safety version strikes me as pretty common too. But it’s contingent on the agent’s relationship with the universe in a way that the traditional version is not (i.e., the agent already sitting at an unusually high value of the true utility function, relative to what you’d expect from the universe it’s in).
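To make the level-2 failure mode concrete, here is a minimal toy sketch (not from the original comments; the utility and proxy functions are invented purely for illustration): the proxy only measures one feature of what the true utility cares about, so mild optimization of the proxy tracks the true utility reasonably well, while aggressive optimization of the proxy drives the true utility well below where it started.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sketch (all functions and numbers are made up for illustration):
# the "true" utility depends on two features of a state x, while the
# proxy only measures the first feature. Under mild optimization the two
# move together (level 1: the proxy is merely an imperfect measure);
# under aggressive optimization of the proxy, the true utility drops (level 2).

def true_utility(x):
    # Rewards the first feature but penalizes extreme values of it,
    # standing in for the side effects of over-optimization.
    return x[0] - 0.1 * x[0] ** 2 + x[1]

def proxy(x):
    # Only sees the first feature.
    return x[0]

# Mild optimization: pick the proxy-best of a handful of random states.
candidates = rng.normal(size=(10, 2))
mild_pick = max(candidates, key=proxy)

# Aggressive optimization: push the proxy as far as it will go.
aggressive_pick = np.array([100.0, 0.0])

for name, x in [("mild", mild_pick), ("aggressive", aggressive_pick)]:
    print(f"{name:10s} proxy={proxy(x):7.2f}  true={true_utility(x):9.2f}")
```

In this made-up setup the mild pick scores modestly on both the proxy and the true utility, while the aggressive pick scores enormously on the proxy and catastrophically on the true utility, which is the distinction between the two levels above.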
Yep, those are the two levels I mentioned :-)
But I like your phrasing.