V and V’ are symmetric; indeed, you can define V as 2U−V’. Given U, they are as well defined as each other.
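A quick worked version of the symmetry being claimed here, using only the thread’s own definitions: V’ := 2U − V, hence V = 2U − V’, and U − V’ = −(U − V). Given U, each of V and V’ is one subtraction away from the other, and any outcome where U overshoots V is exactly one where it undershoots V’ by the same amount.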
The point is that V and V’ are both hard to define. U is simple, but without a good definition for V, you won’t be able to get a good V’, and if you do have a good V, you can just optimize that directly.
It seems I didn’t articulate my point clearly. What I was saying is that V and V’ are equally hard to define, yet we all assume that true human values have a Goodhart problem (rather than a reverse Goodhart problem). This can’t be because of the complexity (since the complexity is equal), nor because we are maximising a proxy (because both have the same proxy).
So there is something specific about (our knowledge of) human values which causes us to expect Goodhart problems rather than reverse Goodhart problems. It’s not too hard to think of plausible explanations (fragility of value can be re-expressed in terms of simple underlying variables to get results like this), but it does need explaining. And it might not always be valid (e.g. if we used different underlying variables, such as the smooth-mins of the ones we previously used, then fragility of value and Goodhart effects would be much weaker), so we may need to worry about them less in some circumstances.
Sorry, why are V and V’ equally hard to define? Like if V is “human flourishing” and U is GDP, then V’ is “twice GDP minus human flourishing”, which is more complicated than V. I guess you’re gonna say “Why not say that V is twice GDP minus human flourishing?”? But my point is: for any particular set U, V, V’, you can’t claim that V and V’ are equally simple, and you can’t claim that V and V’ are equally correlated with U. Right?
Almost equally hard to define. You just need to define U, which, by assumption, is easy.
You have a true goal, V. Then you take the set of all potential proxies that have an observed correlation with V; call this set S(V). By Goodhart’s law, this set has the property that any U∈S(V) will with probability 1 be uncorrelated with V outside the observed domain.
Then you can take the set S(2U−V). This set will have the property that any U′∈S(2U−V) will with probability 1 be uncorrelated with 2U−V outside the observed domain. This is Goodhart’s law, and it still applies.
Your claim is that there is one element, U∈S(2U−V) in particular, which will be (positively) correlated with 2U−V. But such proxies still have probability 0. So how is that anti-Goodhart?
Pairing up V and 2U−V to show equivalence of cardinality seems to be irrelevant, and it’s also weird. 2U−V is an element of 2S(V)−V, and this depends on V.
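One way to write out the argument in this comment, using only the thread’s notation: let S(V) = {U : U has an observed correlation with V on the observed domain}. Goodhart, as used here, says that for a generic U∈S(V), the probability that U stays correlated with V outside the observed domain is (close to) 0. The same statement with 2U−V as the goal says that for a generic U′∈S(2U−V), the probability that U′ stays correlated with 2U−V outside the observed domain is also (close to) 0. The particular proxy U is a single, probability-0 element of S(2U−V), so its being correlated with 2U−V does not contradict the law.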
If we have a collection of variables {v}, and V=max(v), then V is positively correlated in practice with most U expressed simply in terms of the variables.
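A minimal simulation of this claim in Python (the choice of five i.i.d. normal components and of the mean as the “simple” U are illustrative assumptions, not something from the thread):

import numpy as np

rng = np.random.default_rng(0)
v = rng.normal(size=(100_000, 5))   # samples of the underlying variables {v}
V = v.max(axis=1)                   # the true goal: V = max(v)
U = v.mean(axis=1)                  # one U expressed simply in terms of the same variables

# Pearson correlation between U and V over the sampled domain: clearly positive.
print(np.corrcoef(U, V)[0, 1])

Other increasing combinations of the components behave the same way, which is the “positively correlated in practice” being pointed at.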
I’ve seen Goodhart’s law as an observation or a fact of human society; you seem to have a mathematical version of it in mind. Is there a reference for that?
I ended up using mathematical language because I found it really difficult to articulate my intuitions. My intuition told me that something like this had to be true mathematically, but the fact that you don’t seem to know about it makes me consider this significantly less likely.
Yes, but V also happens to be very strongly correlated with most U that are equal to V. That’s where you do the cheating. Goodhart’s law, as I understand it, isn’t a claim about any single proxy-goal pair. That would be equivalent to claiming that “there are no statistical regularities, period”. Rather, it’s a claim about the nature of the set of all potential proxies.
In Bayesian language, Goodhart’s law sets the prior probability that any seemingly good proxy is actually a good proxy to virtually 0. If you have additional evidence, like knowing that your proxy can be expressed in a simple way using your goal, then obviously the probabilities are going to shift.
And that’s how your V and V′ are different. In the case of V, the selection of U is arbitrary. In the case of V′, the selection of U isn’t arbitrary, because it was already fixed when you selected V′. But again, if you select a seemingly good proxy U′ at random, it won’t be an actually good proxy.
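A compact way to write the distinction drawn in the last two paragraphs, in the same notation as above: P(U correlated with V outside the observed domain | U chosen only for its observed correlation with V) ≈ 0, whereas P(U correlated with V′ outside the observed domain | V′ := 2U−V for that same, already-fixed U) is no longer forced toward 0, because the conditioning ties U to V′ by construction rather than by selection.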
This looks like begging the question. The whole point of Goodhart is that the second case always applies (barring a discontinuity in the production functions: it’s possible that trying to maximize U generates a whole new method, which produces far more V than the old way). You cannot argue against that by assuming a contradictory function into existence (at least, not without some actual examples).
It seems I didn’t articulate my point clearly. What I was saying is that V and V’ are equally hard to define, yet we all assume that true human values have a Goodhart problem (rather than a reverse Goodhart problem). This can’t be because of the complexity (since the complexity is equal), nor because we are maximising a proxy (because both have the same proxy).
So there is something specific about (our knowledge of) human values which causes us to expect Goodhart problems rather than reverse Goodhart problems. It’s not too hard to think of plausible explanations (fragility of value can be re-expressed in terms of simple underlying variables to get results like this), but it does need explaining. And it might not always be valid (e.g. if we used different underlying variables, such as the smooth-mins of the ones we previously used, then fragility of value and Goodhart effects would be much weaker), so we may need to worry about them less in some circumstances.
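For the smooth-min remark, here is a minimal sketch in Python of the kind of transformation meant, assuming a standard soft minimum via logsumexp (the temperature and the example values are illustrative, not from the thread):

import numpy as np
from scipy.special import logsumexp

def smooth_min(values, tau=0.1):
    """Soft minimum of the components; approaches min(values) as tau -> 0."""
    values = np.asarray(values, dtype=float)
    return -tau * logsumexp(-values / tau)

balanced = [1.0, 1.0, 1.0]
lopsided = [10.0, 1.0, 0.0]   # one component pushed very high, one neglected

print(smooth_min(balanced))   # about 0.9, close to the true minimum of 1.0
print(smooth_min(lopsided))   # about 0.0, dominated by the neglected component

A proxy built from variables like these cannot score well while any single component is neglected, which is the sense in which fragility of value and Goodhart effects on them are weaker.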
I think it’s an empirical observation. Goodhart looked around, saw in many domains that U diverged from V in a bad way after it became a tracked metric, while seeing no examples of U diverging from a theoretical V’ in a good way, and then minted the “law.” Upon further analysis, no one has come up with a counterexample not already covered by the built-in exceptions (if U is sufficiently close to V, then maximizing U is fine, e.g. Moneyball; or if there is relatively little benefit to be gained, agents won’t attempt to maximize U, e.g. anything using age as U, like senior discounts or school placements).
The world doesn’t just happen to behave in a certain way. The probability that all examples point in a single direction without some actual mechanism causing it is negligible.