I might be interpreting things wrong, but it seems to me that the paper is doing things the wrong way around. That is, the paper seems to set out to prove that Goodhart’s law is an issue and picks a setting where this will be the case, as opposed to picking a setting and then investigating whether and when Goodhart’s law is an issue.
By this, I don’t mean to say that the paper is bad; it is good. I merely mean that we should view it as a nice metaphor that formalises some intuitions about Goodhart’s law, rather than as a model that is “causally related to how Goodhart’s law works (or doesn’t) in reality”.[1]
Why do I think this? Well, if you look at the assumptions, they say that both the utility function and the costs (constraint function) are strictly increasing in all attributes. First, this is not always how the world works. Second, it means that, by assumption, raising any one attribute uses up resources that could have gone to another, so there will always be tradeoffs, and hence always issues with Goodhart’s law.
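A toy sketch of the second point, not the paper's actual model: everything here (the two attributes, the square-root utility, the shared budget of 10, the proxy that only sees the first attribute) is a hypothetical choice made for illustration. It compares a setting where both attributes draw on one budget with a setting where the second attribute costs nothing.

```python
import math

# Hypothetical toy setup, not taken from the paper.

def true_utility(a1, a2):
    # Strictly increasing in both attributes.
    return math.sqrt(a1) + math.sqrt(a2)

def proxy(a1, a2):
    # The proxy only measures the first attribute.
    return a1

def best(objective, feasible):
    # Pick the feasible allocation that maximises the given objective.
    return max(feasible, key=lambda a: objective(*a))

def goodhart_gap(feasible):
    # True utility lost by optimising the proxy instead of the true utility.
    ideal = best(true_utility, feasible)
    chosen = best(proxy, feasible)
    return true_utility(*ideal) - true_utility(*chosen)

BUDGET = 10
grid = [i / 10 for i in range(101)]  # a1 from 0.0 to 10.0

# Case A: cost is increasing in both attributes, so they share one budget.
with_tradeoff = [(a1, BUDGET - a1) for a1 in grid]

# Case B: the second attribute is free and stays at 5 whatever a1 is.
without_tradeoff = [(a1, 5.0) for a1 in grid]

gap_tradeoff = goodhart_gap(with_tradeoff)
gap_free = goodhart_gap(without_tradeoff)

print("gap with forced tradeoff:", round(gap_tradeoff, 3))
print("gap without tradeoff:", round(gap_free, 3))
```

In case A the proxy optimiser spends the whole budget on the first attribute and true utility suffers; in case B the proxy optimum and the true optimum coincide, so the gap is zero. The Goodhart problem in this sketch comes entirely from the assumed tradeoff.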
To be clear, I think the analysis of “if we assume that tradeoffs are unavoidable, what happens?” is informative. I would just prefer to be very clear that the premise is a hypothetical assumption, and actually one that is false more often than not.
Incidentally, I am trying to come up with a “better” model for “this stuff”, one that would have predictive power over reality. (As opposed to starting out with a clear bottom line.) No solutions yet, but I do have some thoughts. If other people are also actively working on this, I would be happy to talk.