If we want to think about reasonably realistic Goodhart issues, random functions on R^n seem like the wrong setting. John Maxwell put it nicely in his answer:
If your proxy consists of something you’re trying to maximize plus unrelated noise that’s roughly constant in magnitude, you’re still best off maximizing the heck out of that proxy, because the very highest value of the proxy will tend to be a point where the noise is high and the thing you’re trying to maximize is also high.
That intuition is easy to formalize: we have our “true” objective u(x) that we want to maximize, but we can only observe u plus some (differentiable) systematic error ϵ(x). Assuming we don’t have any useful knowledge about that error, the expected value given our information E[u(x)|(u+ϵ)(x)] will still be maximized when (u+ϵ)(x) is maximized. There is no Goodhart.
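To make that concrete, here’s a minimal numerical sketch, assuming (purely for illustration) that u and ϵ are independent Gaussian draws across a pool of candidate points:

```python
import numpy as np

rng = np.random.default_rng(0)

# Sample many candidate points. u is the true objective; eps is the
# unobserved "systematic error". Treating the two as independent is a
# modeling assumption here, standing in for "no useful knowledge about
# that error".
n = 100_000
u = rng.normal(size=n)
eps = rng.normal(size=n)
proxy = u + eps                 # what we actually get to observe

# Take the 100 candidates that look best under the proxy.
top = np.argsort(proxy)[-100:]

print(f"mean u over all candidates:   {u.mean():.3f}")
print(f"mean u at the proxy's optima: {u[top].mean():.3f}")
# The proxy-selected points have far higher true value than average:
# with unrelated, roughly constant-magnitude noise, maximizing the
# proxy as hard as possible is still the right move. No Goodhart.
```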
I’d think about it on a causal DAG instead. In practice, the way Goodhart usually pops up is that we have some deep, complicated causal DAG which determines some output we really want to optimize. We notice that some node in the middle of that DAG is highly predictive of happy outputs, so we optimize for that thing as a proxy. If our proxy were a bottleneck in the DAG—i.e. it’s on every possible path from inputs to output—then that would work just fine. But in practice, there are other nodes in parallel to the proxy which also matter for the output. By optimizing for the proxy, we accept trade-offs which harm nodes in parallel to it, which potentially adds up to a net-harmful effect on the output.
For example, there’s the old story about Soviet nail factories evaluated on number of nails made, and producing huge numbers of tiny useless nails. We really want to optimize something like the total economic value of nails produced. There’s some complicated causal network leading from the factory’s inputs to the economic value of its outputs. If we pick a specific cross-section of that network, we might find that economic value is mediated by number of nails, size, strength, and so forth. If we then choose number of nails as a proxy, then the factories trade off number of nails against the other nodes in that cross-section. But we’ll also see optimization pressure in the right direction for any nodes which affect number of nails without affecting any of those other variables.
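A toy version of the nail factory makes the trade-off easy to see; the steel budget and the value-per-nail curve below are made-up numbers, just a sketch of the mechanism:

```python
# Toy nail-factory model. The numbers and the value-per-nail curve are
# invented purely to illustrate the trade-off, not taken from anywhere.
STEEL = 100.0  # fixed input: total steel available

def value_per_nail(size):
    # Nails below a minimum useful size are worthless; above that, value
    # grows with size but saturates (a huge nail isn't proportionally better).
    if size < 0.5:
        return 0.0
    return min(size, 2.0)

def factory(count):
    size = STEEL / count                        # more nails -> each nail is smaller
    return size, count * value_per_nail(size)   # (size, total economic value)

for count in (25, 50, 100, 200, 400):
    size, value = factory(count)
    print(f"count={count:4d}  size={size:5.2f}  economic value={value:6.1f}")

# The proxy (count) improves monotonically down the table, but the true
# objective (economic value) stops improving once size starts paying for
# it, and collapses once the nails become too small to use.
```

The key structural feature is that count and size are parallel nodes fed by the same upstream resource, so pushing one eventually has to come out of the other.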
So that at least gives us a workable formalization, but we haven’t really answered the question yet. I’m gonna chew on it some more; hopefully this formulation will be helpful to others.
Having tried to play with this, I’ll strongly agree that random functions on R^N aren’t a good place to start. But I’ve simulated picking random nodes in the middle of a causal DAG, or selecting nodes for high correlation, and found that they aren’t particularly useful either; people have some appreciation of causal structure, and they aren’t picking metrics randomly or just for high correlation—they are simply making mistakes in their causal reasoning, or missing potential ways that the metric can be intercepted. (But I was looking for specific things about how the failures manifested, and I was not thinking about gradient descent, so maybe I’m missing your point.)
Another piece I’d guess is relevant here is generalized efficient markets. If you generate a DAG with random parameters and start optimizing for a proxy node right away, you’re not going to be near any sort of Pareto frontier, so trade-offs won’t be an issue. You won’t see a Goodhart effect.
In practice, most of the systems we deal with already have some optimization pressure. They may not be optimal for our main objective, but they’ll at least be Pareto-optimal for any cross-section of nodes. Physically, that’s because people do just fine locally optimizing whatever node they’re in charge of—it’s the nonlocal tradeoffs between distant nodes that are tough to deal with (at least without competitive price mechanisms).
So if you want to see Goodhart effects, first you have to push up to that Pareto frontier. Otherwise, changes applied to optimize the proxy are not going to have a systematically negative impact on other nodes in parallel to the proxy; the impacts will just be random.
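Here’s a toy simulation of that claim, with an arbitrary two-node cross-section (a is the measured proxy, b is an unmeasured parallel node) and a made-up resource budget:

```python
import numpy as np

rng = np.random.default_rng(1)

# Two parallel nodes feed the output: `a` is the proxy we optimize,
# `b` is a node we don't measure. The multiplicative output and the
# resource budget are arbitrary choices for illustration.
BUDGET = 10.0

def output(a, b):
    return a * b  # both nodes matter for the thing we actually care about

a, b = 1.0, 1.0   # start well inside the feasible region: no prior optimization

print(" step     a      b   output")
for step in range(20):
    if a + b < BUDGET:
        # Off the Pareto frontier: we can push `a` up without paying for it,
        # and the side effects on `b` are just random, not systematically bad.
        a += 0.5
        b += rng.normal(scale=0.05)
    else:
        # On the frontier: pushing `a` further now comes directly out of `b`.
        a += 0.5
        b = max(0.0, BUDGET - a)
    print(f"{step:5d}  {a:5.2f}  {b:5.2f}  {output(a, b):7.2f}")

# Across the early steps, optimizing the proxy improves the output on net
# (no Goodhart). Once the budget binds, further proxy optimization eats
# `b` and the output peaks and then collapses.
```

Off the frontier, the proxy gradient and the true gradient mostly point the same way; only once the budget binds does proxy optimization start systematically harming the parallel node.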