"This is only true for the kind of things humans typically care about."
There are utility functions for which Goodhart doesn't apply, but I think the phenomenon is more generally agent-centric than just human-centric. I think the vast majority of proxies developed by agents for the sake of optimizing a harder-to-measure function will experience Goodhart (and, to me, the terms "utility function" and "proxy" imply that this is what's happening).
Moreover, I think Goodhart fails to apply only when making the proxy function arbitrarily large does not also change the behavior of the observed universe by an arbitrary amount. You can define a utility function for which this is true, but the ones I've thought of so far are associated with weird discontinuities.
The mathematical spitballing I did in making this claim:
If we have a utility function U and a proxy utility V that represents U, we expect the plot of the coordinates (x = V(World State), y = U(World State)) to be roughly sublinear, since
(x = V(World State), y = U(World State)) is upper-bounded by (x = U(World State), y = U(World State)), i.e. the line y = x
we can rescale V however we want, so a unit increase in V corresponds to a unit increase in U at some arbitrary location on the (rough) curve
This indicates that, if the noisiness of the relationship between V and U increases as a linear or superlinear function of V, it could wash out any positive effects of increasing V. Since the noise cannot actually push U above the upper bound y = x, the symmetry of the noise as something that might improve performance is broken, and this leads to an overall downtrend in utility as V is increased more and more.
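A minimal sketch of the truncated-noise argument (the quadratic noise scale sigma(V) = 0.05·V² is purely an illustrative assumption, and V is rescaled so y = x is the upper bound): for zero-mean Gaussian noise ε with standard deviation σ, E[min(ε, 0)] = −σ/√(2π), so the expected utility E[min(V + ε, V)] = V − σ(V)/√(2π) rises and then falls once the noise grows superlinearly.

```python
import math

def expected_utility(v: float, c: float = 0.05) -> float:
    """Expected U when U = min(V + eps, V), i.e. symmetric noise
    truncated at the upper bound y = x, with eps ~ Normal(0, sigma(V))
    and a superlinear (illustrative) noise scale sigma(V) = c * V**2.
    Uses E[min(eps, 0)] = -sigma / sqrt(2*pi) for a zero-mean Gaussian."""
    sigma = c * v ** 2
    return v - sigma / math.sqrt(2 * math.pi)

us = [expected_utility(v) for v in range(101)]
peak = max(range(101), key=lambda v: us[v])
print(peak)                 # expected utility peaks at a moderate V
print(us[100] < us[peak])   # and declines as V is pushed further
```

Early on, increasing V still helps (the V term dominates); past the peak, the truncated noise term dominates and expected utility trends downward, which is the Goodhart downtrend described above.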
When would one expect a linear or superlinear increase in the noisiness of the V-vs-U relationship to actually happen? You might expect this if
1. The proxy V was built to model U based on measurements of their relationship that all occurred in a kind of environment where V stayed within a certain domain.
2. Increasing V outside that domain changes the environment in a way that makes it so different from what it once was that the earlier measurements used to fit V no longer apply.
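A toy sketch of points 1 and 2 (the functional form of the true utility and all the numbers are illustrative assumptions): fit a linear proxy to U from measurements taken only on a narrow domain of world states, then push the proxy's score far outside that domain and watch true utility fall even as the proxy keeps rising.

```python
# Hypothetical true utility with diminishing, then negative, returns.
def true_u(s: float) -> float:
    return s - 0.01 * s ** 2

# Measurements taken only in a narrow domain (point 1).
samples = [(float(s), true_u(s)) for s in range(21)]  # s in [0, 20]

# Ordinary least-squares line fit to those in-domain measurements.
n = len(samples)
mean_s = sum(s for s, _ in samples) / n
mean_u = sum(u for _, u in samples) / n
slope = (sum((s - mean_s) * (u - mean_u) for s, u in samples)
         / sum((s - mean_s) ** 2 for s, _ in samples))
intercept = mean_u - slope * mean_s

def proxy_v(s: float) -> float:
    return intercept + slope * s

# Inside the fitted domain, optimizing V tracks U; far outside it
# (point 2), the fit no longer applies: V keeps rising while U falls.
print(proxy_v(200) > proxy_v(20))  # proxy says: keep going
print(true_u(200) < true_u(20))    # reality: utility has cratered
```

The proxy is an accurate model of U exactly where it was measured, which is what makes it a reasonable proxy in the first place; the failure only appears once optimization pushes V outside the domain the measurements came from.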
Note that Goodhart doesn't say when optimizing V starts to decrease U, just that it will at some point. In my opinion, the claim that 1 and 2 will never happen as V increases is a stronger claim than the claim that they eventually will.