I think it’s possible to build a Goodhart’s law example in a 2D vector space.
Say you get to choose two parameters $x$ and $y$. You want to maximize their sum, but you are constrained by $x^2 + y^2 = 2$. Then the maximum is attained at $x = y = 1$. Now assume that $y$ is hard to measure, so you use $x$ as a proxy. Maximizing the proxy moves you from the optimal point above to the worse point $x = \sqrt{2}$, $y = 0$, where the true objective drops from $2$ to $\sqrt{2}$.
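To make the example above concrete, here is a minimal numerical sketch. It assumes the circle is parametrized by an angle and searched on a grid; the function names and the grid size are illustrative choices, not part of the original argument.

```python
import math

R = math.sqrt(2)  # radius of the constraint circle x^2 + y^2 = 2

def point(theta):
    """Point on the constraint circle at angle theta."""
    return R * math.cos(theta), R * math.sin(theta)

def true_utility(x, y):
    return x + y

def proxy_utility(x, y):
    # y is assumed unmeasurable, so the proxy only looks at x
    return x

# Grid search over the circle (3600 angles is an arbitrary resolution).
points = [point(2 * math.pi * k / 3600) for k in range(3600)]

best_true = max(points, key=lambda p: true_utility(*p))
best_proxy = max(points, key=lambda p: proxy_utility(*p))

print("true optimum: ", best_true, true_utility(*best_true))    # near (1, 1), utility near 2
print("proxy optimum:", best_proxy, true_utility(*best_proxy))  # near (sqrt(2), 0), utility near 1.414
```

Optimizing the proxy lands on the point with the largest $x$, and the true utility evaluated there is lower than at the true optimum.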
The key point is that you are searching for a solution on a manifold inside your vector space, but since some dimensions of that space are too hard or even impossible to measure, you end up at suboptimal points of the manifold.
In formal terms, you have a true utility function $u(v)$ based on all the data $v$, and a misaligned utility function $u'(v')$ based on the subspace of known variables $v'$. Here $u'$ could be obtained by integrating out the unknown dimensions if we know their probability distribution, or by any other technique that might be more suitable.
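As a hedged sketch of the "integrating out" step, assume (this is an added assumption, not something stated above) that the unmeasured coordinate $y$ has a known conditional density $p(y \mid x)$. Then, for the running example $u(x, y) = x + y$:

$$
u'(x) = \int u(x, y)\, p(y \mid x)\, dy = x + \mathbb{E}[y \mid x].
$$

If one further assumes that $y$ is independent of $x$, the second term is a constant, so $u'$ ranks points exactly as the proxy $x$ does. The constraint $x^2 + y^2 = 2$ violates that independence assumption, which is one way to see where the proxy goes wrong.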
Would this count as a more substantive assumption?
Best, Miguel
Edit: added the “In formal terms” paragraph
Assuming you mean $x^2 + y^2 < 1$, optimizing for $x + y$, and using $x$ as the proxy, this is a pretty nice formulation. Increasing $x$ then improves the objective over most of the space, until we run into the boundary (a.k.a. the Pareto frontier), and then Goodhart kicks in. That’s actually a really clean, simple formulation.
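A small sketch of that picture, under stated assumptions: the feasible set is the unit disk, the starting point $(0, 0.6)$ is arbitrary, and once the boundary is reached $y$ is simply forced down to stay feasible. The function name `climb_proxy` is hypothetical.

```python
import math

def climb_proxy(x0, y0, radius=1.0, steps=100):
    """Push x from x0 up to the radius, holding y fixed while feasible.

    Assumption: when (x, y0) would leave the disk, y is reduced to the
    largest feasible value, so the point slides along the boundary.
    """
    path = []
    for k in range(steps + 1):
        x = x0 + (radius - x0) * k / steps
        y_max = math.sqrt(max(radius**2 - x**2, 0.0))
        y = min(y0, y_max)           # interior: unchanged; boundary: forced down
        path.append((x, y, x + y))   # last entry is the true objective x + y
    return path

path = climb_proxy(0.0, 0.6)
objective = [u for _, _, u in path]
peak = max(range(len(path)), key=lambda i: objective[i])

print("start objective:", objective[0])
print("peak at x =", path[peak][0], "objective =", objective[peak])
print("final objective:", objective[-1])
```

The true objective rises while the point is in the interior, peaks where the path first touches the boundary, and falls from there on even though the proxy $x$ keeps increasing.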
Note: The LaTeX is not rendering properly on this reply. Does anyone know what the reason could be?
I chose $x^2 + y^2 = 2$ because the optimal point in that case has integer coordinates, $x = y = 1$, but the argument holds for any positive real constant, and whether the constraint is an equality, a strict inequality ($<$), or a non-strict one ($\le$).
One thing we assumed is that, given the utility function $x + y$, our proxy utility function is $x$. This is not necessarily obvious, and even less so for more convoluted utility functions: if our utility were $u(x, y) = xy$, what would our proxy be when we only know $x$?
To answer this question in general, my first thought would be to build a function $T$ that produces a new utility function $u'$ from four inputs: a vector space $V$, a utility function $u: V \to \mathbb{R}^+$, the manifold $S$ of possible points, and a map from each point $s \in S$ to a filtration $F_s$ that tells us what information is available at $s$.
However, this full generality seems a lot harder to describe.
Best, Miguel
The problem with $x^2 + y^2 = 1$ is that it’s not clear why $x$ would seem like a good proxy in the first place. With an inequality constraint, $x$ has positive correlation with the objective everywhere except the boundary. You get at this idea with $u(x, y) = xy$ knowing only $x$, but I think it’s more a property of dimensionality than of objective complexity: even with a complicated objective, it’s usually easy to tell how to change a single variable to improve the objective if everything else is held constant.
It’s the “held constant” part that really matters: changing one variable while holding all else constant only makes sense in the interior of the set, so it runs into Goodhart-type tradeoffs once you hit the boundary. But you still need the interior in order for the proxy to look good in the first place.
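The "held constant" distinction can be checked numerically for $u(x, y) = xy$. This is only a sketch: the sample points, the step size `dx`, and the helper names are assumptions chosen for illustration, and the boundary is taken to be the unit circle with $y \ge 0$.

```python
import math

def u(x, y):
    return x * y

def gain_holding_y(x, y, dx=1e-3):
    """Change in u when x increases and y is held constant (interior move)."""
    return u(x + dx, y) - u(x, y)

def gain_on_boundary(x, dx=1e-3, radius=1.0):
    """Change in u when x increases along the circle, so y must shrink."""
    y_before = math.sqrt(radius**2 - x**2)
    y_after = math.sqrt(radius**2 - (x + dx)**2)
    return u(x + dx, y_after) - u(x, y_before)

interior_gain = gain_holding_y(0.3, 0.4)  # positive: more x helps when y is fixed
boundary_gain = gain_on_boundary(0.9)     # negative: more x costs too much y

print("interior gain:", interior_gain)
print("boundary gain:", boundary_gain)
```

In the interior the sign of the gain is easy to read off from the single variable, which is what makes $x$ look like a good proxy; on the boundary the same move trades against $y$ and the gain turns negative.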
Fixed the LaTeX for you. You were in WYSIWYG editor mode, where you type LaTeX by pressing CTRL/CMD and 4 at the same time.
Thank you habryka!