This is cruxy, because I don’t think that noise (i.e., non-error-freeness) in your observations alone leads to bribing surveyors unless we add in additional assumptions about what that noise/non-error-freeness is.
(In particular, simple IID noise or quantum noise likely doesn’t lead to extremal Goodhart/bribing surveyors.)
More generally, the reason I maintain a distinction between these two failure modes of Goodharting, regressional and extremal Goodhart, is that they respond differently to decreasing the error.
I suspect that in the limit of zero error, regressional Goodhart (e.g., noisy sensors leading to slight overspending on reducing mosquitoes) vanishes, whereas extremal Goodhart (e.g., bribing surveyors) doesn’t. More importantly, the error of your sensors being bounded means there’s only a bounded error in how much you can regulate X, so the error can’t dominate, while extremal Goodhart like bribing surveyors can make the error dominate.
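To make the "respond differently to decreasing the error" claim concrete, here is a toy simulation sketch. It is my own illustrative model, not anything from the original discussion: the function names, the Gaussian value/noise model, and the budget/cost numbers for "real mosquito reduction" vs. "bribing surveyors" are all hypothetical assumptions. The regressional gap (how much the proxy-selected option's score overstates its true value) shrinks to zero as the noise goes to zero, while the extremal gap stays large even with perfectly noiseless sensors, because the optimizer just buys reported reduction directly.

```python
import random


def regressional_gap(noise_sd, n_options=100, trials=500, seed=0):
    """Hypothetical toy model of regressional Goodhart.

    Each option has a true value; we observe it through additive Gaussian
    noise and pick the option with the best observed (proxy) score.
    Returns the average amount by which the chosen option's proxy score
    overstates its true value.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        true_vals = [rng.gauss(0.0, 1.0) for _ in range(n_options)]
        proxies = [v + rng.gauss(0.0, noise_sd) for v in true_vals]
        best = max(range(n_options), key=lambda i: proxies[i])
        total += proxies[best] - true_vals[best]
    return total / trials


def extremal_gap(noise_sd, budget=10.0, cost_real=2.0, cost_bribe=1.0,
                 steps=11, seed=0):
    """Hypothetical toy model of extremal Goodhart (bribing surveyors).

    The optimizer splits a budget between real mosquito reduction and
    bribes that inflate the reported reduction without changing reality.
    It picks the split with the highest reported (measured) reduction.
    Returns reported minus true reduction for the chosen split.
    """
    rng = random.Random(seed)
    best_reported, best_true = float("-inf"), 0.0
    for k in range(steps):
        frac_bribe = k / (steps - 1)
        true_reduction = (1 - frac_bribe) * budget / cost_real
        fake_reduction = frac_bribe * budget / cost_bribe
        reported = true_reduction + fake_reduction + rng.gauss(0.0, noise_sd)
        if reported > best_reported:
            best_reported, best_true = reported, true_reduction
    return best_reported - best_true


if __name__ == "__main__":
    for sd in (1.0, 0.1, 0.0):
        print(f"noise_sd={sd}: regressional gap={regressional_gap(sd):.3f}, "
              f"extremal gap={extremal_gap(sd):.3f}")
```

In this toy setup the extremal gap at zero noise equals the whole bribed amount (`budget / cost_bribe`), since bribing is assumed to be cheaper per reported unit than real reduction; if bribing were made more expensive than real reduction, that gap would disappear, which is one way of seeing that the extremal failure comes from the available actions rather than from the noise.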
So I basically disagree with this statement:
Goodharting is robust. That is, the mechanism of Goodharting seems impossible to overcome. Goodharting is just a fact of any control system.
(Late comment here).