This is an important point, and a useful approach, but I think it addresses only some forms of Goodhart errors. Specifically, it doesn't address regime change, where the relationship to known important variables changes after optimization, or causal errors, where the actions taken have perverse effects on variables known to be important. (And multiparty Goodhart effects are partly included, but I'm still working on figuring out how to formalize and address them more clearly.)
>where the relationship to known important variables changes after optimization
If you expect that to happen, and include it in the utility, then you’d start getting more conservative optimisation behaviour.
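A toy sketch of that point, under made-up assumptions chosen only for illustration: the proxy is just the action size `x`, the chance that the proxy/true-value relationship has broken grows linearly with how hard `x` is pushed, and a break flips the sign of the true value. The names `proxy`, `regime_change_prob`, `expected_utility` and the `scale` parameter are hypothetical, not taken from either comment.

```python
def proxy(x: float) -> float:
    """Hypothetical proxy the naive optimiser maximises: simply x."""
    return x


def regime_change_prob(x: float, scale: float = 20.0) -> float:
    """Hypothetical chance that the proxy/true-value relationship has
    broken once the action is pushed to x (linear in x, capped at 1)."""
    return min(1.0, x / scale)


def expected_utility(x: float) -> float:
    """Expected true value when the possible regime change is folded
    into the utility (assumption: a break flips the sign)."""
    q = regime_change_prob(x)
    # Without a regime change the true value tracks the proxy;
    # with one, it goes the opposite way.
    return (1 - q) * proxy(x) + q * (-proxy(x))


actions = [float(a) for a in range(11)]  # candidate actions 0..10
naive = max(actions, key=proxy)           # ignores regime change
aware = max(actions, key=expected_utility)  # accounts for it
print(naive, aware, expected_utility(naive))
```

Under these assumptions the naive optimiser pushes to the extreme (`x = 10`), where the expected true value has fallen to zero, while the regime-aware one stops at `x = 5`. That is the "more conservative" behaviour, though the exact numbers depend entirely on the toy model.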