This is a very important point. I will self-promote and mention my pre-print paper on metric design and avoiding Goodharting (not in the context of AI): https://mpra.ub.uni-muenchen.de/90649/1/MPRA_paper_90649.pdf
Abstract: Metrics are useful for measuring systems and motivating behaviors. Unfortunately, naive application of metrics to a system can distort the system in ways that undermine the original goal. The problem was noted independently by Campbell and Goodhart, and in some forms it is not only common, but unavoidable due to the nature of metrics. There are two distinct but interrelated problems that must be overcome in building better metrics: first, specifying metrics more closely related to the true goals, and second, preventing the recipients from gaming the difference between the reward system and the true goal. This paper describes several approaches to designing metrics, beginning with design considerations and processes, then discussing specific strategies including secrecy, randomization, diversification, and post-hoc specification. Finally, it will discuss important desiderata and the trade-offs involved in each approach.
(Currently working on a rewrite, but feedback on the ideas and anything missing is especially appreciated.)
Cool! I don’t have time to look into this now, but I’m excited to see what you produce in this direction. As you know, I’m pretty pessimistic that we can totally solve Goodhart effects, but I do expect we can mitigate them enough that, for anything short of superintelligent levels of optimization, we can do better than we do now.
Agreed on both points.