I worked with Scott to formalize some of his earlier blog post here: https://arxiv.org/abs/1803.04585 - and wrote a bit more about AI-specific concerns relating to the first three forms in this new LessWrong post: https://www.lesserwrong.com/posts/iK2F9QDZvwWinsBYB/non-adversarial-goodhart-and-ai-risks
The blog post discussion was not included in the paper, both because agreement on these points proved difficult and because I wanted the paper to be relevant more broadly than AI risk alone. The paper was intended to expand thinking about Goodhart-like phenomena, to address what I initially saw as a confusion between causal and adversarial Goodhart, and to set up a further paper on adversarial cases that I have been contemplating for a couple of years and am actively working on again. I was hoping to finish that second paper, on Adversarial Goodhart and sufficient metrics, in time for the prize; since I did not, I will nominate the arXiv paper and the blog post, and I will try to get the sequel blog post, and perhaps the paper, done in time for round three, if there is one.
Accepted! Can you give me your email address?
My username @ gmail