Well written—I just wish it had been posted when it was first written!
You may already be aware of the post by Scott Garrabrant and the follow-up paper expanding on it; if not, that work formalizes some of the different aspects of Goodhart’s law you discussed here. The causal case and the correlational case are not the only ones that matter. Our work also differs in that the “not a mango” failure is not really considered a Goodhart’s law issue, though it certainly is an underspecified goal, similar to what I discussed on Ribbonfarm here, which I noted leads to Goodhart’s law issues.