The initial section seems very closely related to the work Scott did, and that we wrote up in the “Three Causal Goodhart Effects” section of this paper: https://arxiv.org/pdf/1803.04585.pdf, including some of the causal graphs.
Regarding mitigations, see my preprint here: https://mpra.ub.uni-muenchen.de/98288/, which, in addition to some of the mitigations you discussed, also suggests secret metrics, randomization, and post-hoc specification as strategies. Clearly, these don’t always apply in AI systems, but they can potentially be useful in at least some cases.
I think causal diagrams naturally emerge when thinking about Goodhart’s law and its implications.
I came up with the concept of the Goodhart’s law causal graphs above because of a presentation someone gave at the EA Hotel in late 2019 on Scott’s Goodhart Taxonomy. I thought causal diagrams were a clearer way to describe some parts of the taxonomy, but their relationship to the taxonomy is complex. I also just encountered the paper you and Scott wrote a couple of weeks ago while getting ready to write this Good Heart Week-prompted post, and I was planning to reference it in the next post, where we address “causal stomping” and “function generalization error” and can more comprehensively describe the relationship with the paper.
In terms of the relationship to the paper, I think the Goodhart’s law causal graphs I describe above are more fundamental and atomically describe the relationship types between the target and proxies in a unified way. I read the causal diagrams in your paper as describing the various ways causal graph relationships may be broken by taking action, rather than simply describing relationships between proxies and targets and the ways they may be confused with each other (which is the function of the Goodhart’s law causal graphs above).
Mostly, the purpose of this post and the next is to present an alternative, and I think cleaner, ontological structure for thinking about Goodhart’s law, though there will still be some messiness in carving up reality.
As for your suggested mitigations, both randomization and secret metrics are good to add, though I’m not as sure about post-hoc specification. Thanks for the suggestions and the surrounding paper.