Vika comments on Classifying specification problems as variants of Goodhart’s Law

Vika 9 Jan 2021 18:25 UTC
LW: 6 AF: 3
Writing this post helped clarify my understanding of the concepts in both taxonomies—the different levels of specification and types of Goodhart effects. The parts of the taxonomies that I was not sure how to match up usually corresponded to the concepts I was most confused about. For example, I initially thought that adversarial Goodhart is an emergent specification problem, but upon further reflection this didn’t seem right. Looking back, I think I still endorse the mapping described in this post.
I hoped to get more comments on this post proposing other ways to match up these concepts, and I think the post would have more impact if there were more discussion of its claims. The low level of engagement with this post was an update for me that the exercise of connecting different maps of safety problems is less valuable than I thought.