Anyway, it doesn’t even seem mathematically obvious to me that optimizing for G* will reduce correlation between G and G*.
See Greg Lewis’s post here: https://www.lesswrong.com/posts/dC7mP5nSwvpL65Qu5/why-the-tails-come-apart and Scott Alexander’s discussion here: http://slatestarcodex.com/2018/09/25/the-tails-coming-apart-as-metaphor-for-life/
Also see our paper formalizing the other Goodhart’s Law failure modes: https://arxiv.org/abs/1803.04585
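For a concrete feel for the "tails come apart" effect in the linked posts, here is a minimal simulation sketch. It assumes G and G* are jointly normal with correlation `rho`, and it models "optimizing for G*" as simply keeping the top fraction of samples ranked by G*. Both are simplifying assumptions made for illustration, and the names `pearson` and `simulate` are hypothetical. It only shows the selection (range-restriction) effect, not the other failure modes formalized in the paper.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def simulate(rho=0.8, n=100_000, top_fraction=0.01, seed=0):
    """Return (correlation over all samples, correlation within the top G* tail).

    Assumption: G ~ N(0, 1) and G* = rho * G + sqrt(1 - rho^2) * noise,
    so that corr(G, G*) = rho in the full population.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        g = rng.gauss(0.0, 1.0)
        noise = rng.gauss(0.0, 1.0)
        g_star = rho * g + math.sqrt(1.0 - rho ** 2) * noise
        pairs.append((g, g_star))

    full_corr = pearson([g for g, _ in pairs], [gs for _, gs in pairs])

    # "Optimize" for the proxy by keeping only the highest-G* samples.
    pairs.sort(key=lambda p: p[1], reverse=True)
    top = pairs[: int(n * top_fraction)]
    tail_corr = pearson([g for g, _ in top], [gs for _, gs in top])

    return full_corr, tail_corr


if __name__ == "__main__":
    full_corr, tail_corr = simulate()
    print(f"correlation over all samples:  {full_corr:.3f}")
    print(f"correlation within top G* 1%:  {tail_corr:.3f}")
```

Under these assumptions the correlation inside the selected tail comes out noticeably lower than in the full population, because selecting on G* shrinks the variance of G* while leaving the noise term's share of the variation intact. Whether real optimization processes behave like this kind of selection is a separate question.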