I think your discussion of Goodhart deception is a bit confusing, since consequentialist deception is itself a type of Goodharting: it’s adversarial Goodhart rather than regressional/causal/extremal Goodhart.