Extremal Goodhart differs from other forms of Goodhart in that pushing the measured quantity to its maximal value always leads to less, or even zero, true reward.
This is difficult to demonstrate, since you need to show that whatever you maximize implies non-maximal true reward. It also differs from causal Goodhart, where the problem is that the causal relationship is mistaken.
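As a rough illustration of the claim, here is a minimal toy sketch. The functions `proxy` and `true_reward` and the range of candidates are hypothetical choices made up for this example, not anything from a real system. The proxy tracks the true reward over the ordinary range, but the point that maximizes the proxy yields zero true reward.

```python
# Toy sketch of extremal Goodhart. All functions and numbers here are
# illustrative assumptions, chosen only to make the effect visible.

def proxy(x: float) -> float:
    """The quantity we measure and optimize (hypothetical)."""
    return x


def true_reward(x: float) -> float:
    """What we actually care about (hypothetical).

    It rises with the proxy up to x = 5, then falls back to 0 at x = 10.
    """
    return x * (10.0 - x)


# Candidate settings from 0.0 to 10.0 in steps of 0.5.
candidates = [i * 0.5 for i in range(21)]

# Within the ordinary range (x <= 5), a higher proxy means a higher true reward.
ordinary = [x for x in candidates if x <= 5.0]
ordinary_rewards = [true_reward(x) for x in ordinary]
tracks_in_ordinary_range = all(
    a < b for a, b in zip(ordinary_rewards, ordinary_rewards[1:])
)

# Maximizing the proxy versus maximizing the true reward.
best_for_proxy = max(candidates, key=proxy)
best_for_true = max(candidates, key=true_reward)

print("proxy tracks true reward for x <= 5:", tracks_in_ordinary_range)
print("proxy-maximizing x:", best_for_proxy, "true reward:", true_reward(best_for_proxy))
print("reward-maximizing x:", best_for_true, "true reward:", true_reward(best_for_true))
```

This only shows one proxy failing at its extreme. The stronger claim, that anything you maximize implies non-maximal true reward, would need an argument covering every possible proxy, which is why it is hard to show.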