Drake Thomas comments on When is Goodhart catastrophic?

Drake Thomas 11 May 2023 0:12 UTC
LW: 2 AF: 2
An example of the sort of strengthening I wouldn’t be surprised to see is something like “If $V$ is not too badly behaved in the following ways, and for all $v \in \mathbb{R}$ we have [some light-tailedness condition] on the conditional distribution $(X | V = v)$, then catastrophic Goodhart doesn’t happen.” This seems relaxed enough that you could actually encounter it in practice.
What links here?
- Thomas Kwa's comment on When is Goodhart catastrophic? by Drake Thomas (11 May 2023 1:22 UTC; 4 points)
- Thomas Kwa 15 Nov 2023 21:12 UTC
  LW: 3 AF: 2
  Suppose that we are selecting for $U = X + V$, where $V$ is true utility and $X$ is error. If our estimator is unbiased ($E[X | V = v] = 0$ for all $v$) and $X$ is light-tailed conditional on any value of $V$, do we have $\lim_{t \to \infty} E[V | X + V \geq t] = \infty$?
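  No; here is a counterexample. Suppose that $V \sim N(0, 1)$, and $X | V \sim N(0, 4)$ when $V \in [-1, 1]$, otherwise $X = 0$. Then I think $\lim_{t \to \infty} E[V | X + V \geq t]$ is finite (it cannot exceed $1$): for large $t$, the event $X + V \geq t$ is overwhelmingly due to a large $X$, which can only occur when $V \in [-1, 1]$.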
  This is worrying because in the case where $V \sim N(0, 1)$ and $X \sim N(0, 4)$ independently, we do get $E[V | X + V \geq t] \to \infty$. Merely making the error *smaller* for large values of $|V|$ causes catastrophe. This suggests that success caused by light-tailed error when $V$ has even lighter tails than $X$ is fragile, and that these successes are “for the wrong reason”: they require that the value be overestimated as much when $V$ is high as when $V$ is low.
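To make the contrast concrete, here is a rough numerical sketch of $E[V | X + V \geq t]$ in the two cases above. It assumes $N(0, 4)$ means variance 4 (standard deviation 2); the function names, the midpoint-rule grid size, and the chosen thresholds are my own illustrative choices, not anything from the thread. It only checks that the conditional mean keeps growing in the independent case and stays bounded in the counterexample; it does not try to pin down the exact limit.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_sf(z):
    """Standard normal upper tail P(Z >= z)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def cond_mean_independent(t, sigma_x=2.0):
    """E[V | X + V >= t] for independent V ~ N(0, 1), X ~ N(0, sigma_x^2).

    U = X + V is N(0, 1 + sigma_x^2) and E[V | U = u] = u / (1 + sigma_x^2),
    so the answer is E[U | U >= t] / (1 + sigma_x^2).
    """
    var_u = 1.0 + sigma_x ** 2
    s = math.sqrt(var_u)
    mean_u_given_tail = s * normal_pdf(t / s) / normal_sf(t / s)
    return mean_u_given_tail / var_u


def cond_mean_counterexample(t, sigma_x=2.0, n=20000):
    """E[V | X + V >= t] when X | V ~ N(0, sigma_x^2) for V in [-1, 1], else X = 0.

    Assumes t > 1. Midpoint rule over v in [-1, 1], plus the exact
    contribution of the region V >= t, where X = 0 and so U = V.
    """
    h = 2.0 / n
    num = 0.0  # accumulates E[V * 1{U >= t}]
    den = 0.0  # accumulates P(U >= t)
    for i in range(n):
        v = -1.0 + (i + 0.5) * h
        w = normal_pdf(v) * normal_sf((t - v) / sigma_x) * h
        num += v * w
        den += w
    num += normal_pdf(t)  # integral of v * pdf(v) over [t, inf)
    den += normal_sf(t)   # P(V >= t)
    return num / den


for t in (5.0, 10.0, 20.0):
    print(t, cond_mean_independent(t), cond_mean_counterexample(t))
```

In the independent case the conditional mean is always above $t/5$, so it diverges. In the counterexample almost all of the selected mass sits in $V \in [-1, 1]$, so selecting harder on $U$ buys no more true utility.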
  What links here?
  - Thomas Kwa’s research journal by Thomas Kwa (23 Nov 2023 5:11 UTC; 79 points)
  - Thomas Kwa's comment on Catastrophic Goodhart in RL with KL penalty by Thomas Kwa (17 May 2024 22:00 UTC; 2 points)