Strongly agree, and Goodhart’s law is at least four things. Though I’d note that anti-inductive behavior / metric gaming is hard to separate from goal misspecification, for exactly the reasons outlined in the post.
But saying there is a goal too complex to be understandable and legible implies that it’s really complex, but coherent. I don’t think that’s the case for individuals, and I’m certain it isn’t true of groups (Arrow’s impossibility theorem, etc.).
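To make the group half of that concrete, here’s a toy Condorcet cycle, the voting paradox behind Arrow-style results (a minimal sketch of my own; the voters and options are made up): each voter has a perfectly coherent ranking, yet the majority preference is cyclic.

```python
# Three individually coherent voters whose majority preference cycles.
voters = [
    ["A", "B", "C"],  # voter 1 ranks A > B > C
    ["B", "C", "A"],  # voter 2 ranks B > C > A
    ["C", "A", "B"],  # voter 3 ranks C > A > B
]

def majority_prefers(x, y):
    """True if a strict majority of voters ranks option x above option y."""
    return sum(v.index(x) < v.index(y) for v in voters) > len(voters) / 2

print(majority_prefers("A", "B"))  # True
print(majority_prefers("B", "C"))  # True
print(majority_prefers("C", "A"))  # True: A > B > C > A, no coherent group goal
```

So even if every individual were optimizing something legible, the group need not be optimizing anything at all.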
I’m not sure it’s possible to distinguish between chaotically complex and incoherent. Once you add in reference class problems (you can’t step in the same river twice; no two decisions are exactly identical), there’s no difference between “inconsistent” and “unknown terms with large exponents on unmeasured variables”.
But in any case, even without coherence/consistency across agents or over time, any given decision can be framed as an optimization of something.
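A minimal sketch of both points (the setup and names are mine, purely illustrative): if utility is allowed to depend on the full decision context, and no context ever repeats, then any finite log of decisions comes out “optimal”, so “inconsistent” and “maximizing an unknown context-dependent goal” make identical predictions about behavior.

```python
def rationalizing_utility(decision_log):
    """decision_log: list of (context_id, chosen_option) pairs.
    Returns u(context_id, option) under which every logged choice
    scored strictly above its alternatives in its own context."""
    chosen_in = dict(decision_log)

    def u(context_id, option):
        return 1.0 if chosen_in.get(context_id) == option else 0.0

    return u

# "Inconsistent" behavior: A over B at t=0, then B over A at t=1.
log = [(0, "A"), (1, "B")]
u = rationalizing_utility(log)
assert u(0, "A") > u(0, "B")  # choosing A was optimal at t=0
assert u(1, "B") > u(1, "A")  # choosing B was optimal at t=1
# Because contexts never repeat (you can't step in the same river twice),
# no choice data can separate "incoherent" from "optimizing this u".
```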
[ I should probably add an epistemic status: not sure this is a useful model, but I do suspect there are areas it maps to the territory well. ]
I’d agree with the epistemic warning ;)
I don’t think the model is useful, since it’s non-predictive. And we have good reasons to think that human brains are actually incoherent, which makes me skeptical that there is anything useful to find by fitting a complex model in search of a coherent account of an incoherent system.
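To sketch why I expect that to fail (illustrative numbers only, plain numpy): a flexible enough “coherent” model will fit an incoherent system, here literal noise, much better in-sample than it predicts out-of-sample.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 20)
y = rng.normal(size=20)  # the "incoherent system": pure noise, no goal

# A complex model fits the observed behavior fairly well...
coeffs = np.polyfit(x, y, deg=9)
train_mse = np.mean((np.polyval(coeffs, x) - y) ** 2)

# ...but is non-predictive on fresh behavior from the same system.
y_new = rng.normal(size=20)
test_mse = np.mean((np.polyval(coeffs, x) - y_new) ** 2)
print(f"train MSE: {train_mse:.3f}, test MSE: {test_mse:.3f}")
```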
I think (1) Dagon is right that from a purely behavioral perspective the distinction gets meaningless at the boundaries when trying to separate highly complex values from incoherence, since any set of actions can be justified via some values; (2) humans are incoherent, in the sense that there are strong candidate partial specifications of our values (most of us like food and sex) and we’re not always the most sensible in how we go about achieving them; and (3) to the extent that humans can be said to have values at all, those values are highly complex.
The thing that makes these three statements consistent is that we use more than just a behavioral lens to judge “human values”.