That does seem better. But I don’t think it fills the shoes of the general notion of optimization people use.
Either you mean negative feedback loops specifically, or you mean to include both negative and positive feedback loops. But both choices seem a little problematic to me.
Negative: this seems to imply that all optimization is a kind of homeostasis. But it seems as if some optimization, at least, can be described as “more is better” (i.e., the more typical utility framing). It’s hard to see how to characterize that as a negative feedback loop.
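To make the contrast concrete, here’s a toy sketch (all the numbers and the linear utility are invented purely for illustration): a thermostat-style negative feedback loop settles at its setpoint and stops, while a “more is better” climber just keeps going, because there is no setpoint for it to settle at.

```python
def thermostat_step(temp, setpoint=20.0):
    """Negative feedback / homeostasis: push back toward a fixed setpoint."""
    error = setpoint - temp
    return temp + 0.5 * error              # correction shrinks as the error shrinks

def hill_climb_step(x, utility=lambda x: x):
    """'More is better': move in whichever direction raises utility."""
    step = 0.1
    return x + step if utility(x + step) > utility(x) else x - step

temp, x = 25.0, 0.0
for _ in range(10):
    temp = thermostat_step(temp)           # converges to 20 and stays there
    x = hill_climb_step(x)                 # keeps climbing; there is no target value
print(round(temp, 2), round(x, 2))         # ~20.0 vs 1.0 (x keeps growing if you continue)
```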
Both: this would include all positive feedback loops as well. But I think not all positive feedback loops have the optimization flavor. For example, there is a positive feedback loop between how reflective a water-rich planet’s surface is and how much snow forms. Snow makes the surface more reflective, which makes it absorb less heat, which makes it colder, which makes more snow form. Idk, maybe it’s fine for this to count as optimization? I’m not sure.
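For what it’s worth, that loop is easy to write down as a toy simulation (all constants are made up for illustration): the state runs away in one direction because the loop reinforces itself, but nothing in it is aiming at a target value.

```python
def step(temp, snow):
    """One tick of a toy ice-albedo loop (constants invented for illustration).

    More snow -> higher albedo -> less sunlight absorbed -> colder -> more snow.
    The loop reinforces itself, but nothing in it is aiming at a target value.
    """
    albedo = 0.3 + 0.4 * snow                       # reflectivity rises with snow cover
    absorbed = (1.0 - albedo) * 240.0               # toy "absorbed sunlight" figure
    temp = temp + 0.01 * (absorbed - 200.0)         # cools when little is absorbed
    snow = min(1.0, max(0.0, snow - 0.005 * temp))  # colder -> more snow cover
    return temp, snow

temp, snow = -1.0, 0.5
for _ in range(20):
    temp, snow = step(temp, snow)
print(round(temp, 1), round(snow, 2))               # temperature keeps dropping as snow grows
```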
Does guess-and-check search even count as a positive feedback loop? It’s definitely optimization in the broad sense, but finding one improvement doesn’t help you find another, because you’re just sampling each next point to check independently. So it doesn’t seem to fit that well into the feedback loop model.
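Concretely, I’m picturing something like this (a minimal sketch; the objective and sampler are arbitrary placeholders): each guess is drawn independently, so the current best plays no role in generating the next candidate.

```python
import random

def guess_and_check(objective, sample, n_iters=1000):
    """Pure guess-and-check: keep the best of n independent random guesses."""
    best_x, best_val = None, float("-inf")
    for _ in range(n_iters):
        x = sample()                   # drawn independently; best_x is never consulted
        val = objective(x)
        if val > best_val:             # the only "feedback" is remembering the best so far
            best_x, best_val = x, val
    return best_x, best_val

# Arbitrary toy objective: maximize -(x - 3)^2 over [-10, 10]
print(guess_and_check(lambda x: -(x - 3) ** 2,
                      lambda: random.uniform(-10, 10)))
```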
Feedback loops seem to require measurable (observable) targets. But broadly speaking, optimization can have non-measurable targets. An agent can steer the world based on a distant goal, such as aiming for a future with a flourishing galactic civilization.
Now, it’s possible that the feedback loop model can recover from this one by pointing to a (positive) feedback loop within the agent’s head, where the agent is continually searching for better plans to implement. However, I believe this is not always going to be the case. The inside of an agent’s head can be a weird place. The “agent” could just be implementing a hardwired policy which indeed steers towards a distant goal, but which doesn’t do it through any simple feedback loop. (Of course there is still, broadly speaking, feedback between the agent and the environment. But the same could be said of a rock.) I think this counts as optimization in the broad sense, because it would count under Eliezer’s definition: the agent makes the world end up surprisingly high in the metric, even though there’s no feedback loop between it and the metric, nor an internal feedback loop involving a representation/expectation of the metric.
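As a cartoon of what I mean (everything here is a made-up illustration, not a claim about how real agents work): a lookup-table policy can reliably land the world in some distant target state without ever representing that state or checking progress toward it.

```python
def hardwired_agent(world_state):
    """A hardwired policy: a fixed state -> action table, no search, no goal model.

    It never measures the goal metric or compares expected outcomes; the table
    just happens to be one whose execution tends to end in the target region.
    """
    policy = {"start": "go_east", "plains": "go_east", "coast": "build"}
    return policy.get(world_state, "wait")

# Toy environment dynamics (also made up); the loop below is just the world
# responding to actions, the same trivial sense in which a rock gets feedback.
transitions = {("start", "go_east"): "plains",
               ("plains", "go_east"): "coast",
               ("coast", "build"): "settlement"}

state = "start"
for _ in range(3):
    state = transitions.get((state, hardwired_agent(state)), state)
print(state)   # "settlement": a distant outcome reached with no loop on the metric itself
```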
I’m pretty happy to count all these things as optimization. Much of the issue I find with using the feedback loop definition is, as you point out, the difficulty of figuring out things like “is there a lot here?”, which suggests there might be a better, more general model for what I’ve been pointing at. I use “feedback loop” because it’s simply the closest, most general model I know. That actually points back to the way I phrased it before, which isn’t formalized but which I think comes closer to expansively capturing all the things that make sense to group together as “optimization”.