I think once one begins to enter this alternative frame where lots of things aren’t optimization, it starts to become apparent that “hardly anything is just optimization”—IE, understanding something as optimization often hardly explains anything about it, and there are often other frames which would explain much more.
I guess it depends on whether you want to keep “optimization” as a referent for the general motion of making the world more likely to be one way rather than another, or for a specific type of making the world more likely to be one way rather than another. I think the former is more of a natural category for the kinds of things most people seem to mean by optimizing.
None of this is to say, though, that the optimization framing is useful everywhere; there are plenty of processes for which it isn’t. For example, you mention logic and Bayesian updating, and that sounds right to me: those are processes operating over the map rather than the territory (even if they are meant to be grounded in the territory). When you only care about the map, it doesn’t make much sense to talk about taking actions to make the world one way rather than another, because within the system of a particular map there is only one consistent way the world can be.
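(To make the map-level point concrete, here is a minimal sketch, purely my own illustration rather than anything from the thread: a Bayesian update is a computation over a belief state, and nothing in it reaches out and changes the world.)

```python
# Illustrative sketch only: a Bayesian update operates on the map (a belief state),
# not on the territory.
def bayes_update(prior: dict, likelihood: dict) -> dict:
    """Posterior over hypotheses, given each hypothesis's likelihood for the observed data."""
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(unnormalized.values())
    return {h: p / total for h, p in unnormalized.items()}

# Two hypotheses about a coin; we observe one "heads".
prior = {"fair": 0.5, "biased": 0.5}
likelihood = {"fair": 0.5, "biased": 0.9}  # P(heads | hypothesis)
posterior = bayes_update(prior, likelihood)
# The belief state has changed; the coin itself has not.
```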
I suspect you’re trying to gesture at a slightly better definition here than the one you give, but since I’m currently in the business of arguing that we should be precise about what we mean by ‘optimization’… what do you mean here?
Just about any element of the world will “make the world more likely to be one way rather than another”.
Yeah, if I want to be precise, I mean anytime there is a feedback loop there is optimization.
That does seem better. But I don’t think it fills the shoes of the general notion of optimization people use.
Either you mean negative feedback loops specifically, or you mean to include both negative and positive feedback loops. But both choices seem a little problematic to me.
Negative: this seems to imply that all optimization is a kind of homeostasis. But it seems as if some optimization, at least, can be described as “more is better” (ie the more typical utility framing). It’s hard to see how to characterize that as a negative feedback loop.
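As a rough sketch of the contrast being drawn here (toy code of my own, not anything proposed in the thread): a thermostat-style negative feedback loop pushes a variable back toward a set point from either direction, whereas a “more is better” optimizer just keeps climbing, with no set point to return to.

```python
# Toy contrast, illustrative only: homeostasis vs. "more is better".

def thermostat_step(temp: float, set_point: float = 20.0, gain: float = 0.1) -> float:
    """Negative feedback: deviation from the set point is pushed back toward zero."""
    return temp - gain * (temp - set_point)

def hill_climb_step(x: float, step: float = 0.1) -> float:
    """'More is better': always move in the direction that increases the objective."""
    return x + step  # for a monotonically increasing objective, just keep going up

temp, x = 30.0, 0.0
for _ in range(100):
    temp = thermostat_step(temp)  # converges to ~20 and then stays there
    x = hill_climb_step(x)        # never settles; there is no target value to restore
```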
Both: this would include all positive feedback loops as well. But I think not all positive feedback loops have the optimization flavor. For example, there is a positive feedback loop between how reflective a water-rich planet’s surface is and how much snow forms. Snow makes the surface more reflective, which makes it absorb less heat, which makes it colder, which makes more snow form. Idk, maybe it’s fine for this to count as optimization? I’m not sure.
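For what it’s worth, here is how that loop might look as a toy simulation (the constants are made up and purely illustrative): the loop amplifies itself, but it’s hard to point at anything it is steering toward.

```python
# Toy ice-albedo feedback, made-up constants, illustrative only.
albedo, temp = 0.3, 10.0
for _ in range(50):
    temp = temp - 5.0 * albedo + 1.0   # more reflective surface -> less absorbed heat -> colder
    snow = max(0.0, -temp) * 0.01      # colder -> more snow (crudely)
    albedo = min(0.9, albedo + snow)   # more snow -> more reflective surface
# The loop runs away (until albedo hits the cap used here), but there is no set point
# or objective it is driving the planet toward.
```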
Does guess-and-check search even count as a positive feedback loop? It’s definitely optimization in the broad sense, but finding one improvement doesn’t help you find another, because you’re just sampling each next point to check independently. So it doesn’t seem to fit that well into the feedback loop model.
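A minimal guess-and-check search, as a toy sketch (my own illustration): each candidate is drawn independently, so a success updates the record of the best point found so far, but never changes where the next guess comes from.

```python
import random

# Toy guess-and-check search, illustrative only: maximize f by independent sampling.
def f(x: float) -> float:
    return -(x - 3.0) ** 2  # objective with a peak at x = 3, unknown to the searcher

best_x, best_val = None, float("-inf")
for _ in range(1000):
    x = random.uniform(-10.0, 10.0)  # each guess is independent of all previous guesses
    if f(x) > best_val:              # the only state kept is the best point so far
        best_x, best_val = x, f(x)
# The best-so-far improves over time, but earlier successes never feed back into
# how later guesses are generated.
```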
Feedback loops seem to require measurable (observable) targets. But broadly speaking, optimization can have non-measurable targets. An agent can steer the world based on a distant goal, such as aiming for a future with a flourishing galactic civilization.
Now, it’s possible that the feedback loop model can recover from this one by pointing to a (positive) feedback loop within the agent’s head, where the agent is continually searching for better plans to implement. However, I believe this is not always going to be the case. The inside of an agent’s head can be a weird place. The “agent” could just be implementing a hardwired policy which indeed steers towards a distant goal, but which doesn’t do it through any simple feedback loop. (Of course there is still, broadly speaking, feedback between the agent and the environment. But the same could be said of a rock.) I think this counts as the broad notion of optimization, because it would count under Eliezer’s definition—the agent makes the world end up surprisingly high in the metric, even though there’s no feedback loop between it and the metric, nor an internal feedback loop involving a representation/expectation for the metric.
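One way to picture the hardwired-policy case (a toy sketch of my own, not anyone’s actual model): an open-loop controller that reliably drives the state to a target without ever observing it, so there is no error signal being fed back anywhere.

```python
# Toy open-loop "agent", illustrative only: a fixed action sequence that reliably
# reaches a target state without ever reading the state, i.e. with no feedback loop.
def environment_step(state: float, action: float) -> float:
    return state + action

hardwired_actions = [2.0, 2.0, 1.0]  # fixed in advance, never adjusted
state = 0.0
for action in hardwired_actions:     # the policy never looks at `state`
    state = environment_step(state, action)
print(state)  # ends up at 5.0, the "target", despite no feedback anywhere in the loop
```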
I’m pretty happy to count all these things as optimization. Much of the issue I find with using the feedback-loop definition is, as you point out, the difficulty of figuring out things like “is there a lot of optimization here?”, which suggests there might be a better, more general model for what I’ve been pointing at; I’ve used “feedback loop” because it’s simply the closest, most general model I know. Which actually points back to the way I phrased it before, which isn’t formalized but I think does come closer to expansively capturing all the things I think make sense to group together as “optimization”.