abramdemski comments on nostalgebraist: Recursive Goodhart’s Law

abramdemski 26 Aug 2020 18:03 UTC
12 points
I really like this.
And I want to advise this person to sort of “chill out” in a way closely analogous to the way the hypothetical AI should “chill out” – a little more formally, I want to advise them to use a strategy that works better under model mis-specification, a strategy that doesn’t try to “jump to the end” and do the best thing without a continuous traversal of the intervening territory.
I like the analogy of “traversal of the intervening territory”, even though, like the author, I don’t know what it formally means.
Unlike the author, I do have some models of what it means to lack a utility function, but optimize anyway. Within such a model, I would say: it’s perfectly fine and possible to come up with a function which approximately represents your preferences, but holding on to such a function even after your preferences have updated away from it leads to a Goodhart-like risk.
More generally, it’s not the case that literally everything is well-understood as optimization of some function. There are lots of broadly intelligent processes that aren’t best-understood as optimization.
- Dagon 26 Aug 2020 21:50 UTC
  4 points
  Parent
  There are lots of broadly intelligent processes that aren’t best-understood as optimization.
  I’d love to hear some examples, and start a catalog of “useful understandings of intelligent processes”. I believe that control/feedback mechanisms, evolution, the decision theories tossed about here (CDT, TDT, UDT, etc.), and VNM-compliant agents generally are all optimizers, though not all with the same complexity or capabilities.
  Humans aren’t VNM-rational agents over time, but I believe each instantaneous decision is an optimization calculation within the brain.
  - abramdemski 27 Aug 2020 16:08 UTC
    20 points
    Parent
    Logic isn’t well-understood as optimization.
    Bayesian updates can sort of be thought of as optimization, but we can also notice the disanalogies. Bayesian updates aren’t really about maximizing something, but rather, about proportional response.
    Radical Probabilism is even less about optimizing something.
    So, taking those three together, it’s worth considering the claim that epistemics isn’t about optimization.
    This calls into question how much of machine learning, which is apparently about optimization, should really be taken as optimization or merely optimization-like.
    The goal is never really to maximize some function, but rather to “generalize well”, a task for which we can only ever give proxies.
    This fact influences the algorithms. Most overtly, regularization techniques are used to soften the optimization. Some of these are penalties added to the loss function, which keeps us squarely in the realm of optimizing some function. However, other techniques, such as dropout, do not take this form. Furthermore, we can think of the overall selection applied to ML techniques as selecting for generalization ability (not for ability to optimize some objective).
    Decision theories such as you name needn’t be formulated as optimizing some function in general, although that is the standard way to formulate them. Granted, they’re still going to argmax a quantity to decide on an action at the end of the day, but is that really enough to call the whole thing “optimization”? There’s a whole lot going on that’s not just optimization.
    There are a lot of ways in which evolution doesn’t fit the “optimization” frame, either. Many of those ways can be thought of merely as evolution being a poor optimizer. But I think a lot of it is also that evolution has the “epistemic” aspect—the remarkable thing about evolution isn’t how well it achieves some measurable objective, but rather its “generalization ability” (like Bayes/ML/etc). So maybe the optimization frame for evolution is somewhat good, but an epistemic/learning frame is also somewhat good.
    I think once one begins to enter this alternative frame where lots of things aren’t optimization, it starts to become apparent that “hardly anything is just optimization”—IE, understanding something as optimization often hardly explains anything about it, and there are often other frames which would explain much more.
    - Gordon Seidoh Worley 27 Aug 2020 21:49 UTC
      2 points
      Parent
      I think once one begins to enter this alternative frame where lots of things aren’t optimization, it starts to become apparent that “hardly anything is just optimization”—IE, understanding something as optimization often hardly explains anything about it, and there are often other frames which would explain much more.
      I guess it depends on whether you want to keep “optimization” as a referent to the general motion that is making the world more likely to be one way than another or a specific type of making the world more likely to be one way rather than another. I think the former is more of a natural category for the types of things most people seem to mean by optimizing.
      None of this is to say, though, that there aren’t many processes where the optimization framing is not very useful. For example, you mention logic and Bayesian updating as examples, and that sounds right to me, because those are processes operating over the map rather than the territory (even if they are meant to be grounded in the territory), and when you only care about the map it doesn’t make much sense to talk about taking actions to make the world one way rather than another, because there is only one consistent way the world can be within the system of a particular map.
      - abramdemski 28 Aug 2020 15:36 UTC
        2 points
        Parent
        I guess it depends on whether you want to keep “optimization” as a referent to the general motion that is making the world more likely to be one way than another or a specific type of making the world more likely to be one way rather than another.
        I suspect you’re trying to gesture at a slightly better definition here than the one you give, but since I’m currently in the business of arguing that we should be precise about what we mean by ‘optimization’… what do you mean here?
        Just about any element of the world will “make the world more likely to be one way rather than another”.
        Gordon Seidoh Worley 29 Aug 2020 1:22 UTC
        2 points
        Parent
        Yeah, if I want to be precise, I mean anytime there is a feedback loop there is optimization.
        abramdemski 30 Aug 2020 8:29 UTC
        4 points
        Parent
        That does seem better. But I don’t think it fills the shoes of the general notion of optimization people use.
        
        Either you mean negative feedback loops specifically, or you mean to include both negative and positive feedback loops. But both choices seem a little problematic to me.
        Negative: this seems to imply that all optimization is a kind of homeostasis. But it seems as if some optimization, at least, can be described as “more is better” (ie the more typical utility framing). It’s hard to see how to characterize that as a negative feedback loop.
        Both: this would include all positive feedback loops as well. But I think not all positive feedback loops have the optimization flavor. For example, there is a positive feedback loop between how reflective a water-rich planet’s surface is and how much snow forms. Snow makes the surface more reflective, which makes it absorb less heat, which makes it colder, which makes more snow form. Idk, maybe it’s fine for this to count as optimization? I’m not sure.
        Does guess-and-check search even count as a positive feedback loop? It’s definitely optimization in the broad sense, but finding one improvement doesn’t help you find another, because you’re just sampling each next point to check independently. So it doesn’t seem to fit that well into the feedback loop model.
        
        Feedback loops seem to require measurable (observable) targets. But broadly speaking, optimization can have non-measurable targets. An agent can steer the world based on a distant goal, such as aiming for a future with a flourishing galactic civilization.
        Now, it’s possible that the feedback loop model can recover from this one by pointing to a (positive) feedback loop within the agent’s head, where the agent is continually searching for better plans to implement. However, I believe this is not always going to be the case. The inside of an agent’s head can be a weird place. The “agent” could just be implementing a hardwired policy which indeed steers towards a distant goal, but which doesn’t do it through any simple feedback loop. (Of course there is still, broadly speaking, feedback between the agent and the environment. But the same could be said of a rock.) I think this counts as the broad notion of optimization, because it would count under Eliezer’s definition—the agent makes the world end up surprisingly high in the metric, even though there’s no feedback loop between it and the metric, nor an internal feedback loop involving a representation/expectation for the metric.
        
        Gordon Seidoh Worley 30 Aug 2020 21:33 UTC
        2 points
        Parent
        I’m pretty happy to count all these things as optimization. Much of the issue I find with using the feedback loop definition is, as you point to, is the difficulty of figuring out things like “is there a lot here?”, suggesting there might be a better, more general model for what I’ve been pointing to work feedback loop because it’s simply the closest, most general model I know. Which actually points back to the way I phrased it before, which isn’t formalized but I think does come closer to expansively capturing all the things I think make sense to group together as “optimization”.
    - Dagon 27 Aug 2020 17:32 UTC
      2 points
      Parent
      Awesome—I think I agree with most of this. Specifically, https://www.lesswrong.com/posts/A8iGaZ3uHNNGgJeaD/an-orthodox-case-against-utility-functions is very compatible with the possibility that the function is too complex for any agent to actually compute. It’s quite likely that there are more potential worlds than an agent can rank, with more features than an agent can measure.
      Any feasible comparison of potential worlds is actually a comparison of predicted summaries of those worlds. Both the prediction and the summary are lossy, thus that asepect of Goodhart’s law.
      I did not mean to say that “everything is an optimization process”. I did mean to say that decisions are an optimization process, and I now realize even that’s too strong. I suspect all I can actually assert is that “intentionality is an optimization process”.
      - abramdemski 27 Aug 2020 18:20 UTC
        4 points
        Parent
        I did not mean to say that “everything is an optimization process”. I did mean to say that decisions are an optimization process, and I now realize even that’s too strong. I suspect all I can actually assert is that “intentionality is an optimization process”.
        Oh, I didn’t mean to accuse you of that. It’s more that this is a common implicit frame of reference (including/especially on LW).
        I rather suspect the correct direction is to break down “optimization” into more careful concepts (starting, but not finishing, with something like selection vs control).
    - Dagon 27 Aug 2020 17:12 UTC
      2 points
      Parent
      epistemics isn’t about optimization.
      On this, we’re fully agreed. Epistemics may be pre-optimization, or may not.