Mild optimization: the easiest way to solve hard tasks may be to specify a proxy, which an AI maximizes. The AI steers into configurations which maximize the proxy function. Simple proxies don’t usually have target sets which we like, because human value is complex. However, maybe we just want the AI to randomly select a configuration which satisfies the proxy, instead of finding the proxy-maximizing configuration, which may be bad due to extremal Goodhart.
Quantilization tries to solve this by randomly selecting a target configuration from some top quantile, but this is sensitive to how world states are individuated.
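A minimal toy sketch of the quantilizer selection rule (the `quantilize` helper, the state names, the proxy values, and the `q` parameter below are all invented for illustration, not from the post). It also shows the individuation sensitivity: re-carving the same outcomes into more or fewer states changes what counts as the top quantile.

```python
import random

def quantilize(candidates, proxy, q=0.1, rng=random):
    """Sample uniformly from the top q-fraction of candidates ranked by the
    proxy, instead of returning the single proxy-maximizing candidate."""
    ranked = sorted(candidates, key=proxy, reverse=True)
    cutoff = max(1, int(len(ranked) * q))
    return rng.choice(ranked[:cutoff])

# Toy proxy over toy world states; all names and values are invented.
proxy = {"extreme_goodhart_world": 100, "fine_world_a": 10,
         "fine_world_b": 10, "fine_world_c": 10, "bad_world": 0}.get

# Individuation sensitivity: splitting the "fine" outcome into three states
# versus lumping it into one changes which states land in the top quantile,
# even though the underlying situations are the same.
split  = ["extreme_goodhart_world", "fine_world_a", "fine_world_b",
          "fine_world_c", "bad_world"]
lumped = ["extreme_goodhart_world", "fine_world_a", "bad_world"]

print(quantilize(split, proxy, q=0.4))   # top 2 of 5: often avoids the Goodhart state
print(quantilize(lumped, proxy, q=0.4))  # top 1 of 3: always the Goodhart state
```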
This makes sense, but I think you’d need a different notion of optimizing systems than the one used in this post. (In particular, instead of a target configuration set, you want a continuous notion of goodness, like a utility function / reward function.)
I’m saying the target set for non-mild optimization is the set of configurations which maximize proxy-ness. Just take the argmax. By contrast, we might want to sample uniformly at random from the set of satisficing configurations, which is much larger.
(This is assuming a fixed initial state)
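To spell that out (notation mine, not from the post): writing $u$ for the proxy and $\mathcal{C}$ for the configurations reachable from the fixed initial state, the non-mild target set is the argmax set, whereas a mild optimizer would sample uniformly from a satisficing set for some threshold $\theta$:

$$T_{\max} = \operatorname*{arg\,max}_{c \in \mathcal{C}} u(c), \qquad T_{\theta} = \{\, c \in \mathcal{C} : u(c) \ge \theta \,\}, \qquad c \sim \mathrm{Uniform}(T_{\theta}).$$

For any $\theta \le \max_{c} u(c)$ we have $T_{\max} \subseteq T_{\theta}$, and $T_{\theta}$ is typically much larger.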
It sounds like you’re assuming that the target configuration set is built into the AI system. According to me, a major point of this post / framework is to avoid that assumption altogether, and only describe problems in terms of the actual observed system behavior.
(This is why within this framework I couldn’t formalize outer alignment, and why wireheading and the search / mesa-objective split is unnatural.)
I see the tension you’re pointing at. I think I had in mind something like “an AI is reliably optimizing utility function u over the configuration space (but not necessarily over universe-histories!) if it reliably moves into high-rated configurations”, and you could draw different epsilon-neighborhoods of optimality in configuration space. It seems like you should be able to talk about dog-maximizers without requiring that the agent robustly end up in the maximum-dog configurations (as opposed to, say, the max-minus-one-dog configurations).
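One way to make the epsilon-neighborhood idea precise (again, notation mine): define the set of $\epsilon$-optimal configurations

$$S_{\epsilon}(u) = \{\, c \in \mathcal{C} : u(c) \ge \textstyle\sup_{c'} u(c') - \epsilon \,\},$$

and call the system a $u$-optimizer at tolerance $\epsilon$ if it reliably evolves into $S_{\epsilon}(u)$ from a broad basin of perturbed starting configurations. A dog-maximizer then only needs to reliably reach $S_{\epsilon}$ for some modest $\epsilon$ (which can include the max-minus-one-dog configurations), rather than the exact argmax set $S_{0}$.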
I’m still confused about parts of this.