Vladimir_Nesov comments on All AGI Safety questions welcome (especially basic ones) [April 2023]

Vladimir_Nesov 8 Apr 2023 14:25 UTC
4 points
One problem with utility maximizers, apart from the use of obviously wrong utility functions, is that even approximately correct utility functions lead to ruin: the maximizer goodharts whatever the utility function actually measures, moving the environment outside the scope of situations where the utility proxy remains approximately correct.

To counter this, the system needs to be aware of the scope of situations that its utility proxy adequately describes. One proposal for doing this is quantilization, where the scope of robustness is tracked by the quantilizer's base distribution.
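
As a rough illustration of quantilization, here is a minimal sample-based sketch. It is not taken from the linked material: the names `quantilize`, `base_sampler` and `utility`, the Gaussian base distribution, and the value `q=0.1` are all assumptions made for the example. The idea shown is that the agent draws candidate actions from a base distribution of "ordinary" behavior, ranks them by the utility proxy, and picks at random from the top `q` fraction instead of taking the argmax.

```python
import random


def quantilize(base_sampler, utility, q, n_samples=1000, rng=None):
    """Approximate a q-quantilizer by sampling (illustrative sketch).

    base_sampler(rng) draws one action from the base distribution.
    utility(action) is the (possibly imperfect) utility proxy.
    q is the fraction of top-ranked samples to choose among (0 < q <= 1).
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    if rng is None:
        rng = random.Random()

    # Candidate actions come only from the base distribution, so the
    # result cannot drift far outside the behavior it describes.
    samples = [base_sampler(rng) for _ in range(n_samples)]

    # Rank candidates by the proxy, best first.
    samples.sort(key=utility, reverse=True)

    # Keep the top q fraction (at least one sample) and pick uniformly.
    k = max(1, int(q * n_samples))
    return rng.choice(samples[:k])


if __name__ == "__main__":
    # Hypothetical setup: actions are real numbers, typical behavior is
    # N(0, 1), and the proxy naively says "bigger is better".
    rng = random.Random(0)
    action = quantilize(
        base_sampler=lambda r: r.gauss(0.0, 1.0),
        utility=lambda x: x,
        q=0.1,
        rng=rng,
    )
    print(action)
```

A pure maximizer of this proxy would push the action as high as it can, far outside anything the base distribution makes likely. The quantilizer only ever returns one of the sampled actions, so smaller `q` means more optimization pressure and `q = 1` reduces to sampling from the base distribution.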
- rotatingpaguro 8 Apr 2023 18:22 UTC
  3 points
  I agree with what you write, but it does not answer the question. From the links you provide I arrived at “Quantilizers maximize expected utility subject to a conservative cost constraint”, which says that a quantilizer (a more accurate formalization of a satisficer as I defined it) maximizes utility subject to a constraint: a pessimization over all possible cost functions of the step from the action generation mechanism to the action selection. This is relevant, but it does not translate the satisficer into a maximizer unless that constraint can be expressed in the utility function (maybe it’s possible, but I don’t see how to do it).
  - Vladimir_Nesov 8 Apr 2023 19:20 UTC
    2 points
    Sure, it’s more of a reframing of the question in a direction where I’m aware of an interesting answer. Specifically, since you mentioned alignment problems, satisficers sound like something that should fight goodharting, and that might require awareness of the scope of robustness, not just optimizing less forcefully.
    
    Looking at the question more closely, one problem is that the way you are talking about a satisficer, it might have a different type signature from EU maximizers. (Unlike expected utility maximizers, “satisficers” don’t have a standard definition.) An EU maximizer can compare events (parts of the sample space) and choose the one with higher expected utility, which is equivalent to having a coherent preference between such events. So an EU agent is not just taking actions in individual possible worlds, the points of the sample space on which the utility function is evaluated. Instead it’s taking actions in possible “decision situations” (which are not the same thing as possible worlds or events). Each decision situation offers a choice between multiple events in the sample space, each event represents uncertainty about possible worlds, and there is no opportunity to choose outcomes that are not on offer in that particular “decision situation”.
    
    But a satisficer, under a minimal definition, just picks a point of the space instead of comparing given events (subspaces). For example, if it is given a choice among events that all have very high expected utility (higher than the satisficer’s threshold), what is the satisficer going to do? Perhaps it should choose the option with the lowest expected utility, but that’s unclear (and likely doesn’t result in utility maximization for any utility function, or in anything reasonable from the alignment point of view). So the problem seems underspecified.