Is there a trick to write a utility satisficer as a utility maximizer?
By “utility maximizer” I mean the ideal bayesian agent from decision theory that outputs those actions which maximize some expected utility E[U(x)] over states of the world x.
By “utility satisficer” I mean an agent that searches for actions that make E[U(x)] greater than some threshold short of the ideally attainable maximum, and contents itself with the first such action found. For reference, let’s fix 0<U<1 and set the satisficer’s threshold to 1/2.
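For concreteness, here is a minimal sketch of the two decision rules as I mean them (toy code; the names `expected_utility`, `maximizer`, `satisficer` and the Monte Carlo setup are my own illustrative choices, not anything standard):

```python
import numpy as np

rng = np.random.default_rng(0)

# Each "action" is modelled as a sampler over states of the world x, and
# `utility` is assumed to be a vectorized function with 0 < U(x) < 1, as above.
def expected_utility(action, utility, n_samples=10_000):
    """Monte Carlo estimate of E[U(x)] under the distribution over worlds induced by `action`."""
    return utility(action(rng, n_samples)).mean()

def maximizer(actions, utility):
    """Ideal EU maximizer: output the action with the highest E[U(x)]."""
    return max(actions, key=lambda a: expected_utility(a, utility))

def satisficer(actions, utility, threshold=0.5):
    """Satisficer: content itself with the first action whose E[U(x)] clears the threshold."""
    for a in actions:
        if expected_utility(a, utility) >= threshold:
            return a
    return None  # no action reaches the threshold
```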
The satisficer is not something that maximizes E[min(U(x),1/2)]. That would again be a utility maximizer, just with utility min(U(x),1/2), and it would run into the usual alignment problems. The satisficer instead reasons on min(E[U(x)],1/2): the min sits outside the expectation rather than inside it. However, I’m curious whether there is still a way to start from a satisficer with utility Us(x) and threshold t and define a maximizer with utility Um(x) that is functionally equivalent to the satisficer.
As said, it seems clear to me that Um(x)=min(Us(x),t) won’t work.
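A toy calculation (numbers chosen by me only to illustrate the gap) shows how the two criteria come apart: a risky action can clear the satisficer’s threshold on E[U] while scoring badly on E[min(U,t)].

```python
t = 0.5

# Action A: U = 0.5 with certainty.
E_U_A    = 0.5
E_minU_A = min(0.5, t)                              # 0.5

# Action B: U = 0.9 with probability 0.7, U = 0.1 with probability 0.3.
E_U_B    = 0.7 * 0.9 + 0.3 * 0.1                    # 0.66
E_minU_B = 0.7 * min(0.9, t) + 0.3 * min(0.1, t)    # 0.38

# The satisficer on E[U] accepts whichever of A, B it finds first (both >= 0.5).
# The maximizer of E[min(U, t)] strictly prefers A (0.5 > 0.38): risk below the
# threshold still counts against B. By Jensen's inequality E[min(U,t)] <= min(E[U],t),
# so the min inside the expectation genuinely differs from the min outside it.
print(E_U_A, E_U_B, E_minU_A, E_minU_B)
```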
Of course it is possible to write Um(x) = 1 if x is a world where the actions of the agent match those of the given satisficer, and Um(x) = 0 otherwise, but it is not interesting. It is not a compact utility to encode.
Is there some useful equivalence of intermediate complexity and generality? If there were, I expect it would make me think alignment is more difficult.
One problem with utility maximizers, apart from the use of obviously wrong utility functions, is that even approximately correct utility functions lead to ruin by goodharting what they actually measure: optimization pushes the environment outside the scope of situations where the utility proxy remains approximately correct.
To oppose this, we need the system to be aware of the scope of situations its utility proxy adequately describes. One proposal for doing this is quantilization, where the scope of robustness is tracked by its base distribution.
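For reference, a minimal sketch of a q-quantilizer in the usual sense, with `base_sampler` and `utility_proxy` as placeholders for whatever base distribution and proxy apply in a given setting:

```python
import numpy as np

def quantilize(base_sampler, utility_proxy, q=0.1, n=10_000, seed=0):
    """q-quantilizer: the base distribution encodes the scope where the proxy is trusted,
    and the agent only optimizes within its top q fraction instead of over all actions."""
    rng = np.random.default_rng(seed)
    candidates = [base_sampler(rng) for _ in range(n)]
    ranked = sorted(candidates, key=utility_proxy, reverse=True)
    top = ranked[: max(1, int(q * n))]
    return top[rng.integers(len(top))]      # uniform choice among the retained fraction
```

What makes this more than just optimizing less hard is the guarantee that, for any cost function, the quantilizer’s expected cost is at most 1/q times that cost’s expectation under the base distribution, which is the result the reply below refers to.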
I agree with what you write, but it does not answer the question. From the links you provide I arrived at “Quantilizers maximize expected utility subject to a conservative cost constraint”, which says that a quantilizer (a more accurate formalization of a satisficer as I defined it) maximizes expected utility subject to a constraint that pessimizes over all the cost functions that could be attached to going from the action-generation mechanism (the base distribution) to the selected action. This is relevant, but it does not translate the satisficer into a maximizer, unless that constraint can be expressed inside the utility function (maybe it can, but I don’t see how).
Sure, it’s more of a reframing of the question in a direction where I’m aware of an interesting answer. Specifically, since you mentioned alignment problems, satisficers sound like something that should fight goodharting, and that might require awareness of the scope of robustness, not just optimizing less forcefully.
Looking at the question more closely, one problem is that the satisficer, the way you are talking about it, might have a different type signature from an EU maximizer. (Unlike expected utility maximizers, “satisficers” don’t have a standard definition.) An EU maximizer can compare events (parts of the sample space) and choose one with higher expected utility, which is equivalent to a coherent preference between such events. So an EU agent is not just taking actions in individual possible worlds, the points of the sample space that the utility function evaluates. Instead it’s taking actions in possible “decision situations” (which are not the same thing as possible worlds or events) that offer a choice between multiple events in the sample space, each event representing uncertainty about possible worlds, with no opportunity to choose outcomes that are not on offer in that particular decision situation.
But a satisficer, under a minimal definition, just picks a point of the space instead of comparing given events (subspaces). For example, if given a choice among events that all have very high expected utility (higher than the satisficer’s threshold), what is the satisficer going to do? Perhaps it should choose the option with the least expected utility, but that’s unclear (and likely doesn’t result in utility maximization for any utility function, or in anything reasonable from the alignment point of view). So the problem seems underspecified.
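A toy illustration of that underspecification (my own code, merely restating the point): when every offered event clears the threshold, a “first acceptable option” rule makes the choice depend on presentation order rather than on the utilities, which is hard to read as maximization of any fixed utility over the offered events.

```python
def satisficer_choice(options, threshold=0.5):
    """Minimal satisficer over a decision situation: `options` is a list of
    (name, expected_utility) pairs; return the first one that clears the threshold."""
    for name, expected_utility in options:
        if expected_utility >= threshold:
            return name
    return None

offer = [("A", 0.9), ("B", 0.8), ("C", 0.7)]         # all well above the 0.5 threshold

print(satisficer_choice(offer))                      # "A"
print(satisficer_choice(list(reversed(offer))))      # "C"
# Same events, same expected utilities, different choice: the behaviour reveals no
# coherent preference between the offered events.
```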