My guess: there’s a conflict between the mathematically desirable properties of an expected utility maximizer on the one hand, and the very undesirable behaviors of the AI safety culture’s most salient examples of expected utility maximizers on the other (e.g., a paperclip maximizer, a happiness maximizer, etc.).
People associate the badness of these sorts of “simple utility function” EU maximizers with the mathematical EU maximization framework itself. I think that “EU maximization for humans” looks like an optimal joint policy that reflects a negotiated equilibrium across our entire distribution over diverse values, not some sort of collapse into maximizing a narrow conception of what humans “really” want.
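A toy sketch of the contrast (the Nash-bargaining-style product is my own choice of formalism here, not something specified above): collapsing to a single value picks an extreme option, while a negotiated aggregate across several values favors the option that no value vetoes.

```python
# Hypothetical options scored against several distinct values.
options = {
    "only_paperclips": {"industry": 1.0, "friendship": 0.0, "leisure": 0.0},
    "only_parties":    {"industry": 0.0, "friendship": 0.9, "leisure": 0.9},
    "balanced_life":   {"industry": 0.6, "friendship": 0.7, "leisure": 0.6},
}

def nash_product(scores, disagreement=0.0):
    # Product of gains over a disagreement point: the aggregate is zero
    # if any single value is left at (or below) its disagreement level.
    result = 1.0
    for s in scores.values():
        result *= max(s - disagreement, 0.0)
    return result

# Collapsing to one value picks an extreme option...
print(max(options, key=lambda o: options[o]["industry"]))    # only_paperclips
# ...while the negotiated aggregate picks the option acceptable to all values.
print(max(options, key=lambda o: nash_product(options[o])))  # balanced_life
```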
I think of “wrapper mind bad” as referring to the intuitive notion of a simple EU maximizer / paperclipper, which is very bad. Arguing that “EU maximization good” is, I think, true, but it doesn’t quite get at the intuition behind “wrapper mind bad”.
Given how every natural goal-seeking agent seems to be built on layers and layers of complex interactions, I have to wonder if “utility” and “goals” are the wrong paradigms to use. Not that I have any better ones ready, mind.
The point is not that EU maximizers are always bad in principle, but that a utility function that won’t be bad is not something we can hand to an AGI acting as an EU maximizer. Such a utility won’t be merely more complicated than the simple utilities in the obviously bad examples; it must be seriously computationally intractable, given only by very indirect pointers to value. And optimizing according to an intractable definition of utility is no longer EU maximization in practice, where compute matters, so the framing stops being useful in that case.
It’s only useful for misaligned optimizers or in unbounded-compute theory that doesn’t straightforwardly translate to practice.
If you need to represent some computationally intractable object, there are many tricks available to approximate such an object in a computationally efficient manner. E.g., one can split the intractable object into modular factors, then use only those factors which are most relevant to the current situation. My guess is that this is exactly what values are: modular, tractable factors that let us efficiently approximate a computationally intractable utility function.
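A minimal sketch of the idea, with all names and numbers hypothetical: an intractable utility over rich world-states is approximated by a sum of modular value factors, and only the factors most relevant to the current situation are evaluated under a fixed budget.

```python
# Hypothetical modular value factors over a simplified world-state.
def friendship_factor(state):
    return state.get("friends_wellbeing", 0.0)

def curiosity_factor(state):
    return state.get("novelty", 0.0)

def comfort_factor(state):
    return state.get("physical_comfort", 0.0)

VALUE_FACTORS = {
    "friendship": friendship_factor,
    "curiosity": curiosity_factor,
    "comfort": comfort_factor,
}

def relevance(name, state):
    # Crude stand-in for judging which factors the situation makes salient.
    return 1.0 if name in state.get("salient", set()) else 0.1

def approx_utility(state, budget=2):
    # Spend the evaluation budget on the most relevant factors only;
    # the "true" utility over full world-histories is never computed.
    ranked = sorted(VALUE_FACTORS, key=lambda n: relevance(n, state), reverse=True)
    return sum(VALUE_FACTORS[name](state) for name in ranked[:budget])

state = {"friends_wellbeing": 0.8, "novelty": 0.3,
         "physical_comfort": 0.5, "salient": {"friendship", "comfort"}}
print(approx_utility(state))
```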
If you actually optimize according to an approximation, that’s going to Goodhart-curse the outcome. Any approximation must only be soft-optimized for, not EU maximized. A design that seeks EU maximization and merely hopes the optimization stays soft enough not to go too far doesn’t pass the omnipotence test.
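A toy model of why (all numbers made up, and the quantilizer-style rule is just one example of soft optimization, not a claim about any particular proposal): when the proxy’s error is heavy-tailed, hard-argmaxing the proxy mostly selects error, while picking randomly from the top few percent keeps much of the real value.

```python
import random

random.seed(0)

def heavy_tailed_error():
    # Rare large errors model the proxy badly mis-scoring a few options.
    return random.gauss(0, 10.0) if random.random() < 0.01 else random.gauss(0, 0.1)

def one_trial(n_options=10_000, top_fraction=0.01):
    true_values = [random.gauss(0, 1) for _ in range(n_options)]
    proxy = [v + heavy_tailed_error() for v in true_values]
    ranked = sorted(range(n_options), key=lambda i: proxy[i], reverse=True)
    hard_choice = ranked[0]                                    # argmax the proxy
    soft_choice = random.choice(ranked[: int(n_options * top_fraction)])
    return true_values[hard_choice], true_values[soft_choice]

trials = [one_trial() for _ in range(200)]
print("mean true value, hard argmax:", sum(t[0] for t in trials) / len(trials))
print("mean true value, soft choice:", sum(t[1] for t in trials) / len(trials))
```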
Also, an approximation worth even soft-optimizing for should be found in a value-laden way, so that what gets lost is inessential detail rather than something highly value-relevant. Approximate knowledge of values helps with finding better approximations to values.