My guess: there’s a conflict between the mathematically desirable properties of an expected utility maximizer on the one hand, and the very undesirable behaviors of the AI safety culture’s most salient examples of expected utility maximizers on the other (e.g., a paperclip maximizer, a happiness maximizer, etc.).
People associate the badness of these sorts of “simple utility function” EU maximizers with the mathematical EU maximization framework itself. I think that “EU maximization for humans” looks like an optimal joint policy that reflects a negotiated equilibrium across our entire distribution over diverse values, not some sort of collapse into maximizing a narrow conception of what humans “really” want.
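A toy sketch of the contrast (the Nash-bargaining-style product is my own choice of formalism here, not something specified above): collapsing to a single value picks an extreme option, while a negotiated aggregate across several values favors the option that no value vetoes.

```python
# Hypothetical options scored against several distinct values.
options = {
    "only_paperclips": {"industry": 1.0, "friendship": 0.0, "leisure": 0.0},
    "only_parties":    {"industry": 0.0, "friendship": 0.9, "leisure": 0.9},
    "balanced_life":   {"industry": 0.6, "friendship": 0.7, "leisure": 0.6},
}

def nash_product(scores, disagreement=0.0):
    # Product of gains over a disagreement point: the aggregate is zero
    # if any single value is left at (or below) its disagreement level.
    result = 1.0
    for s in scores.values():
        result *= max(s - disagreement, 0.0)
    return result

# Collapsing to one value picks an extreme option...
print(max(options, key=lambda o: options[o]["industry"]))    # only_paperclips
# ...while the negotiated aggregate picks the option acceptable to all values.
print(max(options, key=lambda o: nash_product(options[o])))  # balanced_life
```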
I think of “wrapper mind bad” as referring to the intuitive notion of a simple EU maximizer / paperclipper, which is very bad. Arguing that “EU maximization good” is, I think, true, but it doesn’t quite get at the intuition behind “wrapper mind bad”.
Given how every natural goal-seeking agent seems to be built on layers and layers of complex interactions, I have to wonder if “utility” and “goals” are the wrong paradigms to use. Not that I have any better ones ready, mind.
The point is not that EU maximizers are always bad in principle, but that a utility function that won’t be bad is not something we can hand to an AGI acting as an EU maximizer. Such a utility won’t be merely more complicated than the simple utilities in the obviously bad examples; it must be seriously computationally intractable, given only by very indirect pointers to value. And optimizing according to an intractable definition of utility is no longer EU maximization in practice, where compute matters, so the framing stops being useful in that case.
It’s only useful for misaligned optimizers or in unbounded-compute theory that doesn’t straightforwardly translate to practice.
If you need to represent some computationally intractable object, there are many tricks available to approximate such an object in a computationally efficient manner. E.g., one can split the intractable object into modular factors, then use only those factors which are most relevant to the current situation. My guess is that this is exactly what values are: modular, tractable factors that let us efficiently approximate a computationally intractable utility function.
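A minimal sketch of the idea, with all names and numbers hypothetical: an intractable utility over rich world-states is approximated by a sum of modular value factors, and only the factors most relevant to the current situation are evaluated under a fixed budget.

```python
# Hypothetical modular value factors over a simplified world-state.
def friendship_factor(state):
    return state.get("friends_wellbeing", 0.0)

def curiosity_factor(state):
    return state.get("novelty", 0.0)

def comfort_factor(state):
    return state.get("physical_comfort", 0.0)

VALUE_FACTORS = {
    "friendship": friendship_factor,
    "curiosity": curiosity_factor,
    "comfort": comfort_factor,
}

def relevance(name, state):
    # Crude stand-in for judging which factors the situation makes salient.
    return 1.0 if name in state.get("salient", set()) else 0.1

def approx_utility(state, budget=2):
    # Spend the evaluation budget on the most relevant factors only;
    # the "true" utility over full world-histories is never computed.
    ranked = sorted(VALUE_FACTORS, key=lambda n: relevance(n, state), reverse=True)
    return sum(VALUE_FACTORS[name](state) for name in ranked[:budget])

state = {"friends_wellbeing": 0.8, "novelty": 0.3,
         "physical_comfort": 0.5, "salient": {"friendship", "comfort"}}
print(approx_utility(state))
```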
If you actually optimize according to an approximation, that’s going to Goodhart-curse the outcome. Any approximation must only be soft-optimized for, not EU maximized. A design that seeks EU maximization and merely hopes the optimization stays soft enough not to go too far doesn’t pass the omnipotence test.
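A toy model of why (all numbers made up, and the quantilizer-style rule is just one example of soft optimization, not a claim about any particular proposal): when the proxy’s error is heavy-tailed, hard-argmaxing the proxy mostly selects error, while picking randomly from the top few percent keeps much of the real value.

```python
import random

random.seed(0)

def heavy_tailed_error():
    # Rare large errors model the proxy badly mis-scoring a few options.
    return random.gauss(0, 10.0) if random.random() < 0.01 else random.gauss(0, 0.1)

def one_trial(n_options=10_000, top_fraction=0.01):
    true_values = [random.gauss(0, 1) for _ in range(n_options)]
    proxy = [v + heavy_tailed_error() for v in true_values]
    ranked = sorted(range(n_options), key=lambda i: proxy[i], reverse=True)
    hard_choice = ranked[0]                                    # argmax the proxy
    soft_choice = random.choice(ranked[: int(n_options * top_fraction)])
    return true_values[hard_choice], true_values[soft_choice]

trials = [one_trial() for _ in range(200)]
print("mean true value, hard argmax:", sum(t[0] for t in trials) / len(trials))
print("mean true value, soft choice:", sum(t[1] for t in trials) / len(trials))
```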
Also, an approximation worth even soft-optimizing for should be found in a value-laden way, so that what gets lost is inessential detail rather than something highly value-relevant. Approximate knowledge of values helps with finding better approximations to values.