That seems to be a bit of a motte-and-bailey. Goal-directed behavior does not require optimizing; satisficing works fine. Having a utility function means not stopping until it’s maximized, as I understand it.
Eh. I genuinely don’t expect to build an AI that acts like a utility maximizer in all contexts. All real-world agents are limited—they can get hit by radiation, or dumped into supernovae, or fed adversarial noise, etc. All we’re ever going to see in the real world are things that can be intentional-stanced in broad but limited domains.
Satisficers have goal-directed behavior sometimes, but not in all contexts—the more satisfied they are, the less goal-directed they are. If I built a satisficer who would be satisfied with merely controlling the Milky Way (rather than the entire universe), that’s plenty dangerous. And consequently, it’s going to be acting goal-directed in all contexts present in your everyday life, because none of them come close to satisfying it.
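To make that concrete, here’s a toy sketch of the distinction (my own illustration with a made-up name, satisficer_policy, not a quote of anyone’s actual proposal): a satisficer only behaves like a maximizer while its aspiration level is unmet, so a sufficiently ambitious threshold makes it maximizer-like in every context you’d actually encounter.

```python
import random

# Hypothetical toy model: a satisficer acts like a maximizer only while its
# aspiration level (threshold) is unmet, then stops being goal-directed.
def satisficer_policy(utility, actions, threshold, current_value):
    if current_value >= threshold:
        # Satisfied: no longer goal-directed; any available action will do.
        return random.choice(actions)
    # Unsatisfied: indistinguishable from a maximizer over these actions.
    return max(actions, key=utility)
```

With a threshold like "controls the Milky Way", current_value never comes close in any everyday context, so the satisfied branch never fires and the behavior looks like maximization everywhere you look.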
There is still the murky area of following some proxy utility function within a sensible Goodhart scope (something like the base distribution in quantilization), even if you are not doing expected utility maximization and won’t let the siren song of the proxy lead you outside that scope. It just won’t be the utility function that selection theorems assign to you based on your coherent decisions, because if you are not doing expected utility maximization (which is unbounded optimization), you won’t be making coherent decisions according to the classical definitions.
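For concreteness, here is a minimal sketch of the quantilization idea being gestured at (my own toy code, with made-up names like quantilize and base_sample, assuming the standard setup of picking from the top q fraction of a trusted base distribution, rather than anyone’s exact formulation):

```python
import random

# Minimal quantilizer sketch: sample candidate actions from a trusted base
# distribution, then choose uniformly among the top q fraction as ranked by
# the proxy utility, instead of maximizing the proxy outright.
def quantilize(base_sample, proxy_utility, q=0.1, n=10_000):
    actions = [base_sample() for _ in range(n)]
    actions.sort(key=proxy_utility, reverse=True)
    top = actions[: max(1, int(q * n))]
    # The proxy only reorders actions the base distribution already proposes,
    # so its influence stays inside that distribution's Goodhart scope rather
    # than driving unbounded optimization toward the proxy's maximum.
    return random.choice(top)
```

The point of the construction is the same one as above: the proxy gets used, but only within the scope set by the base distribution.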
But then if you are not doing expected utility maximization, it’s not clear that things in the shape of utility functions are that useful in specifying decision problems. So a good proxy for an unknown utility function is not obviously itself a utility function.