It doesn’t have to be literally a utility function. To be more precise, we’re worried about any sort of AGI that exhibits goal-directed behavior across a wide variety of real-world contexts.
Why would anyone build an AI that does that? Humans might build it directly because it’s useful: AI that you can tell to achieve real-world goals could make you very rich. Or it might arise as an unintended consequence of optimizing in non-real-world domains (e.g. playing a videogame): goal-directed reasoning in that domain might be useful enough that it gets learned from scratch—and then goal-directed behavior in the real world might be instrumentally useful to achieving goals in the original domain (e.g. modifying your hardware to be better at the game).
That seems to be a bit of a motte-and-bailey. Goal-directed behavior does not require optimizing; satisficing works fine. Having a utility function means not stopping until it’s maximized, as I understand it.
Eh. I genuinely don’t expect to build an AI that acts like a utility maximizer in all contexts. All real-world agents are limited—they can get hit by radiation, or dumped into supernovae, or fed adversarial noise, etc. All we’re ever going to see in the real world are things that can be intentional-stanced in broad but limited domains.
Satisficers have goal-directed behavior sometimes, but not in all contexts: the more satisfied they are, the less goal-directed they are. If I built a satisficer who would be satisfied with merely controlling the Milky Way (rather than the entire universe), that’s plenty dangerous. And, not coincidentally, it’s going to be acting goal-directed in every context present in your everyday life, because none of them come close to satisfying it.
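To make that concrete, here is a minimal toy sketch (all names and numbers are made up for illustration, and "fall back to maximizing when unsatisfied" is just one simple way to model a satisficer, not anyone's specific proposal): below its threshold, such a satisficer is behaviorally identical to a maximizer, and everyday contexts never come close to the threshold.

```python
import random

def maximize(actions, utility):
    """Unbounded optimization: always pick the highest-utility action."""
    return max(actions, key=utility)

def satisfice(actions, utility, threshold):
    """One simple model of a satisficer: accept any action that clears the
    threshold, and only optimize hard when nothing does."""
    good_enough = [a for a in actions if utility(a) >= threshold]
    if good_enough:
        return random.choice(good_enough)  # satisfied: no pressure to optimize further
    return maximize(actions, utility)      # unsatisfied: indistinguishable from a maximizer

# Hypothetical numbers: utility is "fraction of the universe controlled",
# and the satisficer's threshold is roughly "one galaxy's worth".
everyday_actions = {"do nothing": 0.0, "earn a salary": 1e-15, "found a company": 1e-13}
utility = everyday_actions.get
GALAXY_THRESHOLD = 1e-11

# No everyday option comes close to the threshold, so the satisficer's choice
# matches the maximizer's in every everyday context.
assert satisfice(everyday_actions, utility, GALAXY_THRESHOLD) == maximize(everyday_actions, utility)
```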
There is still the murky area of following some proxy utility function within a sensible Goodhart scope (something like the base distribution of quantilization), even if you are not doing expected utility maximization and won’t let the siren song of the proxy lead you outside that scope. It just won’t be the utility function that selection theorems assign to you based on your coherent decisions, because if you are not doing expected utility maximization (which is unbounded optimization), you won’t be making coherent decisions according to the classical definitions.
But then if you are not doing expected utility maximization, it’s not clear that things in the shape of utility functions are that useful in specifying decision problems. So a good proxy for an unknown utility function is not obviously itself a utility function.
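As a reference point for the quantilization idea mentioned above, here is a rough sketch (a toy illustration under my own assumptions, not a description of anyone's actual scheme): rather than maximizing the proxy, you sample from the top q-quantile of a base distribution ranked by the proxy, so you never wander far from actions the base distribution already considers normal, i.e. you stay inside the proxy's Goodhart scope.

```python
import random

def quantilize(base_samples, proxy_utility, q=0.1):
    """Toy q-quantilizer: rank actions drawn from a base distribution by the
    proxy utility and sample uniformly from the top q fraction, instead of
    taking the argmax. Smaller q leans harder on the proxy; q=1 just imitates
    the base distribution."""
    ranked = sorted(base_samples, key=proxy_utility, reverse=True)
    cutoff = max(1, int(len(ranked) * q))
    return random.choice(ranked[:cutoff])

# Hypothetical setup: the base distribution is "things a cautious agent might
# normally do", and the proxy rewards extremity -- exactly where the proxy is
# least trustworthy, i.e. where Goodhart bites.
base_samples = [random.gauss(0.0, 1.0) for _ in range(10_000)]
proxy_utility = lambda x: x
action = quantilize(base_samples, proxy_utility, q=0.05)
print(f"quantilized action: {action:.2f} (argmax of the proxy would be {max(base_samples):.2f})")
```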