My intuition says that a narrow AI like DALL-E would not blow up the world, no matter how much smarter it became. It would just get really good at making pictures.
This is clearly a form of superintelligence we would all prefer. The difference, it seems to me, is that DALL-E doesn’t really have ‘goals’ or anything like that; it’s just a massive tool.
Why do we care to have AGI with utility functions?
It doesn’t have to be literally a utility function. To be more precise, we’re worried about any sort of AGI that exhibits goal-directed behavior across a wide variety of real-world contexts.
Why would anyone build an AI that does that? Humans might build it directly because it’s useful: AI that you can tell to achieve real-world goals could make you very rich. Or it might arise as an unintended consequence of optimizing in non-real-world domains (e.g. playing a videogame): goal-directed reasoning in that domain might be useful enough that it gets learned from scratch—and then goal-directed behavior in the real world might be instrumentally useful to achieving goals in the original domain (e.g. modifying your hardware to be better at the game).
That seems to be a bit of a motte-and-bailey. Goal-directed behavior does not require optimizing; satisficing works fine. Having a utility function means not stopping until it’s maximized, as I understand it.
Eh. I genuinely don’t expect to build an AI that acts like a utility maximizer in all contexts. All real-world agents are limited—they can get hit by radiation, or dumped into supernovae, or fed adversarial noise, etc. All we’re ever going to see in the real world are things that can be intentional-stanced in broad but limited domains.
Satisficers have goal-directed behavior sometimes, but not in all contexts—the more satisfied they are, the less goal-directed they are. If I built a satisficer who would be satisfied with merely controlling the Milky Way (rather than the entire universe), that’s plenty dangerous. And coincidentally, it’s going to be acting goal-directed in all contexts present in your everyday life, because none of them come close to satisfying it.
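To make the satisficer point concrete, here is a minimal sketch. It assumes a toy model in which an agent "acts goal-directed" whenever its current resource level is below some target; all names and numbers are hypothetical and only illustrate the shape of the argument.

```python
def maximizer_acts(current: float, best_possible: float) -> bool:
    """A maximizer keeps pushing as long as any improvement is available."""
    return current < best_possible


def satisficer_acts(current: float, threshold: float) -> bool:
    """A satisficer only pushes while it is below its satisfaction threshold."""
    return current < threshold


# Hypothetical resource levels an agent might hold in everyday contexts.
everyday_contexts = [1.0, 10.0, 1e3, 1e6]

MODEST_THRESHOLD = 5.0    # assumption: a satisficer that is easily satisfied
GALAXY_THRESHOLD = 1e40   # assumption: "merely" wants the Milky Way
UNIVERSE = 1e50           # assumption: upper bound available to a maximizer

modest = [satisficer_acts(c, MODEST_THRESHOLD) for c in everyday_contexts]
galactic = [satisficer_acts(c, GALAXY_THRESHOLD) for c in everyday_contexts]
maximizer = [maximizer_acts(c, UNIVERSE) for c in everyday_contexts]

print("modest satisficer:  ", modest)
print("galactic satisficer:", galactic)
print("maximizer:          ", maximizer)
```

In this toy model the galactic satisficer and the maximizer behave identically in every everyday context; only the modest satisficer ever stops.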
There is still the murky area of following some proxy utility function within a sensible Goodhart scope (something like the base distribution in quantilization), even if you are not doing expected utility maximization and won’t let the siren song of the proxy lead you out of the proxy’s scope. It just won’t be the utility function that selection theorems assign to you based on your coherent decisions, because you won’t be making coherent decisions according to the classical definitions if you are not doing expected utility maximization (which is unbounded optimization).
But then if you are not doing expected utility maximization, it’s not clear that things in the shape of utility functions are that useful in specifying decision problems. So a good proxy for an unknown utility function is not obviously itself a utility function.
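For readers unfamiliar with quantilization, here is a rough sketch of the idea, not a definitive implementation. It approximates a quantilizer by drawing samples from a base distribution of "ordinary" actions, ranking them by a proxy utility, and picking uniformly from the top q fraction. The base distribution, the proxy, and the function name `quantilize` are all assumptions made for illustration.

```python
import random


def quantilize(actions, proxy_utility, q, rng):
    """Pick uniformly among the top q fraction of sampled actions,
    ranked by the proxy utility (a sample-based approximation)."""
    ranked = sorted(actions, key=proxy_utility, reverse=True)
    k = max(1, int(len(ranked) * q))  # size of the top-q slice
    return rng.choice(ranked[:k])


rng = random.Random(0)

# Assumption: the base distribution is a standard normal over a 1-D "action".
base_samples = [rng.gauss(0.0, 1.0) for _ in range(1000)]


def proxy(action):
    # Assumption: the proxy simply says "larger is better".
    return action


choice = quantilize(base_samples, proxy, q=0.1, rng=rng)

# The lowest proxy score still inside the top 10% of samples.
top_decile_cutoff = sorted(base_samples, reverse=True)[99]
```

The chosen action is always one the base distribution actually produced, so the proxy can pull the choice toward the good end of ordinary behavior but cannot pull it outside that range, which is the "scope" referred to above.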
Good answer from Gwern here.
Thank you, this answered my question.
“Agent” of course means more than one thing, e.g.:
1. Active versus passive: basically, acting unprompted, if we are talking about software.
2. Acting on another’s behalf, as in principal-agent.
3. Having a utility function of its own (or some sort of goals of its own) and optimising (or satisficing) it.
4. Something that depends on free will, consciousness, selfhood, etc.
Gwern’s claim that it’s advantageous for tools to be agents is clearly false in sense 1. Most of the software in the world is passive: spreadsheets, word processors and so on sit there doing nothing when you fire them up. The market doesn’t demand agentive (sense 1) versions of spreadsheets and word processors, and they haven’t been outcompeted by agentive versions. They are tools that want to remain tools.
There are software agents in senses 1 and 2, such as automated trading software. Trading software is agentive in the principal-agent sense, i.e. it’s intended to make money for its creators, the principal. They don’t want it to have too much agency, because it might start losing them money, or breaking the law, or making money for someone else. Its creators don’t want it to have a will of its own; they want it to optimise their own utility function.
So that’s another sense in which “the more agency, the better” is false. (Incidentally, it also means that Control and Capability aren’t orthogonal: capability of a kind worth wanting needs to be somewhat controlled, and the easiest way to control is to keep capability minimal.)
Optimisation is definable for an agent that does not have its own UF: it’s optimising the principal’s UF as well as it can, and as well as the principal/creator can communicate it. That’s not scary. If it’s optimising your UF, it’s aligned with you, and if it isn’t, that’s an ordinary problem of expressing your business goals in an algorithm. But achieving that is your problem: a type 2 agent does not have its own UF, so you are communicating your UF to it by writing an algorithm, or something like that.
Giving an agent its own UF does not necessarily make it better at optimising your UF, which is to say, it does not make it more optimised in any sense you care about. So there is no strong motivation towards it.
An agent with its own UF is more sophisticated and powerful in some senses, but there are multiple definitions of “power” and multiple interests here. It’s not all one ladder that everyone is trying to climb as fast as possible.
Beware the slippery slope from “does something” to “is an agent” to “has a UF” to “is an optimiser” to “is a threat” to “is an existential threat”!
A more intelligent DALL-E wouldn’t make pictures that people like better, it would more accurately approximate the distribution of images in its training data. And you’re right that this is not dangerous, but it is also not very useful.
A utility function is an abstraction. It is not something that you literally program into an agent. A utility function is a dual to all the individual decisions made, or to the preferences between real or hypothetical options. A utility function always implicitly exists if the preferences satisfy certain reasonable requirements. But it is mostly not possible to determine the utility function from observed preferences, because you’d need to know all the preferences or make a lot of regularizing assumptions.
A utility function can be a real, separable feature of a system, but that is rather exceptional.
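As a small illustration of the "dual to preferences" point, the sketch below builds a utility function from a complete, transitive preference ordering over a finite set of options. This is only the simple ordinal case, not the full expected-utility setting over lotteries, and the options and helper names are hypothetical. It also shows one reason the utility function cannot be pinned down from the preferences alone: any order-preserving rescaling represents the same preferences.

```python
from itertools import permutations


def utility_from_preferences(options, prefers):
    """Score each option by how many options it is strictly preferred to.
    This represents the preferences when they are complete and transitive."""
    return {a: sum(1 for b in options if prefers(a, b)) for a in options}


def represents(utility, options, prefers):
    """Check that utility[a] > utility[b] exactly when a is preferred to b."""
    return all(
        (utility[a] > utility[b]) == prefers(a, b)
        for a, b in permutations(options, 2)
    )


# Hypothetical agent whose choices reveal this ranking, best first.
ranking = ["tea", "coffee", "water"]


def prefers(a, b):
    # Assumption: a strict preference relation read off the ranking above.
    return ranking.index(a) < ranking.index(b)


u = utility_from_preferences(ranking, prefers)

# A very different-looking utility function with the same ordering.
v = {option: 10 ** score for option, score in u.items()}

print(u, represents(u, ranking, prefers))
print(v, represents(v, ranking, prefers))
```

Both `u` and `v` fit the observed choices equally well, so nothing in the agent's behavior over these options distinguishes them.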
Goodness of the picture is the utility function.
Maybe the intuition can be pumped by thinking of a picture prompt like “Timelapse of the world getting fixed. Colorized historical photo 4k.”