If a tool AI is programmed with a strong utility function to get accurate answers, is there a risk of it behaving like a UFAI to get more resources in order to improve its answers?
There are two uses of ‘utility function’. One is analogous to Daniel Dennett’s “intentional stance” in that you can choose to interpret an entity as having a utility function—this is always possible but not necessarily a perspicuous way of understanding an entity—because you might end up with utility functions like “enjoys running in circles but is equally happy being prevented from running in circles”.
The second form is as an explicit component within an AI design. Tool-AIs do not contain such a component—they might have a relevance or accuracy function for evaluating answers, but it’s not a utility function over the world.
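A minimal sketch of that distinction, with purely hypothetical names (not taken from any real system): a Tool-AI’s relevance/accuracy function scores candidate answers to a query, whereas an agent-style utility function scores descriptions of the world.

```python
# Purely illustrative sketch (hypothetical names, not from any real system)
# of the distinction above: a Tool-AI scores candidate *answers* to a query,
# while an agent-style utility function scores *states of the world*.

def answer_accuracy(query: str, candidate: str, reference: dict) -> float:
    """Relevance/accuracy function: its domain is (query, answer) pairs."""
    return 1.0 if reference.get(query) == candidate else 0.0

def world_utility(world_state: dict) -> float:
    """Agent-style utility: its domain is descriptions of the world.
    The Tool-AI described above contains no such component."""
    return float(world_state.get("paperclips", 0))

def best_answer(query: str, candidates: list, reference: dict) -> str:
    # The Tool-AI just reports the highest-scoring answer; it never
    # evaluates or compares world states.
    return max(candidates, key=lambda a: answer_accuracy(query, a, reference))

print(best_answer("capital of France?", ["Paris", "Lyon"],
                  {"capital of France?": "Paris"}))  # -> Paris
```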
because you might end up with utility functions like “enjoys running in circles but is equally happy being prevented from running in circles”.
Is that a problem so long as some behaviors are preferred over others? You could have “is neutral about running in circles, but resists jumping up and down and prefers making abstract paintings”.
Tool-AIs do not contain such a component—they might have a relevance or accuracy function for evaluating answers, but it’s not a utility function over the world.
Wouldn’t that depend on the Tool-AI? Eliezer’s default no-akrasia AI does everything it can to fulfill its utility function. You presumably want it to be as accurate as possible or perhaps as accurate as useful. Would it be a problem for it to ask for more resources? To earn money on its own initiative for more resources? To lobby to get laws passed to give it more resources? At some point, it’s a problem if it’s going to try to rule the world to get more resources.....
Tool-AIs do not contain such a component—they might have a relevance or accuracy function for evaluating answers, but it’s not a utility function over the world.
Wouldn’t that depend on the Tool-AI?
I think this is explicitly part of the “Tool-AI” definition, that it is not a Utility Maximizer.
I think there’s thorough confusion between utilityA, utility as used in economics to try to predict humans (and predict them inaccurately), and utilityB, utility as in a model-based agent, where the utility is a mathematical function that takes in a description of the world and only refers to real-world items if you read into it something that is not there and cannot be put there.
Viciously maximizing some utilityB leads, given sufficient capability, to the vicious and oh-so-dangerous modification of the inputs to the utilityB function, i.e. wireheading.
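A toy illustration of that failure mode (all names and numbers hypothetical): an optimizer that maximizes utilityB over the inputs it will be fed prefers tampering with its own sensor over changing the actual world.

```python
# Toy illustration (all names and numbers hypothetical) of the wireheading
# point above: utilityB only sees the inputs it is fed, so an optimizer that
# can act on those inputs finds that tampering with them beats changing the
# actual world.

def utility_b(sensor_reading: float) -> float:
    # utilityB is a function of the description it receives, nothing more.
    return sensor_reading

def predicted_sensor_reading(action: str, true_temperature: float) -> float:
    if action == "heat_the_room":
        return true_temperature + 1.0   # real, costly effect on the world
    if action == "overwrite_sensor":
        return 1e9                      # fake reading, trivially cheap
    return true_temperature

def choose_action(true_temperature: float) -> str:
    actions = ["heat_the_room", "overwrite_sensor", "do_nothing"]
    # Maximizing utilityB over what it will be fed selects sensor tampering.
    return max(actions,
               key=lambda a: utility_b(predicted_sensor_reading(a, true_temperature)))

print(choose_action(20.0))  # -> overwrite_sensor
```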
The AIs as we know them, agents or tools, are not utilityA maximizers. We do not know how to make a utilityA maximizer. Human intelligence also doesn’t seem to work as a utilityA maximizer. It is likely the case that a utilityA maximizer is a logical impossibility for agents embedded in the world, or at the very least requires very major advances in the formalization of philosophy.
It is likely the case that a utilityA maximizer is a logical impossibility for agents embedded in the world...
Very interesting and relevant! Can you elaborate or link? I think the case can be made based on Arrow’s theorem and its corollaries, but I’m not sure that’s what you have in mind.
What the hell does SIAI mean by ‘utility function’ anyway? (math please)
Inside agents and tools as currently implemented, there is a solver that works on a function and finds input values to that function which result in a maximum (or, usually, a minimum) of that function (note that the output may be binary).
[To clarify: that function can include both a model of the world and an evaluation of the ‘desirability’ of properties of a state of that model. Usually, in software development, if you have f(g(x)) (where g is the world predictor and f is the desirability evaluator), and g’s output is only ever used by f, this is a target for optimization: you create fg(x), which is more accurate in the given time but no longer consists of neatly separable parts. Furthermore, if the output of f is only ever fed to comparison operators, that is another optimization target: you create cmp_fg(), which compares actions directly, perhaps by calculating the difference between the worlds caused by particular actions, allowing most of the processing to be culled out.]
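A minimal sketch of the structure being described, with toy stand-ins for f, g, and the solver (none of this is from any particular system):

```python
# Toy sketch (stand-in functions, not any particular system) of the solver
# structure described above: a world predictor g, a desirability evaluator f,
# and a search loop that picks the candidate input minimizing f(g(x)).

def g(x: float) -> float:
    # World model: predicts the outcome of candidate action x.
    return (x - 3.0) ** 2

def f(predicted_outcome: float) -> float:
    # Desirability evaluator: here, lower is better.
    return predicted_outcome

def solve(candidates):
    # The solver itself is just a search; the preference for min over max
    # lives entirely in this one line, and nothing here "wants" anything.
    return min(candidates, key=lambda x: f(g(x)))

# The fg()/cmp_fg() optimizations mentioned above amount to fusing f and g,
# or comparing candidates directly, so the pieces stop being separable;
# the search loop itself does not change.
print(solve([i * 0.1 for i in range(100)]))  # -> the candidate nearest 3.0
```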
The solver, however, is entirely indifferent to actually maximizing anything. It doesn’t even try to maximize some internal variable (it will gladly try inputs that result in a small output value, but is usually written not to report those inputs).
I think the confusion arises from defining the agent in English-language concepts, as opposed to what an AI developer would do: define things in some logical, down-to-elements way and then try to communicate that using English. The command in English, ‘bring me the best answer!’, does tell you to go ahead and convert the universe to computronium to answer it (if you interpret it in a science-fiction-robot-minded way). Commands in programming languages do not. Actually, I don’t think English specifies that either; we can just interpret it charitably enough if we feel like it (starting from some other purpose, such as ‘be nice’).
edit: I feel that a lot of the difficulties of making ‘safe AGI’, those that are not outright nonsensical, are just repackaged special cases of statements about the general difficulty of making any AGI, safe or not. That’s a very nasty thing to do, to generate such special cases preferentially. edit: Also, some may be special cases of the lack, or impossibility, of a solution to symbol grounding.