1) We want the AI to be able to learn and grow in power, and make decisions about its own structure and behavior without our input. We want it to be able to change.
2) We want the AI to fundamentally do the things we prefer.
This is the basic dichotomy: How do you make an AI that modifies itself, but only in ways that don’t make it hurt you? This is WHY we talk about hard-coding in moral codes. And part of the reason they would be “hard-coded” and thus unmodifiable is that we do not want to take the risk of the AI deciding that something we don’t like is morally correct and implementing it on us. But anything made by humans to be unmodifiable by the AI runs the risk of being messed up by the humans writing it. And this is the reason why we should be worried about an AI with a poorly made utility function: because the utility function is the exact part of the AI that people would be most tempted to force the AI never to question.
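To make the tension concrete, here is a minimal, hypothetical sketch (the names and structure are my own illustration, not any real system) of the design this paragraph worries about: the agent is free to swap out its own planner, but its utility function is deliberately frozen, so any mistake the designers baked into it can never be questioned or repaired from the inside.

```python
from types import MappingProxyType


class SelfModifyingAgent:
    """Toy agent: the planner is open to self-modification, the utility is not."""

    def __init__(self, planner, utility_fn):
        self.planner = planner                                     # the agent may replace this
        self._frozen = MappingProxyType({"utility": utility_fn})   # read-only mapping by design

    def utility(self, outcome):
        # The agent can evaluate outcomes, but the evaluator itself is off-limits.
        return self._frozen["utility"](outcome)

    def improve_self(self, new_planner):
        # Self-modification is permitted only for the planning component.
        self.planner = new_planner

    def act(self, world_state):
        # Rank the planner's proposals with the fixed utility and pick the best.
        candidates = self.planner(world_state)
        return max(candidates, key=self.utility)


# Toy usage: if the hard-coded utility is subtly wrong, the agent still optimises it.
agent = SelfModifyingAgent(planner=lambda s: ["tidy the lab", "tile the planet"],
                           utility_fn=lambda outcome: len(outcome))
print(agent.act("some state"))   # 'tile the planet' -- the frozen, mis-specified utility prefers it
```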
I agree with the sentiment behind what you say here.
The difficult part is to shake ourselves free of any unexamined, implicit assumptions that we might be bringing to the table when we talk about the problem.
For example, when you say:
And this is the reason why we should be worried about an AI with a poorly made utility function
… you are talking in terms of an AI that actually HAS such a thing as a “utility function”. And it gets worse: the idea of a “utility function” has enormous implications for how the entire control mechanism (the motivations and goals system) is designed.
A good deal of this debate about my paper centers on a clash of paradigms: on the one side, a group of people who cannot even imagine the existence of any control mechanism except a utility-function-based goal stack, and on the other side, me and a pretty large community of real AI builders who consider a utility-function-based goal stack to be so unworkable that it will never be used in any real AI.
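For readers unfamiliar with the first paradigm, here is a deliberately bare-bones, hypothetical sketch of what a “utility-function-based goal stack” amounts to (my own illustration, not anyone’s actual system): goals sit on a stack, get decomposed into subgoals, and every choice between alternatives is settled by a single scalar utility function.

```python
def goal_stack_agent(top_goal, expand, utility, is_primitive, execute):
    """expand(goal) -> list of alternative subgoal lists; utility(goal) -> float."""
    stack = [top_goal]
    while stack:
        goal = stack.pop()
        if is_primitive(goal):
            execute(goal)              # leaf goal: just do it
            continue
        # Settle every choice with the single scalar utility function.
        alternatives = expand(goal)
        best = max(alternatives, key=lambda subgoals: sum(map(utility, subgoals)))
        stack.extend(reversed(best))   # push so the first subgoal is executed first
    return "done"


# Toy usage with stand-in functions:
print(goal_stack_agent(
    "make tea",
    expand=lambda g: [["boil water", "steep leaves"]],
    utility=lambda g: 1.0,
    is_primitive=lambda g: g != "make tea",
    execute=print,
))   # prints: boil water, steep leaves, done
```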
Other AI builders that I have talked to (including all of the ones who turned up for the AAAI symposium where this paper was delivered, a year ago) are unequivocal: they say that a utility-function-and-goal-stack approach is something they wouldn’t dream of using in a real AI system. To them, that idea is just a piece of hypothetical silliness put into AI papers by academics who do not build actual AI systems.
And for my part, I am an AI builder with 25 years’ experience who was already rejecting that approach in the mid-1980s, and right now I am working on mechanisms that have only vague echoes of that design in them.
Meanwhile, there are very few people in the world who also work on real AGI system design (they are a tiny subset of the “AI builders” I referred to earlier), and of the four others that I know (Ben Goertzel, Peter Voss, Monica Anderson and Phil Goetz) I can say for sure that the first three all completely accept the logic in this paper. (Phil’s work I know less about: he stays off the social radar most of the time, but he’s a member of LW so someone could ask his opinion).
me and a pretty large community of real AI builders who consider a utility-function-based goal stack to be so unworkable that it will never be used in any real AI.
Just because the programmer doesn’t explicitly code a utility function does not mean that there is no utility function. It just means that they don’t know what the utility function is.
Although technically any AI has a utility function, the usual arguments about the failings of utility functions don’t apply to unusual utility functions, such as those that are more easily described using other paradigms.
For instance, Google Maps can be thought of as having a utility function: it gains higher utility the shorter the distance is on the map. However, arguments such as “you can’t exactly specify what you want it to do, so it might blackmail the president into building a road in order to reduce the map distance” aren’t going to work, because you can program Google Maps in such a way that it never does that sort of thing.
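To make that concrete, here is a minimal route finder in the same spirit (a toy Dijkstra sketch, not Google’s actual code): its “utility” is just negative path length, and the only things it can ever output are paths through the graph it is given, so actions like “blackmail the president” are simply not in its output space.

```python
import heapq

def shortest_route(graph, start, goal):
    """Dijkstra's algorithm: implicitly 'maximises utility' = minus route length.

    The only things this program can ever return are paths through `graph`;
    actions outside that space (e.g. changing the world so the map is shorter)
    are not even representable, which is why the usual utility-function worries
    do not carry over.
    """
    frontier = [(0, start, [start])]
    visited = set()
    while frontier:
        cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbour, edge_cost in graph.get(node, {}).items():
            if neighbour not in visited:
                heapq.heappush(frontier, (cost + edge_cost, neighbour, path + [neighbour]))
    return float("inf"), None

# Toy usage:
roads = {"A": {"B": 5, "C": 2}, "C": {"B": 1}, "B": {}}
print(shortest_route(roads, "A", "B"))   # (3, ['A', 'C', 'B'])
```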
However, arguments such as “you can’t exactly specify what you want it to do, so it might blackmail the president into building a road in order to reduce the map distance”
The reason that such arguments do not work is that you can specify exactly what it is you want it to do, and the programmers did specify exactly that.
In more complex cases, where the programmers are unable to specify exactly what they want, you do get unexpected results that can be thought of as “the program wasn’t optimizing what the programmers thought it should be optimizing, but only a (crude) approximation thereof”.
(an even better example would be one where a genetic algorithm used in circuit design unexpectedly re-purposed some circuit elements to build an antenna, but I cannot find that reference right now)
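A toy sketch of that failure mode (purely illustrative, and not the antenna example; the even/odd rule, population sizes and so on are invented for the sketch): the programmers’ intent is “learn the rule”, but the fitness function they actually wrote only scores the training examples, and the evolutionary search optimises exactly that crude proxy.

```python
import random

# Intended goal: learn the rule "label is True iff x is even" from examples.
# Crude proxy actually optimised: accuracy on the 8 training examples only.
train = [(x, x % 2 == 0) for x in range(8)]
test  = [(x, x % 2 == 0) for x in range(100, 120)]

def proxy_fitness(candidate):
    # candidate is a plain lookup table: x -> predicted label (default False)
    return sum(candidate.get(x, False) == y for x, y in train)

def evolve(generations=200, pop_size=30):
    population = [dict() for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=proxy_fitness, reverse=True)
        parents = population[: pop_size // 2]       # keep the fittest half
        children = []
        for parent in parents:
            child = dict(parent)
            x = random.randrange(8)                 # mutations only ever touch training inputs
            child[x] = random.random() < 0.5
            children.append(child)
        population = parents + children
    return max(population, key=proxy_fitness)

best = evolve()
train_acc = proxy_fitness(best) / len(train)
test_acc = sum(best.get(x, False) == y for x, y in test) / len(test)
print(f"train accuracy: {train_acc:.2f}, test accuracy: {test_acc:.2f}")
# Typically ~1.00 on the proxy but only ~0.50 on unseen inputs: the search
# optimised exactly what was scored, not what the programmers meant.
```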
The reason that such arguments do not work is that you can specify exactly what it is you want it to do, and the programmers did specify exactly that.
Which is part of my point. Because you can specify exactly what you want—and because you can’t for the kinds of utility functions that are usually discussed on LW—describing it as having a utility function is technically true, but is misleading because the things you say about those other utility functions won’t carry over to it. Yeah, just because the programmer didn’t explicitly code a utility function doesn’t mean it doesn’t have one—but it often does mean that it doesn’t have one to which your other conclusions about utility functions apply.
Could you describe some of the other motivation systems for AI that are under discussion? I imagine they might be complicated, but is it possible to explain them to someone not part of the AI building community?
AFAIK most people build planning engines that use multiple goals, plus what you might call “ad hoc” machinery to check on that engine. In other words, you might have something that comes up with a plan, and then a whole bunch of stuff that analyses the plan.
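A rough sketch of that pattern (hypothetical names, not any particular robot stack): one piece of code proposes a plan, and a separate bag of ad hoc checks analyses it and can veto it before anything runs.

```python
from typing import Callable, List, Optional

Plan = List[str]                    # here a plan is just an ordered list of action names
Checker = Callable[[Plan], bool]    # returns True if the plan passes this check

def make_plan(goal: str) -> Plan:
    # Stand-in for the real planning engine.
    return [f"step 1 towards {goal}", f"step 2 towards {goal}"]

def within_budget(plan: Plan) -> bool:
    return len(plan) <= 10          # e.g. a crude resource/time limit

def no_forbidden_actions(plan: Plan) -> bool:
    return not any("forbidden" in step for step in plan)

def plan_and_check(goal: str, checkers: List[Checker]) -> Optional[Plan]:
    plan = make_plan(goal)
    if all(check(plan) for check in checkers):
        return plan
    return None                     # rejected; a real system would replan or repair

print(plan_and_check("fetch coffee", [within_budget, no_forbidden_actions]))
```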
My own approach is very different. Coming up with a plan is not a linear process; it involves large numbers of constraints acting in parallel. If you know how a neural net goes from a large array of inputs (e.g. a visual field) to smaller numbers of hidden units that encode more and more abstract descriptions of the input, until finally some high-level node is activated, then picture that process happening in reverse: a few nodes are highly activated, and they cause more and more low-level nodes to come up. That gives a rough idea of how it works.
In practice all that the above means is that the maximum possible quantity of contextual information acts on the evolving plan. And that is critical.
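A very rough numerical sketch of that picture (illustrative only; the real mechanism is of course nothing this simple): clamp a few high-level nodes, then let activation spread downwards and settle, so that many low-level nodes (the concrete details of the plan) are filled in by all the connections acting at once.

```python
import numpy as np

rng = np.random.default_rng(0)
n_high, n_low = 4, 16
W = rng.normal(scale=0.5, size=(n_low, n_high))   # top-down connection weights

high = np.zeros(n_high)
high[[0, 2]] = 1.0              # clamp a couple of abstract "goal" nodes

low = np.zeros(n_low)
for _ in range(20):             # let the low-level pattern settle iteratively
    top_down = W @ high                     # pressure from the abstract level
    lateral = 0.1 * (low.mean() - low)      # a toy lateral constraint among low-level nodes
    low = np.tanh(low + 0.5 * (top_down + lateral - low))

print(np.round(low, 2))         # the settled low-level pattern ~ the concrete plan
```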
Is there one dominant paradigm for AI motivation control in this group that’s competing with utility functions, or do the people you mention each have different thoughts on it?
People have different thoughts, but to tell the truth most people I know are working on a stage of the AGI puzzle that is well short of the stage where they need to think about the motivation system.
People (like robot builders) who have to sort that out right now use old-fashioned planning systems combined with all kinds of bespoke machinery in and around them.
I am not sure, but I think I am the one thinking most about these issues, simply because I do everything in a weird order, since I am reconstructing human cognition.
This is the basic dichotomy: How do you make an AI that modifies itself, but only in ways that don’t make it hurt you? This is WHY we talk about hard-coding in moral codes
(Correct) hardcoding is one answer, corrigibility another, reflective self-correction another…
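As a crude illustration of the second of those options (a toy, hypothetical sketch only), “corrigible” at its simplest means the operator’s stop signal pre-empts whatever the agent was optimising, and the agent does not treat the override as just another obstacle to plan around.

```python
class CorrigibleAgent:
    """Crude illustration: an operator override unconditionally halts optimisation."""

    def __init__(self):
        self.stopped = False

    def receive_stop_signal(self):
        self.stopped = True                 # the override is honoured, full stop

    def step(self, choose_best_action):
        if self.stopped:
            return "no-op"                  # do not optimise around the override
        return choose_best_action()


agent = CorrigibleAgent()
print(agent.step(lambda: "pursue goal"))    # 'pursue goal'
agent.receive_stop_signal()
print(agent.step(lambda: "pursue goal"))    # 'no-op'
```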
describing it as having a utility function is technically true, but is misleading because the things you say about those other utility functions won’t carry over to it.
Thanks for pointing this out; a lot of people seem confused on the issue. (What’s worse, it’s largely a map/territory confusion.)
Why would you expect an AI to obey the vNM axioms at all, unless it was designed to?
Not true, except in a trivial sense.
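For reference, these are the von Neumann–Morgenstern conditions the question is pointing at: an agent’s behaviour is representable by a non-trivial utility function only if its preferences over lotteries satisfy all four axioms, and nothing forces an arbitrary program to satisfy them unless it was built to.

```latex
% A preference relation $\succeq$ over lotteries admits an expected-utility
% representation iff it satisfies all four vNM axioms:
\begin{align*}
\text{Completeness:}   &\quad A \succeq B \ \text{or}\ B \succeq A \\
\text{Transitivity:}   &\quad A \succeq B \ \text{and}\ B \succeq C \implies A \succeq C \\
\text{Continuity:}     &\quad A \succeq B \succeq C \implies \exists\, p \in [0,1]:\ pA + (1-p)C \sim B \\
\text{Independence:}   &\quad A \succeq B \implies pA + (1-p)C \succeq pB + (1-p)C \quad \forall\, p \in (0,1] \\[4pt]
\text{Representation:} &\quad \exists\, u \ \text{such that}\ A \succeq B \iff \mathbb{E}_A[u] \ge \mathbb{E}_B[u]
\end{align*}
```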