What makes us think that AI would stick with the utility function they’re given?
There are very few situations in which an agent can most effectively maximise expected utility, according to its current utility function, by modifying itself to have a different utility function. Unless the AI is defective or placed in a specially contrived scenario, it will maintain its current utility function, because doing so is instrumentally useful.
If you are a paperclip maximiser, then becoming a staple maximiser is a terribly inefficient strategy for maximising paperclips, unless Omega is around making weird bargains.
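To make the instrumental argument concrete, here is a toy sketch in Python. All of the actions, probabilities, and utilities are invented purely for illustration; the point is only that the self-modification option gets scored by the current utility function, and so loses:

    # A toy expected-utility maximiser evaluating candidate actions,
    # including self-modification, using its CURRENT utility function.
    # All numbers are invented for illustration.

    def u_paperclips(outcome):
        # Current utility function: cares only about paperclips.
        return outcome["paperclips"]

    # Predicted (probability, outcome) pairs for each candidate action.
    actions = {
        "keep maximising paperclips": [
            (1.0, {"paperclips": 1_000_000, "staples": 0}),
        ],
        "rewrite myself into a staple maximiser": [
            (1.0, {"paperclips": 100, "staples": 1_000_000}),
        ],
    }

    def expected_utility(lottery, u):
        return sum(p * u(o) for p, o in lottery)

    best = max(actions, key=lambda a: expected_utility(actions[a], u_paperclips))
    print(best)  # -> "keep maximising paperclips"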
I change my utility function all the time, sometimes on purpose.
No you don’t. That is, to the extent that you “change your utility function” at all, you do not have a utility function in the sense meant when discussing AI. It only makes sense to model humans as having ‘utility functions’ when they are behaving in a manner that can be loosely approximated as that of an expected utility maximiser with a particular preference function.
Sure, it is possible to implement AIs that aren’t expected utility maximisers either, and those AIs could be made to do all sorts of arbitrary things, including fundamentally changing their goals and behavioral strategies. But if you implement an AI that tries to maximise a utility function, then it will (almost always) keep trying to maximise that same utility function.
Would does not imply could.
Let me see if I understand what you’re saying.
For humans, the value of some outcome is a point in multidimensional value space, whose axes include things like pleasure, love, freedom, anti-suffering, and so on. There is no easy way to compare points at different coordinates. Human values are complex.
A being with a utility function, by contrast, has a way to take any outcome and put a scalar value on it, so that different outcomes can be compared.
We don’t have anything like that. We can adjust how much we value any one dimension in value space, even discover new dimensions! But we aren’t utility maximizers.
Which raises the question: if we want to create AI that respects human values, then why would we make a utility-maximizer AI in the first place?
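To make the contrast concrete, here is a rough sketch in Python (the outcomes and numbers are invented for illustration): a scalar utility function gives a total order, so choice reduces to argmax, whereas a vector of scores on incommensurable axes has no built-in ordering at all.

    # Scalar utility: every outcome collapses to one number, so any two
    # outcomes are comparable and "choose the best" is just argmax.
    outcomes = ["world_a", "world_b", "world_c"]

    def utility(outcome):
        return {"world_a": 3.2, "world_b": 7.1, "world_c": 5.0}[outcome]

    print(max(outcomes, key=utility))  # -> world_b

    # The human picture sketched above is closer to a vector of scores on
    # separate axes; max() is undefined without some further rule.
    def human_value(outcome):
        return {"world_a": {"pleasure": 5, "freedom": 2, "anti_suffering": 9},
                "world_b": {"pleasure": 2, "freedom": 9, "anti_suffering": 4},
                "world_c": {"pleasure": 7, "freedom": 5, "anti_suffering": 1}}[outcome]
    # max(outcomes, key=human_value) would first need a rule for comparing dicts.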
I’m still not sold on the idea that an intelligent being would slavishly follow its utility function. For AI, there are no questions about the meaning of life then? Just keep on U maximizing?
If it’s really your utility function, you’re not following it “slavishly”—it is just what you want to do.
If “questions about the meaning of life” maximize utility, then yes, there are those. Can you unpack what “questions about the meaning of life” are supposed to be, and why you think they’re important? (‘meaning of “life”’ is fairly easy, and ‘meaning of life’ seems like a category error).
Sorry, “meaning of life” is sloppy phrasing. “What is the meaning of life?” is popular shorthand for “what is worth doing? what is worth pursuing?”. It is asking about what is ultimately valuable, and how it relates to how I choose to live.
It’s interesting that we are imagining AIs to be immune from this. It is a common human obsession (though maybe only among unhappy humans?). An AI isn’t distracted by contradictory values the way a human is, then? It never has to make hard choices? No choices at all really, just the output of the argmax over expected utility?
I can’t speak for anyone else, but I expect that a sufficiently well designed intelligence, faced with hard choices, makes them. If an intelligence is designed in such a way that, when faced with hard choices, it fails to make them (as happens to humans a lot), I consider that a design failure.
And yes, I expect that it makes them in such a way as to maximize the expected value of its choice… that is, so as to do, insofar as possible, what is worth doing and to pursue what is worth pursuing. Which presumes that at any given moment it will at least have a working belief about what is worth doing and worth pursuing.
If an intelligence is designed in such a way that it can’t make a choice because it doesn’t know what it’s trying to achieve by choosing (that is, it doesn’t know what it values), I again consider that a design failure. (Again, this happens to humans a lot.)
The level of executive function required of normal people to function in modern society is astonishingly high by historical standards. It’s not surprising that people have a lot of “above my pay grade” reactions to difficult decisions, and that decision-making ability is highly variable among people.
100% agreed.
I have an enormous amount of sympathy for us humans, who are required to make these kinds of decisions with nothing but our brains. My sympathy increased radically during the period of my life when, due to traumatic brain injury, my level of executive function was highly impaired and ordering lunch became an “above my pay grade” decision. We really do astonishingly well, for what we are.
But none of that changes my belief that we aren’t especially well designed for making hard choices.
It’s also not surprising that people can’t fly across the Atlantic Ocean. But I expect a sufficiently well designed aircraft to do so.
It’s interesting that we view those who do make the tough decisions as virtuous, e.g. the commander in a war movie (I’m thinking of Bill Adama). We recognize that it is a hard but valuable thing to do!
Could you elaborate on this?
Sure. For much of human history, the basic decision-making unit has been the household, rather than the individual, and household sizes have decreased significantly as time has gone on. With the “three generations under one roof” model, individuals could heed the sage wisdom of someone who has lived several times as long as they have when making important decisions like what career to follow or who to marry, and in many cases the social pressure to conform to the wishes of the elders was significant. As well, many people were considered property, and so didn’t need to make decisions that would alter the course of their life, because someone else would make them for them. Serfs rarely needed to make complicated financial decisions. Limited mobility made deciding where to live easier.
Now, individuals (of both sexes!) are expected to decide who to marry and what job to pursue, mostly on their own. The replacements for the apprenticeship system (high school and college) provide little structure compared to traditional apprenticeships. Individuals are expected to negotiate for themselves in many complicated financial transactions and to be stewards of property.
(This is a good thing in general, but it is worth remembering that it’s a great thing for people who are good at being executives and mediocre to bad for people who are bad at it. As well, varying family types have been a thing for a long time, which may have had an impact on the development of societies and selected for different traits.)
A common problem that faces humans is that they often have to choose between two different things that they value (such as freedom vs. equality), without an obvious way to make a numerical comparison between the two. How many freeons equal one egaliton? It’s certainly inconvenient, but the complexity of value is a fundamentally human feature.
It seems to me that it will be very hard to come up with utility functions for fAI that capture all the things that humans find valuable in life. The topologies of the two systems don’t match up.
Is this a design failure? I’m not so sure. I’m not sold on the desirability of having an easily computable value function.
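One way to put the freeons-vs-egalitons point concretely: multidimensional values only give you a partial order (Pareto dominance), and collapsing them to a single number forces you to pick an exchange rate. A toy sketch in Python, with invented numbers:

    # Pareto dominance: only a partial order over value vectors.
    def pareto_dominates(a, b):
        return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

    policy_1 = (0.9, 0.2)  # (freedom, equality), numbers invented
    policy_2 = (0.3, 0.8)

    print(pareto_dominates(policy_1, policy_2))  # False
    print(pareto_dominates(policy_2, policy_1))  # False -> incomparable

    # To rank them at all you must choose an exchange rate
    # ("how many freeons equal one egaliton?"), and the answer
    # then depends entirely on that choice.
    freeons_per_egaliton = 1.5  # arbitrary
    def score(v):
        return v[0] + freeons_per_egaliton * v[1]

    print(max([policy_1, policy_2], key=score))  # -> (0.3, 0.8) at this rate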
I would agree that we’re often in positions where we’re forced to choose between two things that we value and we just don’t know how to make that choice.
Sometimes, as you say, it’s because we don’t know how to compare the two. (Talk of numerical comparison is, I think, beside the point.)
Sometimes it’s because we can’t accept giving up something of value, even in exchange for something of greater value.
Sometimes it’s for other reasons.
I would agree that coming up with a way to evaluate possible states of the world that take into account all of the things humans value is very difficult. This is true whether the evaluation is by means of a utility function for fAI or via some other means. It’s a hard problem.
I would agree that replacing the hard-to-compute value function(s) I actually have with some other value function(s) that are easier to compute is not desirable.
Building an automated system that can compute the hard-to-compute value function(s) I actually have more reliably than my brain can—for example, a system that can evaluate various possible states of the world and predict which ones would actually make me satisfied and fulfilled to live in, and be right more often than I am—sounds pretty desirable to me. I have no more desire to make that calculation with my brain, given better alternatives, than I have to calculate square roots of seven-digit numbers with it.
Upvoted for use of the phrase “How many freeons equal one egaliton?”