Tonight, I am going to sneak into your house and rewire your brain so that you will become hell-bent on mass murder.
Now, I suspect this won’t lead you to say, “Oh, well my utility function is going to change, so I should make sure to buy lots of knives today, when I don’t look insane, so that it will be easier for my future self to satisfy his homicidal urges.” Surely what we’d want to say is, “That’s awful, I must make sure that I tell someone so that they’ll be able to stop me!”
I think it’s pretty clear that what you care about is what you care about now. It may be the case that one of the things you (currently) care about is that your future desires be fulfilled, even if there’s some variance from what you now care about. But that’s just one thing you care about, and you almost certainly care about people not getting stabbed to death more than that.
When thinking about future people, in particular, I think one thing a lot of us care about is that they have their preferences satisfied. That’s a very general desire; it could be that future people will want to do nothing but paint eggs. If so, I might be a bit disappointed, but I still think we should try and enable that. However, if future people just wanted to torture innocent people all the time, then that would not be OK. The potential suffering far outweighs the satisfaction of their preferences.
This pattern fits the case where future people’s utility (including that of your future self) is just one of the things you care about right now, among others. Obviously you have more reason to try and bring it about if you think that future people will be aiming at things that you also care about, but they’re logically separate things.
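To put that slightly more formally, here’s a toy sketch (the symbols and weights are mine, purely illustrative, not a precise claim):

$$U_{\text{me, now}}(w) \;=\; \alpha \cdot S_{\text{future}}(w) \;+\; \sum_j \beta_j V_j(w)$$

where $w$ is a possible world, $S_{\text{future}}$ measures how well future people’s (and future-you’s) preferences get satisfied, the $V_j$ are the other things you care about (people not getting stabbed to death, and so on), and $\alpha$ and the $\beta_j$ are weights. The point is just that $S_{\text{future}}$ enters as one weighted term among others: a future full of satisfied torturers can score well on $S_{\text{future}}$ and still lose badly overall, because it tanks a $V_j$ that carries far more weight.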
If I considered it highly probable that you could make such a change, and you were claiming the change wouldn’t be of highly negative utility to everyone else, I might well prepare for it. Because your proposed change is highly negative for everyone else, I might well attempt to resist or counteract it. Why does that make sense, though? Why do other people’s current utility functions count if mine don’t? How does that extend to a situation where you changed everyone else? How does it extend to a situation where I could change everyone else but I don’t have to? If an AI programmed to make its programmer happy does so by directly changing the programmer’s brain to provide a constant mental state of happiness, why is that a bad thing?
The way I’m thinking about it is that other people’s utility functions count (for you, now) because you care about them. There isn’t some universal magic register of things that “count”; there’s just your utility function which lives in your head (near enough). If you fundamentally don’t care about other people’s utility, and there’s no instrumental reason for you to do so, then there’s no way I can persuade you to start caring.
So it’s not so much that caring about other people’s utility “makes sense”, just that you do care about it. Whether the AI is doing a bad thing (from the point of view of the programmer) depends on what the programmer actually cares about. If he wants to climb Mount Everest, then being told that he will be rewired to enjoy just lying on a sofa doesn’t lead him to start lying on the sofa now. He might also care about the happiness of his future self, but it could be that his desire to climb Mount Everest overwhelms that.
You’re saying that present-me’s utility function counts and no-one else’s does (apart from their position in present-me’s function) because present-me is the one making the decision? That my choices must necessarily depend on my present function, and depend on other/future functions only insofar as I care about their happiness? That seems reasonable. But my current utility function tells me that there is an N large enough that N utilon-seconds for other people’s functions counts more in my function than any possible thing in the expected lifespan of present-me’s utility function.
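Spelling that out as a toy inequality (the symbols $B$ and $r$ are mine, purely for illustration): suppose everything achievable within the expected lifespan of present-me’s function is worth at most some finite $B$ to my current function, and I weight other people’s utilon-seconds at some rate $r > 0$. Then

$$N > \frac{B}{r} \;\implies\; r \cdot N > B,$$

so past that threshold the utilon-seconds term dominates anything my own lifespan can offer. The only hidden assumptions are that $B$ is finite and $r$ is strictly positive, which is exactly what my current function seems to say.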
Sure. That might well be so. I’m not saying you have to be selfish!
However, you’re talking about utilons for other people—but I doubt that that’s the only thing you care about. I would kind of like for Clippy to get his utilons, but in the process, the world will get turned into paperclips, and I care much more about that not happening! So if everyone were to be turned into paperclip maximizers, I wouldn’t necessarily roll over and say, “Alright, turn the world into paperclips”. Maybe if there were enough of them, I’d be OK with it, as there’s only one world to lose, but it would have to be an awful lot!
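As a toy back-of-the-envelope version of that last trade-off (numbers and symbols invented for illustration): if each maximizer’s satisfaction is worth some small $u > 0$ to me, and losing the world to paperclips costs me some enormous $L$, then I’d only acquiesce once

$$n \cdot u > L, \quad\text{i.e.}\quad n > \frac{L}{u},$$

and since $L/u$ is astronomically large by my current lights, “an awful lot” is doing real work there.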
So you, like I, might consider turning the universe into minds that most value a universe filled with themselves?
I’d consider it. On reflection, I think that for me personally what I care about isn’t just minds of any kind having their preferences satisfied, even if those are harmless ones. I think I probably would like them to have more adventurous preferences! The point is, what I’m looking at here are my preferences for how the world should be; whether I would prefer a world full of wire-headers or one full of people doing awesome actual stuff. I think I’d prefer the latter, even if overall the adventurous people didn’t get as many of their preferences satisfied. A typical wire-header would probably disagree, though!
Fair.