The latter part, that IF SIAI is exerting a positive influence, THEN doing that outweighs the alternative of not working on existential risks, seems to be a fairly easy claim to defend.
The math in this Bostrom paper should do it: http://www.nickbostrom.com/astronomical/waste.html (even though the paper does not comment directly on this particular question, its math applies to it quite straightforwardly).
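To make the appeal to that math concrete, here is a minimal back-of-the-envelope sketch in Python; every number in it is an illustrative placeholder rather than a figure from the paper, and the point is only the shape of the comparison: if the future's potential value is astronomical, even a tiny shift in the probability of reaching it swamps a more certain but smaller good.

```python
# A back-of-the-envelope sketch of the expected-value comparison.
# NOT Bostrom's actual figures: every number below is a placeholder.

potential_future_lives = 1e23  # hypothetical value of an intact long-term future, in lives
risk_reduction = 1e-9          # hypothetical cut in existential-risk probability
                               # attributable to some intervention
sure_thing_lives = 1e6         # hypothetical lives saved by a more certain alternative

ev_risk_reduction = risk_reduction * potential_future_lives
ev_sure_thing = sure_thing_lives

print(f"EV of tiny risk reduction: {ev_risk_reduction:.1e} lives")
print(f"EV of the sure thing:      {ev_sure_thing:.1e} lives")
# With these placeholders the low-probability intervention still wins by about
# eight orders of magnitude, which is the sense in which the paper's math
# "rather straightforwardly applies" to the question above.
```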
Ouch. This paper reads to me like a reductio ad absurdum of utilitarianism. Some simple math inevitably implies that I’m losing an unimaginable amount of “utility” every second without realizing it? Then please remind me why I should care about this “utility”?
Imagine that you have to decide, once and for all eternity, what to do with the world. You won't be able to back out, because that would just mean the world gets rewritten randomly. How should you make that decision?
This is essentially the situation we find ourselves in, given the pressure of Friendly AI and existential risk. Formal preference is the answer you give to that question about what to do with the world, not something that you "have" or "care about". Forget intuitions, emotions, and considerations of comfort, and just answer the question. Formal preference is distinct from an exact state of the world only because it's uncertain what can actually be done and what can't; so formal preference specifies what should be done at every level of capability to determine outcomes. Of course, formal preference can't be given explicitly. To the extent that you are able to express an answer to this question, your formal preference is defined by your wishes; any remaining uncertainty gets taken over by randomness, an opportunity to make the world better lost forever.
For any sane notion of an answer to that question, you’ll find that whatever actually happens now is vastly suboptimal.
If it’s your chosen avenue of research, I guess I’m okay with that, but IMO you’re making the problem way more difficult for yourself. Such “formal preferences” will be much harder to extract from actual humans than utility functions in their original economic sense, because unlike utility, “formal preference” as you define it doesn’t even influence our everyday actions very much.
If it’s your chosen avenue of research, I guess I’m okay with that, but IMO you’re making the problem way more difficult for yourself.
Way more difficult than what? There is no other way to pose this problem; revealed preference is not what Friendly AI is about. I agree that it's a far harder problem than automatic extraction of utilities in the economic sense, and that formal preference barely controls what people actually do.
What would be wrong with an AI based on our revealed preferences? It sounds like an easy question, but somehow I’m having a hard time coming up with an answer.
Because my revealed preferences suck. The difference between what I actually have and even what I want in an ordinary, non-transhumanist way is enormous. I am 150 pounds heavier than I want to be. My revealed preference is to eat regardless of the health and size consequences, but I don't want all of the people in the future to be fat. My revealed preference is also to kill people in pooristan so that I can have cheap plastic widgets or food or whatever. I don't want an extrapolation of my akrasia-ridden actual actions controlling the future of the universe. I suspect the same goes for you.
Hmm. Let's look more closely at the weight example, because the others are similar. You also reveal some degree of preference to be thin rather than fat, don't you? Then an AI with unlimited power could satisfy both your desire to eat and your desire to be thin. And if the AI has limited power, do you really want it to starve you, rather than go with your revealed preference?
Revealed preference means what my actual actions are. It has nothing at all to do with what I verbally say my goals are. I can say that I would prefer to be thin all I want, but that isn't my revealed preference. My revealed preference is to be fat, because, you know, that's how I'm acting. You seem to be under some misapprehension about what you are actually proposing an AI should act on. If your definition of revealed preference includes my desire not to be fat, then you should just switch to what I mean when I talk about preference, because yours solves none of the problems you think it does.
Is your revealed preference to be fat, or is it to eat and exercise (or not exercise) in ways which incidentally result in your being fat?

I'm assuming that you revealed your preference to be thin in your other actions, at some other moments of your life. Pretty hard to believe that's not the case.
At this point, I think I can provide a definitive answer to your earlier question, and it is … wait for it … “It depends on what you mean by revealed preference.” (Raise your hand if you saw that one coming! I’ll be here all week, folks!)
Specifically: if the AI is to do the “right thing,” then it has to get its information about “rightness” from somewhere, and given that moral realism is false (or however you want to talk about it), that information is going to have to come from humans, whether by scanning our brains directly or just superintelligently analyzing our behavior. Whether you call this revealed preference or Friendliness doesn’t matter; the technical challenge remains the same.
One argument against using the term revealed preference in this context is that the way the term gets used in economics fails to capture some of the key subtleties of the superintelligence problem. We want the AI to preserve all the things we care about, not just the most conspicuous things. We want it to consider not just that Lucas ate this-and-such, but also that he regretted it afterwards, where it should be stressed that regret is no less real a phenomenon than eating. But because economists often use their models to study big public things like the trade of money for goods and services, economic concepts are, in the popular imagination, associated with those kinds of big public things rather than with small private things like feeling regretful, even though you could make a case that the underlying decision-theoretic principles are general enough to cover everything.
If the math only says to maximize u(x) subject to x · p = y, there's no reason things like ethical concerns or the wish to be a better person can't be part of the x_i or p_j, but because most people think economics is about money, they're less likely to realize this when you say revealed preference. They'll object, "Oh, but what about the time I did this-and-such, but I wish I were the sort of person that did such-and-that?" You could say, "Well, you revealed your preference to do such-and-that in your other actions, at some other moments of your life," or you could just choose a different word. Again, I'm not sure it matters.
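For what it's worth, here is a toy sketch of that consumer problem, with all goods, weights, and prices made up; the only point is that an ethical concern enters the formalism exactly like any other good.

```python
# A toy version of the problem named above: maximize u(x) subject to x . p = y.
# All goods, weights, and prices below are hypothetical.

goods = {
    # name:                (weight a_i, price p_i)
    "cheap widgets":       (0.2, 1.0),
    "food":                (0.5, 2.0),
    "living up to ethics": (0.3, 5.0),  # "price" here is an effort/opportunity cost
}
budget = 100.0  # y

# For Cobb-Douglas utility u(x) = prod_i x_i ** a_i with the a_i summing to 1,
# the optimum spends the fraction a_i of the budget on good i: x_i* = a_i * y / p_i.
for name, (a, p) in goods.items():
    x_star = a * budget / p
    print(f"{name:>22}: spend {a * budget:5.1f}, consume {x_star:5.2f} units")
```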
What would be wrong with an AI based on our revealed preferences?
What the AI is based on determines the way the world will actually be, so by building an AI with a given preference, you are inevitably answering my question about what to do with the world. Using revealed preference for the AI is wrong to exactly the extent that revealed preference gives the wrong answer to my question. You seem to agree that the correct answer to my question has little to do with revealed preference, which amounts to agreeing that revealed preference is the wrong thing to imprint an AI with.
It's not you that's "losing utility"; it is any agent whose utility is linearly aggregative in human lives lived. If you're not an altruist in this sense, then you don't care.
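To spell out what "linearly aggregative in human lives lived" means, here is a brief sketch; both utility functions are illustrative assumptions rather than anyone's actual values, and the contrast only shows why astronomical numbers of lives dominate for the first kind of agent and barely register for the second.

```python
import math

# "Linearly aggregative utility in human lives lived", sketched: utility is a
# sum over lives, so it scales linearly with how many lives there are.

def linear_utility(num_lives: float, value_per_life: float = 1.0) -> float:
    return num_lives * value_per_life

def saturating_utility(num_lives: float, scale: float = 1e9) -> float:
    # An agent whose concern saturates: 1e23 lives barely move this number.
    return 1.0 - math.exp(-num_lives / scale)

print(linear_utility(1e23) / linear_utility(1e9))          # 1e14: astronomical stakes
print(saturating_utility(1e23) / saturating_utility(1e9))  # ~1.58: hardly any difference
```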
No one has ever been an altruist in this crazy sense. No one’s actual wants and desires have ever been adequately represented by this 10^23 stuff. Utility is a model of what people want, not a prescription of what you “should” want (what does “should want” mean anyway?), and here we clearly see the model not modeling what it’s supposed to.
I agree with you to the extent that no one I am aware of is actually expending the effort that disutilities on the order of 10^23 should inspire. But even before the concept of cosmic waste was developed, no one was actually working as hard as, say, starvation in Africa deserved. Or ending aging. Or the threat of nuclear Armageddon. The fact that humans, who are all affected by akrasia, aren't actually doing what they want isn't strong evidence that it isn't what they, on sufficient reflection, want. Utility is not a model of what non-rational agents (i.e. humans) are doing; it is a model of how actual agents, if idealized, would want to act. I don't want people to die, so I should work to reduce existential risk as much as possible, but because I am not a perfect agent, I can't actually follow the path that really maximizes my (non-existent abstraction of) utility.
No one’s actual wants and desires have ever been adequately represented by this 10^23 stuff.
Can you expand on this? What do you mean by “actual” wants? If someone claims to be motivated by “10^23 stuff”, and acts in accordance with this claim, then what is your account of their “actual wants”?
I haven't seen anyone who claims to be motivated by utilities of such magnitude except Eliezer. He's currently busy writing his Harry Potter fanfic and shows no signs of the mental distress that a 10^23-strong anticipation should have given him.
From the Author's Note:

Now this story has a plot, an arc, and a direction, but it does not have a set pace. What it has are chapters that are fun to write. I started writing this story in part because I'd bogged down on a book I was working on (now debogged), and that means my top priority was to have fun writing again.
From Kaj Sotala:

The other reason is that Eliezer Yudkowsky showed up here on Monday, seeking people's help with the rationality book he's writing. Previously, he wrote a number of immensely high-quality posts in blog format, with the express purpose of turning them into a book later on. But now that he's been trying to work on the book, he has noticed that without the constant feedback he got from writing blog posts, getting anything written has been very slow. So he came here to see if having people watching him write and providing feedback at the same time would help. He did get some stuff written, and at the end, asked me if I could come over to his place on Wednesday. (I'm not entirely sure why I in particular was picked, but hey.) On Wednesday, me being there helped him break his previous daily record for words written for his book, so I visited again on Friday and agreed to also come back on Monday and Tuesday.
Eliezer is not “busy writing his Harry Potter fanfic.” He is working on his book on rationality.
The Harry Potter fanfic is a book on rationality. And a damn good one.
To clarify, Eliezer Yudkowsky is working both on a book and on the Harry Potter fanfiction in question. Both pertain to rationality.