A question that I noticed I’m confused about: why should I want to resist changes to my preferences?
I understand that it will reduce the chance of any preference A being fulfilled, but my answer is that if the preference changes from A to B, then at that time I’ll be happier with B. If someone told me “tonight we will modify you to want to kill puppies,” I’d respond that by my current preferences that’s a bad thing, but if my preferences change then I won’t think it’s a bad thing any more, so I can’t say anything against it. If I had a button that could block the modification, I would press it, but I feel like that’s only because I have a meta-preference that my preferences tend toward maximizing happiness, and that meta-preference has the same problem.
A quicker way to say this is that future-me has a better claim to caring about what the future world is like than present-me does. I still try to work toward a better world, but that’s based on my best prediction for my future preferences, which is my current preferences.
If I offered you now a pill that would make you (1) look forward to suicide, and (2) immediately kill yourself, feeling extremely happy about the fact that you are killing yourself… would you take it?
No, but I don’t see this as a challenge to the reasoning. I refuse because of my meta-preference about the total amount of my future self’s happiness, which the pill would cut off. A nonzero chance of living forever means the amount of happiness I’d receive from taking the pill would have to be infinite to compensate. But if the meta-preference is changed at the same time, I don’t know how I would justify refusing.
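(Spelling out that arithmetic as a rough sketch, where p, h, and H_pill are just illustrative labels for the chance of living forever, the happiness rate while alive, and the pill’s one-off payoff:

$$ \mathrm{E}[\text{future happiness}] \;\ge\; p \int_0^{\infty} h \, dt \;=\; \infty \quad \text{for any } p > 0,\ h > 0, $$

so no finite H_pill can compensate for what is cut off.)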
Because that way leads to
wireheading
indifference to dying (which wipes out your preferences)
indifference to killing (because the deceased no longer has preferences for you to care about)
readiness to take murder pills
and so on. Greg Egan has a story about that last one: “Axiomatic”.
Whereupon I wield my Cudgel of Modus Tollens and conclude that one can and must have preferences about one’s preferences.
So much for the destructive critique. What can be built in its place? What are the positive reasons to protect one’s preferences? How do you deal with the fact that they are going to change anyway, that everything you do, even if it isn’t wireheading, changes who you are? Think of yourself at half your present age — then think of yourself at twice your present age (and for those above the typical LessWrong age, imagined still hale and hearty).
Which changes should be shunned, and which embraced?
An answer is visible in both the accumulated wisdom of the ages[1] and in more recently bottled wine. The latter is concerned with creating FAI, but the ideas largely apply also to the creation of one’s future selves. The primary task of your life is to create the person you want to become, while simultaneously developing your idea of what you want to become.
[1] Which is not to say I think that Lewis’ treatment is definitive. For example, there is hardly a word there relating to intelligence, rationality, curiosity, “internal” honesty (rather than honesty in dealing with others), vigour, or indeed any of Eliezer’s “12 virtues”, and I think a substantial number of the ancient list of Roman virtues don’t get much of a place either. Lewis has sought the Christian virtues, found them, and looked no further.
Because that way leads to wireheading, indifference to dying (which wipes out your preferences), indifference to killing (because the deceased no longer has preferences for you to care about), readiness to take murder pills, and so on. Greg Egan has a story about that last one: “Axiomatic”.
Whereupon I wield my Cudgel of Modus Tollens and conclude that one can and must have preferences about one’s preferences.
I already have preferences about my preferences, so I wouldn’t self-modify to kill puppies, given the choice. I don’t know about wireheading (which I don’t have a negative emotional reaction toward), but I would resist changes for the others, unless I was modified to no longer care about happiness, which is the meta-preference that causes me to resist. The issue is that I don’t have an “ultimate” preference that any specific preference remain unchanged. I don’t think I should, since that would suggest the preference wasn’t open to reflection, but it means that the only way I can justify resisting a change to my preferences is by appealing to another preference.
What can be built in its place? What are the positive reasons to protect one’s preferences? How do you deal with the fact that they are going to change anyway, that everything you do, even if it isn’t wireheading, changes who you are? …
An answer is visible in both the accumulated wisdom of the ages[1] and in more recently bottled wine. The latter is concerned with creating FAI, but the ideas largely apply also to the creation of one’s future selves. The primary task of your life is to create the person you want to become, while simultaneously developing your idea of what you want to become.
I know about CEV, but I don’t understand how it answers the question. How could I convince my future self that my preferences are better than theirs? I think that’s what I’m doing if I try to prevent my preferences from changing. I only resist because of meta-preferences about what type of preferences I should have, but the problem recurses onto the meta-preferences.
The issue is that I don’t have an “ultimate” preference
Do you need one?
If you keep asking “why” or “what if?” or “but suppose!”, then eventually you will run out of answers, and it doesn’t take very many steps. Inductive nihilism — thinking that if you have no answer at the end of the chain then you have no answer to the previous step, and so on back to the start — is a common response, but to me it’s just another mole to whack with Modus Tollens, a clear sign that one’s thinking has gone wrong somewhere. I don’t have to be able to spot the flaw to be sure there is one.
How could I convince my future self that my preferences are better than theirs?
Your future self is not a person as disconnected from yourself as the people you pass in the street. You are creating all your future yous minute by minute. Your whole life is a single, physically continuous object:
“Suppose we take you as an example. Your name is Rogers, is it not? Very well, Rogers, you are a space-time event having duration four ways. You are not quite six feet tall, you are about twenty inches wide and perhaps ten inches thick. In time, there stretches behind you more of this space-time event, reaching to perhaps nineteen-sixteen, of which we see a cross-section here at right angles to the time axis, and as thick as the present. At the far end is a baby, smelling of sour milk and drooling its breakfast on its bib. At the other end lies, perhaps, an old man someplace in the nineteen-eighties.
“Imagine this space-time event that we call Rogers as a long pink worm, continuous through the years, one end in his mother’s womb, and the other at the grave...”
Robert Heinlein, “Life-line”
Do you want your future self to be fit and healthy? Well then, take care of your body now. Do you wish his soul to be as healthy? Then have a care for that also.
“I understand that it will reduce the chance of any preference A being fulfilled, but my answer is that if the preference changes from A to B, then at that time I’ll be happier with B”. You’ll be happier with B, so what? Your statement only makes sense if happiness is part of A. Indeed, changing your preferences is a way to achieve happiness (essentially it’s wireheading), but it comes at the expense of the other preferences in A besides happiness.
“...future-me has a better claim to caring about what the future world is like than present-me does.” What is this “claim”? Why would you care about it?
I don’t understand your first paragraph. For the second, I see my future self as morally equivalent to myself, all else being equal. So I defer to their preferences about how the future world is organized, because they’re the one who will live in it and be affected by it. It’s the same reason that my present self doesn’t defer to the preferences of my past self.
Your preferences are by definition the things you want to happen. So, you want your future self to be happy iff your future self’s happiness is your preference. Your ideas about moral equivalence are your preferences. Et cetera. If you prefer X to happen and your preferences are changed so that you no longer prefer X to happen, the chance X will happen becomes lower. So this change of preferences goes against your preference for X. There might be upsides to the change of preferences which compensate the loss of X. Or not. Decide on a case by case basis, but ceteris paribus you don’t want your preferences to change.
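To put rough numbers on the ceteris paribus point, here is a tiny illustration in Python; the probabilities and values are invented, and everything is scored by the current preferences, since those are the ones doing the deciding.

```python
# Toy comparison, judged entirely by the *current* preferences.
# All numbers are invented for illustration.

value_of_X = 10.0       # how much the current me cares that X happens
p_X_if_kept = 0.8       # chance of X if I keep preferring X and keep working toward it
p_X_if_changed = 0.2    # chance of X if I stop preferring X and stop working toward it
side_benefit = 1.0      # whatever upside the change itself brings, valued by current me

keep = p_X_if_kept * value_of_X                       # 8.0
change = p_X_if_changed * value_of_X + side_benefit   # 3.0

print("keep preferences:  ", keep)
print("change preferences:", change)
# Unless side_benefit grows large enough to close the gap, the change scores
# worse by the very preferences being changed -- which is all the
# "ceteris paribus" conclusion amounts to here.
```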
As far as I am aware, people only resist changing their preferences because they don’t fully understand the basis and value of their preferences and because they often have a confused idea of the relationship between preferences and personality.
Generally you should define your basic goals and change your preference to meet them, if possible. You should also be considering whether all your basic goals are optimal, and be ready to change them.
If someone told me “tonight we will modify you to want to kill puppies,” I’d respond that by my current preferences that’s a bad thing, but if my preferences change then I won’t think it’s a bad thing any more.
You may find that you do have a moral system that is more consistent (and hopefully, more good) if you maintain a preference for not-killing puppies. Hopefully this moral system is well enough thought-out that you can defend keeping it. In other words, your preferences won’t change without a good reason.
If I had a button that could block the modification, I would press it
This is a bad thing. If you have a good reason to change your preferences (and therefore your actions), and you block that change anyway, this is a sign that you need to understand your motivations better.
“tonight we will modify you to want to kill puppies,”
I think you may be assuming that the person modifying your preferences is doing so both ‘magically’ and without reason. Your goal should be to kill this person, and start modifying your preferences based on reason instead. On the other hand, if this person is modifying your preferences through reason, you should make sure you understand the rhetoric and logic used, but as long as you are sure that what e says is reasonable, you should indeed change your preference.
Of course, another issue may be that we are using ‘preference’ in different ways. You might find the act of killing puppies emotionally distasteful even if you know that it is necessary. It is an interesting question whether we should work to change our preferences to enjoy things like taking out the trash, changing diapers, and killing puppies. Most people find that they do not have to have an emotional preference for dealing with unpleasant tasks, and manage to get by with a sense of ‘job well done’ once they have convinced themselves intellectually that a task needs to be done. It is understandable if you feel that ‘job well done’ might not apply to killing puppies, but I am fairly agnostic on the matter, so I won’t try to convince you that puppy population control is your next step to sainthood. However, if after much introspection you do find that puppies need to be killed and you seriously don’t like doing it, you might want to consider paying someone else to kill puppies for you.
As far as I am aware, people only resist changing their preferences because they don’t fully understand the basis and value of their preferences and because they often have a confused idea of the relationship between preferences and personality.
Generally you should define your basic goals and change your preference to meet them, if possible. You should also be considering whether all your basic goals are optimal, and be ready to change them.
Yes, that’s the approach. The part I think is a problem for me is that I don’t know how to justify resisting an intervention that would change my preferences, if the intervention also changes the meta-preferences that apply to those preferences.
When I read the discussions here on AI self-modification, I think: why should the AI try to make its future-self follow its past preferences? It could maximize its future utility function much more easily by self-modifying such that its utility function is maximized in all circumstances. It seems to me that timeless decision theory advocates doing this, if the goal is to maximize the utility function.
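To pin down where my confusion sits, here is a toy sketch in Python. Nothing in it is TDT-specific; it is just a bare expected-utility comparison with invented outcomes and numbers, showing that the answer depends on which utility function is used to score the self-modification.

```python
# A bare expected-utility toy, with invented outcomes and payoffs.
# The only question it illustrates: which utility function scores the rewrite?

def U_current(outcome: str) -> float:
    """The agent's present utility function over world-states."""
    return {"goal_achieved": 10.0, "drift": 1.0}[outcome]

def U_rewritten(outcome: str) -> float:
    """A replacement utility function that is maximal no matter what happens."""
    return 100.0

# Option 1: keep U_current and act on it -> the goal probably gets achieved.
outcome_if_kept = "goal_achieved"

# Option 2: rewrite to U_rewritten -> the successor no longer works toward
# the old goal, so the world just drifts.
outcome_if_rewritten = "drift"

print("scored by U_current:")
print("  keep    ->", U_current(outcome_if_kept))        # 10.0
print("  rewrite ->", U_current(outcome_if_rewritten))   #  1.0, looks like a bad move

print("scored by U_rewritten:")
print("  rewrite ->", U_rewritten(outcome_if_rewritten)) # 100.0, looks like a great move
```

Scored by its current utility function the rewrite looks bad; scored by the rewritten one it looks perfect. That seems to be the same recursion I keep running into with my own meta-preferences.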
I don’t fully understand my preferences, and I know there are inconsistencies, including acceptable ones like changes in what food I feel like eating today. If you have advice on how to understand the basis and value of my preferences, I’d appreciate hearing it.
I think you may be assuming that the person modifying your preferences is doing so both ‘magically’ and without reason.
I’m assuming there aren’t any side effects that would make me resist based on the process itself, so we can say that’s “magical”. Let’s say they’re doing it without reason, or for a reason I don’t care about, but they credibly tell me that they won’t change anything else for the rest of my life. Does that make a difference?
Of course, another issue may be that we are using ‘preference’ in different ways. You might find the act of killing puppies emotionally distasteful even if you know that it is necessary. It is an interesting question whether we should work to change our preferences to enjoy things like taking out the trash, changing diapers, and killing puppies.
I’m defining preference as something I have a positive or negative emotional reaction about. I sometimes conflate that with what I think my preferences should be, because I’m trying to convince myself that those are my true preferences. The idea of killing puppies was just an example of something that’s against my current preferences. Another example is “we will modify you from liking the taste of carrots to liking the taste of this other vegetable that tastes different but is otherwise identical to carrots in every important way.” This one doesn’t have any meta-preferences that apply.
I see that this conversation is in danger of splitting into different directions. Rather than make multiple different reply posts or one confusing essay, I am going to drop the discussion of AI, because that is discussed in a lot of detail elsewhere by people who know a lot more than I.
meta-preferences
We are using two different models here, and while I suspect that they are compatible, I’m going to outline mine so that you can tell me if I’m missing the point.
I don’t use the term meta-preferences, because I think of all wants, preferences, rules, and general preferences as having a scope. So I would say that my preference for a carrot has a scope of about ten minutes, appearing intermittently. This falls under the scope of my desire to eat, which appears more regularly and for greater periods of time. This in turn falls under the scope of my desire to have my basic needs met, which is generally present at all times, although I don’t always think about it. I’m assuming that you would consider the latter two to be meta-preferences.
I don’t know how to justify resisting an intervention that would change my preferences
I would assume that each preference has a value to it. A preference to eat carrots has very little value, being a minor aesthetic judgement. A preference to meet your basic needs would probably have a much higher value to it, and would probably go beyond the aesthetic.
If it were easy for me to modify my preferences away from cheeseburgers, I could find a clear reason (or ten) to do so. I justify it by appealing to my higher-level preferences (I would like to be healthier). My preference to be healthier has more value than a preference to enjoy a single meal—or even 100 meals.
But if it were easy to modify my preferences away from carrots, I would have to think twice. I would want a reason. I don’t think I could find a reason.
Let’s say they’re doing it without reason, or for a reason I don’t care about, but they credibly tell me that they won’t change anything else for the rest of my life.
I would set up an example like this: I like carrots. I don’t like bell peppers. I have an opportunity to painlessly reverse these preferences. I don’t see any reason to prefer or avoid this modification. It makes sense for me to be agnostic on this issue.
I would set up a more fun example like this: I like Alex. I do not like Chris. I have an opportunity to painlessly reverse these preferences.
I would hope that I have reasons for liking Alex, and not liking Chris… but if I don’t have good reasons, and if there will not be any great social awkwardness about the change, then yes, perhaps Alex and Chris are fungible. If they are fungible, this may be a sign that I should be more directed in who I form attachments with.
The part I think is a problem for me is that I don’t know how to justify resisting an intervention that would change my preferences, if the intervention also changes the meta-preferences that apply to those preferences.
In the Alex/Chris example, it would be interesting to see if you ever reached a preference that you did mind changing. For example, you might be willing to swap a preference for tall friends over short friends, but you might not be willing to swap a preference for friends who help orphans for a preference for friends who kick them.
If you do find a preference that you aren’t willing to change, it is interesting to see what it is based on—a moral system (if so, how formalized and consistent is it), an aesthetic preference (if so, are you overvaluing it? Undervaluing it?), or social pressures and norms (if so, do you want those norms to have that influence over you?).
It is arguable, but not productive, to say that ultimately no one can justify anything. I can bootstrap up a few guidelines that I base lesser preferences on—try not to hurt unnecessarily (ethical), avoid bits of dead things (aesthetic), and don’t walk around town naked (social). I would not want to switch out these preferences without a very strong reason.