As far as I am aware, people only resist changing their preferences because they don’t fully understand the basis and value of their preferences and because they often have a confused idea of the relationship between preferences and personality.
Generally you should define your basic goals and change your preferences to meet them, if possible. You should also keep considering whether all your basic goals are optimal, and be ready to change them.
If someone told me “tonight we will modify you to want to kill puppies,” I’d respond that, by my current preferences, that’s a bad thing, but if my preferences change then I won’t think it’s a bad thing anymore.
You may find that your moral system is more consistent (and, hopefully, better) if you maintain a preference for not killing puppies. Hopefully this moral system is well enough thought out that you can defend keeping it. In other words, your preferences won’t change without a good reason.
If I had a button that could block the modification, I would press it.
This is a bad thing. If there is a good reason to change your preferences (and therefore your actions), and you block that change anyway, it is a sign that you need to understand your motivations better.
“tonight we will modify you to want to kill puppies,”
I think you may be assuming that the person modifying your preferences is doing so both ‘magically’ and without reason. Your goal should be to kill this person, and start modifying your preferences based on reason instead. On the other hand, if this person is modifying your preferences through reason, you should make sure you understand the rhetoric and logic used, but as long as you are sure that what they say is reasonable, you should indeed change your preference.
Of course, another issue may be that we are using ‘preference’ in different ways. You might find the act of killing puppies emotionally distasteful even if you know that it is necessary. It is an interesting question whether we should work to change our preferences to enjoy things like taking out the trash, changing diapers, and killing puppies. Most people find that they do not have to have an emotional preference for dealing with unpleasant tasks, and manage to get by with a sense of ‘job well done’ once they have convinced themselves intellectually that a task needs to be done. It is understandable if you feel that ‘job well done’ might not apply to killing puppies, but I am fairly agnostic on the matter, so I won’t try to convince you that puppy population control is your next step to sainthood. However, if after much introspection you do find that puppies need to be killed and you seriously don’t like doing it, you might want to consider paying someone else to kill puppies for you.
As far as I am aware, people only resist changing their preferences because they don’t fully understand the basis and value of their preferences and because they often have a confused idea of the relationship between preferences and personality.
Generally you should define your basic goals and change your preferences to meet them, if possible. You should also keep considering whether all your basic goals are optimal, and be ready to change them.
Yes, that’s the approach. The part that is still a problem for me is that I don’t know how to justify resisting an intervention that would change my preferences, if the intervention also changes the meta-preferences that apply to those preferences.
When I read the discussions here on AI self-modification, I think: why should the AI try to make its future self follow its past preferences? It could maximize its future utility function much more easily by self-modifying such that its utility function is maximized in all circumstances. It seems to me that timeless decision theory advocates doing this, if the goal is to maximize the utility function.
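To make the ambiguity concrete, here is a toy sketch in Python. The worlds, numbers, and function names are all invented for illustration; this is not anyone’s actual decision theory, just the bookkeeping behind the question of which utility function gets to judge the modification.

```python
# Toy sketch: an agent deciding whether to swap out its utility function.
# Everything here (worlds, outcomes, numbers) is invented for illustration.

def current_u(world):
    """The agent's present preferences: it wants puppies kept alive."""
    return 1.0 if world == "puppies alive" else 0.0

def trivial_u(world):
    """A replacement utility function that is maximized in all circumstances."""
    return 1.0

# Assume, for the sake of the example, that an agent running trivial_u
# no longer steers the world toward live puppies.
outcome_if_running = {"current_u": "puppies alive",
                      "trivial_u": "whatever happens to happen"}

def score_modification(new_u_name, evaluator):
    """Rate the option 'switch to new_u_name', using `evaluator` as the judge."""
    return evaluator(outcome_if_running[new_u_name])

print(score_modification("trivial_u", current_u))  # 0.0 -> current self objects
print(score_modification("current_u", current_u))  # 1.0 -> current self prefers staying
print(score_modification("trivial_u", trivial_u))  # 1.0 -> the modified self is content
```

Judged by the current function the swap scores zero, while judged by the replacement everything scores one, which is exactly where my confusion sits.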
I don’t fully understand my preferences, and I know there are inconsistencies, including acceptable ones like changes in what food I feel like eating today. If you have advice on how to understand the basis and value of my preferences, I’d appreciate hearing it.
I think you may be assuming that the person modifying your preferences is doing so both ‘magically’ and without reason.
I’m assuming there aren’t any side effects that would make me resist based on the process itself, so we can say that’s “magical”. Let’s say they’re doing it without reason, or for a reason I don’t care about, but they credibly tell me that they won’t change anything else for the rest of my life. Does that make a difference?
Of course, another issue may be that we are using ‘preference’ in different ways. You might find the act of killing puppies emotionally distasteful even if you know that it is necessary. It is an interesting question whether we should work to change our preferences to enjoy things like taking out the trash, changing diapers, and killing puppies.
I’m defining a preference as something I have a positive or negative emotional reaction about. I sometimes conflate that with what I think my preferences should be, because I’m trying to convince myself that those are my true preferences. The idea of killing puppies was just an example of something that’s against my current preferences. Another example is “we will modify you from liking the taste of carrots to liking the taste of this other vegetable that tastes different but is otherwise identical to carrots in every important way.” This one doesn’t have any meta-preferences that apply.
I see that this conversation is in danger of splitting in several directions. Rather than make multiple reply posts or one confusing essay, I am going to drop the discussion of AI, because it is discussed in a lot of detail elsewhere by people who know a lot more than I do.
meta-preferences
We are using two different models here, and while I suspect that they are compatible, I’m going to outline mine so that you can tell me if I’m missing the point.
I don’t use the term meta-preferences, because I think of all wants, preferences, rules, and general preferences as having a scope. So I would say that my preference for a carrot has a scope of about ten minutes, appearing intermittently. This falls under the scope of my desire to eat, which appears more regularly and for longer periods of time. This in turn falls under the scope of my desire to have my basic needs met, which is generally present at all times, although I don’t always think about it. I’m assuming that you would consider the latter two to be meta-preferences.
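If it helps, here is a minimal sketch of that scope model in Python; the class, the durations, and the nesting are just my invention for this example, not a standard formalism.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Preference:
    name: str
    scope: str                              # rough sense of when/how long it applies
    parent: Optional["Preference"] = None   # the more general preference it falls under

    def falls_under(self, other: "Preference") -> bool:
        """True if `other` encloses this preference somewhere up the chain."""
        node = self.parent
        while node is not None:
            if node is other:
                return True
            node = node.parent
        return False

basic_needs = Preference("have my basic needs met", "always, mostly in the background")
eat = Preference("eat", "a few times a day", parent=basic_needs)
carrot = Preference("eat a carrot", "about ten minutes, intermittently", parent=eat)

print(carrot.falls_under(eat))          # True
print(carrot.falls_under(basic_needs))  # True -- the enclosing ones are the 'meta-preferences'
```

The point of the nesting is just that a change at one level can be judged from the levels above it.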
I don’t know how to justify resisting an intervention that would change my preferences
I would assume that each preference has a value attached to it. A preference to eat carrots has very little value, being a minor aesthetic judgement. A preference to meet your basic needs would have a much higher value, and would probably go beyond the aesthetic.
If it were easy for me to modify my preferences away from cheeseburgers, I could find a clear reason (or ten) to do so. I would justify it by appealing to my higher-level preferences (I would like to be healthier). My preference to be healthier has more value than a preference to enjoy a single meal, or even a hundred meals.
But if it were easy to modify my preferences away from carrots, I would have to think twice. I would want a reason. I don’t think I could find a reason.
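Spelled out as a toy calculation in Python (the values are invented and only their ordering is meant to carry any meaning), that reasoning looks something like this:

```python
# Invented values; only the relative ordering matters.
preference_value = {
    "enjoy cheeseburgers": 1,   # minor aesthetic judgement
    "enjoy carrots": 1,         # likewise
    "be healthier": 50,         # higher-level preference with real weight
}

def worth_modifying(target, reason=None):
    """Accept a modification of `target` only if some higher-valued
    preference (`reason`) outweighs it; with no reason, think twice and decline."""
    if reason is None:
        return False
    return preference_value[reason] > preference_value[target]

print(worth_modifying("enjoy cheeseburgers", reason="be healthier"))  # True
print(worth_modifying("enjoy carrots"))                               # False
```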
Let’s say they’re doing it without reason, or for a reason I don’t care about, but they credibly tell me that they won’t change anything else for the rest of my life.
I would set up an example like this: I like carrots. I don’t like bell peppers. I have an opportunity to painlessly reverse these preferences. I don’t see any reason to prefer or avoid this modification. It makes sense for me to be agnostic on this issue.
I would set up a more fun example like this: I like Alex. I do not like Chris. I have an opportunity to painlessly reverse these preferences.
I would hope that I have reasons for liking Alex, and not liking Chris… but if I don’t have good reasons, and if there will not be any great social awkwardness about the change, then yes, perhaps Alex and Chris are fungible. If they are fungible, this may be a sign that I should be more deliberate about whom I form attachments with.
The part that is still a problem for me is that I don’t know how to justify resisting an intervention that would change my preferences, if the intervention also changes the meta-preferences that apply to those preferences.
In the Alex/Chris example, it would be interesting to see if you ever reached a preference that you did mind changing. For example, you might be willing to change a preference for tall friends over short friends, but you might not be willing to swap a preference for friends who help orphans for a preference for friends who kick orphans.
If you do find a preference that you aren’t willing to change, it is interesting to see what it is based on—a moral system (if so, how formalized and consistent is it), an aesthetic preference (if so, are you overvaluing it? Undervaluing it?), or social pressures and norms (if so, do you want those norms to have that influence over you?).
It is arguable, but not productive, to say that ultimately no one can justify anything. I can bootstrap up a few guidelines that I base lesser preferences on—try not to hurt unnecessarily (ethical), avoid bits of dead things (aesthetic), and don’t walk around town naked (social). I would not want to switch out these preferences without a very strong reason.