“Does my current utility function contain a global maximum at the case where I wish for and receive death right now?” is a really scary question to ask a genie.
I would prefer “I wish for the world to be changed in such a manner that my present actual utility function (explicitly distinct from my perception of my utility function) is brought to the global maximum possible without altering my present actual utility function.”
Or, in colloquial language: “Give me what I want, not what I think I want.”
That sounds pretty scary too. I don’t think I am close enough to being an agent to have a well-defined utility function. If I do have one (paradoxical as it sounds), it’s probably not something I would reflectively like. For example, I think I have more empathy for things I am sexually attracted to. But the idea of a world where everyone else (excluding me and a few people I really like) is a second-class citizen to hot babes horrifies me. Yet with the wrong kind of extrapolation, I bet that could be said to be what I want.
I can’t easily describe any procedure I know I would like for getting a utility function out of me. If I or some simulated copy of me remained to actually be deciding things, I think I could get things I would not only like, but also like liking. Especially if I could change myself from an insane ape who wishes he were a rationalist into an actual rationalist, through explicitly specified modifications guided by wished-for knowledge.
The best way I can think of to ensure that the extrapolated utility function resembles whatever is actually used in making my decisions is to just use the brain circuits I already have that do that, in the way I like.
I also think a good idea might be to have a crowd of backup copies of me. One of us would try making some self-modifications in a sandboxed universe where its wishes could not get outside, and then the others would vote on whether to keep the modifications.
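A minimal sketch of that propose-then-vote loop, purely for illustration: the `Copy` class, the `traits` dictionary, and the simple-majority rule are all invented here, and nothing in it models real minds or real sandboxing.

```python
# Hypothetical sketch: a pool of copies votes on each proposed self-modification.
# Every name here is a placeholder; this only illustrates the protocol shape.

from dataclasses import dataclass, field

@dataclass
class Copy:
    name: str
    traits: dict = field(default_factory=dict)

    def vote(self, before: "Copy", after: "Copy") -> bool:
        """Approve only if nothing this copy considers protected was touched
        (a stand-in for 'would I still endorse the result?')."""
        protected = {"values"}
        return all(after.traits.get(k) == before.traits.get(k) for k in protected)

def propose_and_ratify(copies: list[Copy], modify) -> list[Copy]:
    """One copy tries a modification in isolation; the others vote on keeping it."""
    candidate, *voters = copies
    modified = Copy(candidate.name, {**candidate.traits})
    modify(modified.traits)                      # runs only on the sandboxed copy
    approvals = sum(v.vote(candidate, modified) for v in voters)
    if approvals > len(voters) // 2:             # simple majority keeps the change
        return [modified, *voters]
    return copies                                # otherwise discard the sandbox

copies = [Copy(f"copy-{i}", {"values": "current", "habits": "ape-like"}) for i in range(5)]
copies = propose_and_ratify(copies, lambda t: t.update(habits="rationalist"))
print([c.traits for c in copies])
```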
Well, you don’t prefer a world “where everyone else (excluding me and a few people I really like) is a second-class citizen to hot babes” to the current world. If you can express such judgements preferring one universe over another, and those judgements are transitive, you have a utility function.
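For a finite set of options that claim can be made concrete: if the pairwise judgements are complete and transitive, some assignment of numbers will reproduce them. A toy sketch, with the worlds and the preferences invented purely for illustration:

```python
# Turn transitive pairwise preferences over a finite set into a numeric utility.
# The preference list is made up; ("a", "b") means world a is preferred to world b.

from graphlib import TopologicalSorter   # standard library, Python 3.9+

worlds = ["status-quo", "babe-ruled", "everyone-flourishes"]
prefers = [("everyone-flourishes", "status-quo"),
           ("status-quo", "babe-ruled")]

# Edge from each world to the less-preferred world it dominates.
graph = {w: set() for w in worlds}
for better, worse in prefers:
    graph[better].add(worse)

# A topological order exists exactly when there is no preference cycle;
# a world's position in that order then works as its utility.
order = list(TopologicalSorter(graph).static_order())
utility = {w: rank for rank, w in enumerate(order)}
print(utility)   # higher number = more preferred

assert all(utility[a] > utility[b] for a, b in prefers)
```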
And if you want to like liking them, that is also part of your utility function.
One confounding factor that you do bring up: the domain of one’s utility function really doesn’t need to include things outside the realm of possibility.
Well, if deep down I wish that hot babes were given higher status in society than ugly females and males I didn’t know (I’ll roll with this example, even though with omnipotence I could probably do things that would make this a nonissue), and I wish that I didn’t wish it, and the genie maximizes my inferred utility function once (not maximizing it once and then maximizing the new one that doing so replaces it with), then we would end up with a world where hot babes had higher status and I wished they didn’t. The only thing I can see saving us from this would be if I also wanted my wants to be fulfilled, whatever they were. But then we would end up with me assigning Flimple utility to 2 + 2 being equal to 4.
I understand the argument “If you preferred the world ruled by hot babes it wouldn’t horrify you,” but what if that’s just because when I imagine things at a global scale they turn featureless, and appearances don’t matter to me anymore, while when I see the people in person they still do? What if being able to see and socialize with every hot babe in the world changed my mind about whether I would want them to have higher social status, even if I were also able to see and socialize with everyone else?
What if the appearance of transitivity comes only from drawing analogies from the last scenario I thought about (A) to the next one (B), when if I had started the chain of analogies somewhere else (C) my view on B would be completely different, such that you could take the exchange rates between things I considered good in each scenario and construct a cycle of preference?
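That exchange-rate construction can be shown with made-up numbers: if the rates at which I would trade A for B, B for C, and C for A multiply to more than one, chaining the trades leaves me judging one unit of A as good as about 1.39 units of A, which no consistent utility function can support (assuming more of a good is never worse). A rough sketch:

```python
# Made-up exchange rates between three goods I claim to value.
# Each entry: how many units of the second good I consider a fair
# trade for one unit of the first.
rates = {("A", "B"): 1.2,    # 1 A is worth 1.2 B to me
         ("B", "C"): 1.1,    # 1 B is worth 1.1 C
         ("C", "A"): 1.05}   # 1 C is worth 1.05 A

# Start with 1 unit of A and walk the cycle A -> B -> C -> A.
amount, good = 1.0, "A"
for src, dst in [("A", "B"), ("B", "C"), ("C", "A")]:
    amount *= rates[(src, dst)]
    good = dst

print(good, amount)   # back at A, holding 1.2 * 1.1 * 1.05 = 1.386 of it

# Each trade looked fair or better in isolation, yet the round trip says
# 1 A is as good as 1.386 A, so the pairwise judgements cannot all come
# from one consistent utility function.
```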
There’s a difference between finding the global maximum of a function and following the local derivative toward a local maximum.
Even if your actual utility function did favor one group having higher status, that does not imply that the greatest utility comes at the highest imbalance.
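A toy illustration of both points, using an invented one-dimensional “utility of imbalance”: hill-climbing on the local slope settles on a modest hump, the global maximum sits at a different interior point, and the extreme imbalance scores worst of all. The function’s shape is made up purely to show the distinction.

```python
import numpy as np

# Invented "utility of status imbalance" on [0, 1]: two humps, neither at the extreme.
def utility(x):
    return 0.6 * np.exp(-((x - 0.2) ** 2) / 0.01) + 1.0 * np.exp(-((x - 0.7) ** 2) / 0.02)

def grad(x, eps=1e-6):
    return (utility(x + eps) - utility(x - eps)) / (2 * eps)

# Hill-climbing: follow the local derivative from a starting point.
x = 0.05
for _ in range(10_000):
    x += 0.001 * grad(x)
print(f"hill-climbing ends near x = {x:.2f}, utility = {utility(x):.2f}")

# Global search: look everywhere (feasible here because the domain is tiny).
grid = np.linspace(0, 1, 10_001)
best = grid[np.argmax(utility(grid))]
print(f"global maximum near x = {best:.2f}, utility = {utility(best):.2f}")
print(f"utility at maximum imbalance x = 1.0: {utility(1.0):.2f}")
```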
And if your preferences aren’t transitive, you don’t have a valid utility function. If you find yourself in a cycle of preference, you have probably failed to accurately judge two or more things which are hard to compare.
The problem isn’t that I might see a hot babe and care about her more than other people, but that when the numbers are bigger I care about everyone the same amount. It’s that how much I care probably depends on things like whether I have seen them and how much I know about them. If I were told by a perfectly reliable source that X is a hot babe by my standards, that would not make me care about her more than other people. But it would if I saw her. So what I want is not just dependent on what I believe, but on what I experience. On some figurative level, I’m a Mealy machine, not a Moore machine.
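That is the standard distinction: a Moore machine’s output depends only on its current state, while a Mealy machine’s output depends on the state and the current input. A tiny sketch with invented “caring” outputs:

```python
# Moore machine: output is a function of state alone.
# Mealy machine: output is a function of state and the current input.

def moore_step(state, inp):
    """Output depends only on what I already believe (the state)."""
    output = "care" if state == "believes-X-is-attractive" else "indifferent"
    new_state = "believes-X-is-attractive" if inp in ("told-about-X", "sees-X") else state
    return new_state, output

def mealy_step(state, inp):
    """Output depends on the state *and* on what I am experiencing right now."""
    output = "care" if inp == "sees-X" else "indifferent"
    new_state = "believes-X-is-attractive" if inp in ("told-about-X", "sees-X") else state
    return new_state, output

for step in (moore_step, mealy_step):
    state, outputs = "no-belief", []
    for inp in ["told-about-X", "nothing", "sees-X"]:
        state, out = step(state, inp)
        outputs.append(out)
    print(step.__name__, outputs)

# moore_step: merely being told eventually flips the output, because only belief matters.
# mealy_step: the output tracks the experience itself, matching "it would if I saw her".
```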
“If you find yourself in a cycle of preference, you have probably failed to accurately judge two or more things which are hard to compare.”
Why do you think this?
When I prefer A to B, B to C, and C to A (a cycle of preference), I typically find that there is something moral in the general sense about A, something moral in the direct sense about B, and something personally advantageous about C.
For instance, I would rather my donations to charity have a larger total effect, which in the current world means donating to the best aggregator rather than to individual causes. I would rather donate to individual causes than ignore them out of selfish self-interest. I would rather spend my money in my own self-interest than redistribute it in order to achieve maximum benefits. I believe that the reason I think my preferences lie this way is that I am unable to judge the value of diversified charity compared to selfish behavior.
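Run through the same kind of consistency check as the earlier sketch, those three stated preferences form exactly the cycle that blocks any single ranking (the option names are again just placeholders):

```python
# The three options from the charity example, with the stated pairwise preferences.
# A cycle means no ranking, and hence no utility assignment, can respect all of them.

from graphlib import TopologicalSorter, CycleError

prefers = [("aggregator", "individual-causes"),   # A over B
           ("individual-causes", "selfish"),      # B over C
           ("selfish", "aggregator")]             # C over A, closing the cycle

graph = {}
for better, worse in prefers:
    graph.setdefault(better, set()).add(worse)
    graph.setdefault(worse, set())

try:
    list(TopologicalSorter(graph).static_order())
except CycleError as err:
    print("no consistent ranking exists:", err.args[1])
```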
In short, I am extrapolating from the single data point I have. I am assuming that there is one overwhelmingly likely reason for making that type of error, in which case it is likely that mine stems from that reason and that yours does as well.