Because I would wish to kill myself eventually? It’s hard to imagine that I would do that, faced with unlimited wishes. If I got bored, I could just wish the boredom away.
Though on reflection this wish needs a safeguard against infinite recursion, and a bliss-universe for any simulated copies of me the genie creates to determine what my wishes would be.
In a sufficiently bad situation, you may wish for the genie to kill you because you think that’s your only wish. It’s not likely for any given wish, but would happen eventually (and ends the recursion, so that’s one of the few stable wishes).
If I kill myself, there is no nth wish in the limit as n → infinity, nor a wish number equal to the busy beaver function of Graham’s number, so the first wish is wishing for something undefined.
Also, the probability that any of the individually improbable events where I kill myself happens is bounded above by the sum of their probabilities, and they could be a convergent infinite series, if the probability of wanting to kill myself went down each time. Even though I stipulated that I would believe each wish was the last, I might add something like “except don’t grant this wish if it would result in me wanting to kill myself or dying before I could consider the question” to each hypothetical wish. Or I might grant myself superintelligence as part of one of the hypothetical wishes, and come up with an even better safeguard when I found myself (to my great irrational surprise) getting another wish.
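To put that bound in symbols (a quick sketch; the geometric decay is an assumed example, not something the setup guarantees):

$$
P\!\left(\text{ever wanting to die}\right) \;=\; P\!\left(\bigcup_{n=1}^{\infty} D_n\right) \;\le\; \sum_{n=1}^{\infty} P(D_n),
$$

where $D_n$ is the event of wanting to die on the $n$th hypothetical wish. If the safeguards kept, say, $P(D_n) \le \varepsilon \, 2^{-n}$, then the sum is at most $\varepsilon \sum_{n=1}^{\infty} 2^{-n} = \varepsilon$, so the total probability of ever wanting to die stays below $\varepsilon$ even across infinitely many wishes.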
There is not even necessarily a tiny chance of wanting to kill myself. Good epistemology says to think there is, just in case, but some things are actually impossible. Using wishes to make it impossible for me to want to kill myself might come faster than killing myself.
If I kill myself, there is no nth wish in the limit as n → infinity, nor a wish number equal to the busy beaver function of Graham’s number, so the first wish is wishing for something undefined.
I think you’re right, though I’m not sure that’s exactly a good thing.
and they could be a convergent infinite series, if the probability of wanting to kill myself went down each time.
I see no particular reason to expect that to be the case.
Using wishes to make it impossible for me to want to kill myself might come faster than killing myself.
Excellent point. That might just work (though I’m sure there are still a thousand ways it could go mind-bogglingly wrong).
If you did eventually wish for death, then you would have judged that death is the best thing you can wish for, after having tried as many alternatives as possible.
Are you going to try to use your prior (I don’t want to die) to argue with your future self who has experienced the results (and determines that you would be happier dying right now than getting any wish)?
I would not want to kill myself now even if my distant-future self wanted to die or wanted me to die immediately. I think it is much more likely that I would accidentally self-modify in a manner I wouldn’t like if I reflected on it now, and that that would lead to wishing for death, than that my current self with additional knowledge would choose death over omnipotence.
I don’t think the factual question “would I be happier dying right now” would necessarily be the one that decided the policy question of “will I choose to die” for both me and my future self, because we could each care about different things.
And with a warning like “The way things are going, you’ll end up wanting to die,” I could change my wishes and maybe get something better.
“Does my current utility function contain a global maximum at the case where I wish for and receive death right now?” is a really scary question to ask a genie.
I would prefer “I wish for the world to be changed in such a manner that my present actual utility function (explicitly distinct from my perception of my utility function) is at the global maximum possible without altering my present actual utility function.”
Or in colloquial language “Give me what I want, not what I think I want.”
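A rough way to write that down (just notation for this sketch: $U_{\text{now}}$ is my present actual utility function, as opposed to my model of it, and $W$ is the set of worlds the genie could produce):

$$
w^{*} \;=\; \arg\max_{w \in W} U_{\text{now}}(w), \qquad \text{with } U_{\text{now}} \text{ held fixed},
$$

i.e. the genie optimizes over worlds while the function being maximized stays the one I actually have right now, rather than a rewritten one that would be easier to max out.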
That sounds pretty scary too. I don’t think I am close enough to being an agent to have a well-defined utility function. If I do (paradoxical as it sounds), it’s probably not something I would reflectively like. For example, I think I have more empathy for things I am sexually attracted to. But the idea of a world where everyone else (excluding me and a few people I really like) is a second-class citizen to hot babes horrifies me. But with the wrong kind of extrapolation, I bet that could be said to be what I want.
I can’t easily describe any procedure I know I would like for getting a utility function out of me. If I or some simulated copy of me remained to actually be deciding things, I think I could get things I would not only like, but like and like liking. Especially if I can change myself from an insane ape who wishes it was a rationalist, to an actual rationalist through explicitly specified modifications guided by wished-for knowledge.
The best way I can think of to ensure that the extrapolated utility function is something like whatever actually makes my decisions is to just use the brain circuits I already have that do that, in the way I like.
I also think a good idea might be to have a crowd of backup copies of me. One of us would try making some self-modifications in a sandboxed universe where their wishes could not get outside, and then the others would vote on whether to keep them.
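Purely as a toy sketch of that propose-then-vote loop (everything here is invented for illustration, and the genuinely hard part, a sandbox a wish really cannot escape, is hidden behind a placeholder):

```python
import random

class Copy:
    """A hypothetical backup copy of me that can inspect a proposed self-modification."""
    def __init__(self, name):
        self.name = name

    def vote(self, report):
        # Stand-in for "would I, on reflection, want to keep this modification?"
        return report["still_recognizably_me"] and not report["wants_to_die"]

def evaluate_in_sandbox(modification):
    """Hypothetical: apply the modification to one copy inside a universe its wishes
    cannot leave, observe the result, and report back to the other copies."""
    return {
        "modification": modification,
        "still_recognizably_me": random.random() > 0.1,  # placeholder observation
        "wants_to_die": random.random() < 0.01,          # placeholder observation
    }

def decide(modification, copies):
    """Keep the modification only if a majority of the other copies approve it."""
    report = evaluate_in_sandbox(modification)
    votes = [copy.vote(report) for copy in copies]
    return sum(votes) > len(votes) / 2

copies = [Copy(f"me-{i}") for i in range(5)]
print(decide("explicitly specified rationality upgrade", copies))
```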
Well, you don’t prefer a world “where everyone else (excluding me and a few people I really like) is a second-class citizen to hot babes” to the current world. If you can express such a judgement preferring one universe over another, and those judgements are transitive, you have a utility function.
And if you want to like liking them, that is also part of your utility function.
One confounding factor that you do bring up: the domain of one’s utility function really doesn’t need to include things outside the realm of possibility.
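For what it’s worth, the standard statement (roughly, and glossing over the extra conditions needed on infinite outcome spaces, such as continuity) is that a preference relation $\succsim$ over outcomes can be represented by a utility function $u$, meaning

$$
A \succsim B \iff u(A) \ge u(B),
$$

exactly when it is complete and transitive. Transitivity is necessary because $\ge$ on the reals is transitive; completeness is what lets every pair of worlds be compared at all.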
Well, if deep down I wish that hot babes were given higher status in society than ugly females and males I didn’t know (I’ll roll with this example even though with omnipotence I could probably do things that would make it a nonissue), and I also wish that I didn’t wish it, and the genie maximizes my inferred utility function once (not maximizing it once and then maximizing the new utility function that doing so replaces it with), then we would end up with a world where hot babes had higher status and I wished they didn’t. The only thing I can see saving us from this would be if I also wanted my wants to be fulfilled, whatever they were. But then we would end up with me assigning Flimple utility to 2 + 2 being equal to 4.
I understand the argument “If you preferred the world ruled by hot babes it wouldn’t horrify you,” but what if that’s just because when I imagine things at a global scale they turn featureless, and appearances don’t matter to me anymore, but when I see the people in person they still do? What if being able to see and socialize with every hot babe in the world changed my mind about whether I would want them to have higher social status, even if I was also able to see and socialize with everyone else?
What if the appearance of transitivity is only from drawing analogies from the last scenario I thought about (A) to the next one (B), but if I started a chain of analogies from somewhere else (C) my view on B would be completely different, such that you could take exchange rates of certain things I considered good in each scenario and construct a cycle of preference?
There’s a difference between finding the global maximum of a function and taking the local derivative and following it toward a local maximum.
Even if your actual utility function were for one group to have higher status, that does not imply that the greatest utility comes at the highest possible imbalance.
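A made-up example of that point (the functional form is invented purely for illustration): if utility in the status imbalance $x \in [0, 1]$ looked like

$$
u(x) = x(1 - x), \qquad u'(0) = 1 > 0, \qquad \arg\max_{x \in [0,1]} u(x) = \tfrac{1}{2},
$$

then a little imbalance would be a local improvement, yet the greatest utility would sit at $x = \tfrac{1}{2}$, nowhere near the extreme $x = 1$.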
And if your preferences aren’t transitive, you don’t have a valid utility function. If you find yourself in a cycle of preference, you have probably failed to accurately judge two or more things which are hard to compare.
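A minimal sketch of what a cycle of preference looks like operationally, assuming preferences get reported as pairwise “strictly prefers” comparisons (the outcomes and the example list are invented):

```python
def find_preference_cycle(prefers):
    """prefers: list of (a, b) pairs meaning 'a is strictly preferred to b'.
    Returns a cycle of outcomes if one exists, otherwise None (plain depth-first search)."""
    graph = {}
    for a, b in prefers:
        graph.setdefault(a, []).append(b)
        graph.setdefault(b, [])

    def dfs(node, path, visiting):
        visiting.add(node)
        path.append(node)
        for nxt in graph[node]:
            if nxt in visiting:                     # back edge: we walked in a circle
                return path[path.index(nxt):] + [nxt]
            cycle = dfs(nxt, path, visiting)
            if cycle:
                return cycle
        visiting.discard(node)
        path.pop()
        return None

    for start in graph:
        cycle = dfs(start, [], set())
        if cycle:
            return cycle
    return None

# Invented example: A preferred to B, B to C, and C to A.
print(find_preference_cycle([("A", "B"), ("B", "C"), ("C", "A")]))
# -> ['A', 'B', 'C', 'A']: no utility function can satisfy u(A) > u(B) > u(C) > u(A)
```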
The problem isn’t that I might see a hot babe and care about her more than other people; it’s that when the numbers are bigger I care about everyone the same amount, and that how much I care probably depends on things like whether I have seen them and how much I know about them. If I were told by a perfectly reliable source that X is a hot babe by my standards, that would not make me care about her more than other people. But it would if I saw her. So what I want depends not just on what I believe, but on what I experience. On some figurative level, I’m a Mealy machine, not a Moore machine.
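On the analogy itself: a Moore machine’s output depends only on its current state, while a Mealy machine’s output depends on both the current state and the current input. A toy sketch, with the states and inputs invented for illustration:

```python
# Moore machine: output is a function of the current state alone ("what I believe").
def moore_output(state):
    return "care more" if state == "believes X is attractive" else "care equally"

# Mealy machine: output is a function of the current state AND the current input
# ("what I believe" plus "what I am experiencing right now").
def mealy_output(state, current_input):
    if state == "believes X is attractive" and current_input == "sees X in person":
        return "care more"
    return "care equally"

# A Moore-machine me would care more from the belief alone; the Mealy-machine me
# only cares more when the experience actually arrives.
print(moore_output("believes X is attractive"))                             # care more
print(mealy_output("believes X is attractive", "reads a reliable report"))  # care equally
print(mealy_output("believes X is attractive", "sees X in person"))         # care more
```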
If you find yourself in a cycle of preference, you have probably failed to accurately judge two or more things which are hard to compare.

Why do you think this?
When I prefer A to B, B to C, and C to A (a cycle of preference), then I typically find that there is something moral in the general sense about A, something moral in the direct sense about B, and something which is personally advantageous about C.
For instance, I would rather my donations to charity have a larger total effect, which in the current world means donating to the best aggregator rather than to individual causes. I would rather donate to individual causes than ignore them out of selfish self-interest. I would rather spend my money in my own self-interest than redistribute it in order to achieve maximum benefits. I believe that the reason I think my preferences lie this way is that I am unable to judge the value of diversified charity compared to selfish behavior.
In short, I am extrapolating from the single data point I have. I am assuming that there is one overwhelmingly likely reason for making that type of error, in which case it is likely that my error has that cause and that yours does as well.