An unbounded utility function does not literally make you “Pascal’s muggable”; there are much better ways to seek infinite utility than to pay a mugger.
But it’s even worse than that, since an unbounded utility function makes the expected utility of every realistic lottery undefined; an unbounded real-valued utility function doesn’t represent any complete set of preferences over lotteries with infinitely many possible outcomes. So I agree that someone who describes their preferences with an unbounded utility function needs to clarify what they actually want.
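To make that concrete, here is a minimal sketch; the St. Petersburg-style lottery in it (utility 2^n with probability 2^-n) is an illustrative assumption, not something anyone in this exchange proposed. Under an unbounded utility function, the partial sums of its expected utility grow without bound, and any realistic lottery that gives such an outcome series nonzero probability inherits the same undefined expectation.

```python
# Minimal sketch: a lottery paying utility 2**n with probability 2**-n.
# Each term of the expectation contributes exactly 1, so the partial sums
# diverge; no real number can serve as this lottery's expected utility.

def partial_expected_utility(n_terms: int) -> float:
    """Sum the first n_terms terms of E[u] for the illustrative lottery."""
    return sum((2.0 ** -n) * (2.0 ** n) for n in range(1, n_terms + 1))

for n in (10, 100, 1000):
    print(n, partial_expected_utility(n))  # 10.0, 100.0, 1000.0: no convergence
```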
But if people “genuinely believe in” (i.e. “actually have”) unbounded utility functions, they would be horrified by the prospect of passing up massive gains just because those gains are improbable. That’s precisely what it means to have an unbounded utility function. I don’t understand what it means to “actually have” an unbounded utility function but to be happy with that proposed resolution.
So it seems like those people need to think about what their preferences are over infinitely valuable distributions over outcomes; once they have an answer, so that they can actually determine what is recommended by their purported preferences, then they can think about whether they like those recommendations.
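As a hedged illustration of what such preferences would have to adjudicate (the two lotteries below are invented for the example): when both expected utilities diverge, the utility function itself is silent, even between lotteries where one is better in every single outcome, so some further principle has to do the ranking.

```python
# Lottery B pays strictly more than lottery A in every state, so any sensible
# preference ranks B above A; yet both "expected utilities" diverge, so the
# unbounded utility function on its own cannot express that ranking.
N = 50
lottery_a = [(2.0 ** -n, 2.0 ** n) for n in range(1, N)]        # (probability, utility)
lottery_b = [(2.0 ** -n, 2.0 ** n + 1.0) for n in range(1, N)]  # one extra utile in every state

partial_ev_a = sum(p * u for p, u in lottery_a)  # grows without bound as N grows
partial_ev_b = sum(p * u for p, u in lottery_b)  # likewise
b_dominates_a = all(ub > ua for (_, ua), (_, ub) in zip(lottery_a, lottery_b))
print(partial_ev_a, partial_ev_b, b_dominates_a)
```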
What do you mean “they would want to change their values if they could”? That sounds nonsensical. I could round it to “they would find that their values differ from the simple model of their values they had previously put forward.” But once I’ve made that substitution, it’s not really clear how this is relevant to the discussion, except as an argument against taking extreme and irreversible actions on the basis of a simple model of your values that looks appealing at the moment.
An unbounded utility function does not literally make you “Pascal’s muggable”; there are much better ways to seek infinite utility than to pay a mugger.
Have you solved that problem, then? Most people I’ve talked to don’t seem to believe it’s solved.
except as an argument against taking extreme and irreversible actions on the basis of a simple model of your values that looks appealing at the moment.
The approach I presented is designed so that you can get as close as possible to your simple model while reducing the risks of doing so.
Have you solved that problem, then? Most people I’ve talked to don’t seem to believe it’s solved.
You aren’t supposed to literally pay the mugger; it’s an analogy. Either:
(1) you do something more promising to capture hypothetical massive utility (e.g. this happens if we have a plausible world model and place a finite but massive upper bound on utilities; see the toy sketch after this list), or
(2) you are unable to make a decision because all payoffs are infinite.
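A rough numerical sketch of option (1); the cap and the probabilities below are toy assumptions, not estimates. Once utilities have a finite cap, the mugger’s offer competes on expected value with more ordinary long shots at the same cap, and its far smaller probability makes it lose.

```python
# Toy comparison under a finite but massive utility cap U_MAX. The capped
# payoff is the same for both options, so the decision reduces to comparing
# probabilities, and the mugger's is vastly smaller.
U_MAX = 1e15            # assumed finite cap on achievable utility
p_mugger = 1e-30        # assumed credence that the mugger delivers
p_long_shot = 1e-9      # assumed credence that a mundane long-shot project delivers

ev_mugger = p_mugger * U_MAX        # 1e-15
ev_long_shot = p_long_shot * U_MAX  # 1e6
print(ev_mugger < ev_long_shot)     # True: chase the cap elsewhere, not via the mugger
```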
? I don’t see why the world needs to be sufficiently convenient to allow (1). And the problem resurfaces with huge-but-bounded utilities, so invoking (2) is not enough.
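To make the resurfacing concrete, again with invented numbers: if the bound is huge enough relative to how sceptical one can honestly be of the mugger, the capped offer still swamps everything else.

```python
# Counterpoint with toy numbers: under a huge-but-bounded cap, the mugger
# dominates whenever your credence in the offer cannot honestly be pushed
# below (ordinary value) / U_MAX.
U_MAX = 1e50           # assumed huge but finite cap
p_mugger = 1e-40       # assumed credence in the mugger's promise
ordinary_value = 1e3   # assumed utility of just keeping your five dollars

print(p_mugger * U_MAX > ordinary_value)  # True: the bounded offer still wins
```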
You cited avoiding the “immense potential damage of being known to be Pascal muggable” as a motivating factor for actual humans, suggesting that you were talking about the real world. There might be some damage from being “muggable,” but it’s not clear why being known to be muggable is a disadvantage, given that here in the real world we don’t pay the mugger regardless of our philosophical views.
I agree that you can change the thought experiment to rule out (1). But if you do, it loses all of its intuitive force. Think about it from the perspective of someone in the modified thought experiment:
You are 100% sure there is no other way to get as much utility as the mugger promises at any other time in the future of the universe. But somehow you aren’t so sure about the mugger’s offer. So this is literally the only possible chance in all of history to get an outcome this good, or even anywhere close. Do you pay then?
“Yes” seems like a plausible answer (even before the mugger opens her mouth). The real question is how you came to have such a bizarre state of knowledge about the world, not why you are taking the mugger seriously once you do!
but it’s not clear why being known to be muggable is a disadvantage, given that here in the real world we don’t pay the mugger regardless of our philosophical views.
Being known to be muggable invites people to give it a try. But if we don’t pay the mugger in reality, then we can’t be known to be muggable, because we aren’t.
You are 100% sure there is no other way to get as much utility as the mugger promises at any other time in the future of the universe.
It doesn’t seem unreasonable to reach quasi-100% certainty if the amount the mugger promises is sufficiently high (“all the matter in the reachable universe dedicated to building a single AI to define the highest number possible; that’s how much utility I promise you”).