Suppose the state of the world as a function of time is $X(t)$, and the value of a state of the world is $V(x)$. The orthodox way to aggregate the value of a future trajectory is exponential discounting, i.e., $U(X) = \int_0^\infty \gamma^t \cdot V(X(t))\,dt$, where $0 < \gamma < 1$ is called the discount factor. Now, in your example problem, let us take $V(\text{status quo}) := 0$, $V(\text{1 life saved}) := 1$, and $V(\text{7 billion tortured}) = -(7 \times 10^9)\,\varepsilon$, where I’ll let you choose $\varepsilon$, but it has to be strictly greater than zero. (Assuming you don’t think the torture itself is neutral, your value function should have this form, up to additive and multiplicative constants.)
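In case a concrete rendering of that definition helps, here is a minimal numerical sketch (the function name, step size, and the $\gamma = 0.99$ sanity check are my own illustrative choices; the extreme values of $\gamma$ discussed further down would need more careful numerics than this):

```python
import math

def discounted_utility(value_at, gamma, horizon=10_000.0, dt=0.01):
    """Crude Riemann-sum approximation of U(X) = integral of gamma^t * V(X(t)) dt from 0 to infinity.

    `value_at(t)` plays the role of V(X(t)); the horizon and step size are
    arbitrary numerical choices for illustration, not part of the argument.
    """
    total, t = 0.0, 0.0
    while t < horizon:
        total += (gamma ** t) * value_at(t) * dt
        t += dt
    return total

# Sanity check on an easy case: a constant value of 1 should give U = 1 / (-ln(gamma)).
gamma = 0.99
print(discounted_utility(lambda t: 1.0, gamma))  # ~99.5
print(1.0 / -math.log(gamma))                    # ~99.5
```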
In the status quo,
$$U(\text{status quo}) = \int_0^\infty \gamma^t \cdot V(\text{status quo})\,dt = 0$$
Pressing the button gives
$$\begin{aligned} U(\text{press button}) &= \int_0^{10^{11}} \gamma^t \cdot V(\text{7 billion tortured})\,dt + \int_{10^{11}}^\infty \gamma^t \cdot V(\text{1 life saved})\,dt \\ &= \frac{1}{-\log\gamma}\left[\left(1-\gamma^{10^{11}}\right) V(\text{7 billion tortured}) + \gamma^{10^{11}}\, V(\text{1 life saved})\right] \\ &= \frac{1}{-\log\gamma}\left[-\left(1-\gamma^{10^{11}}\right)(7 \times 10^9)\,\varepsilon + \gamma^{10^{11}}\right] \end{aligned}$$
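(To spell out the integration step: for $0 < \gamma < 1$ and $T = 10^{11}$,
$$\int_0^{T} \gamma^t\,dt = \frac{1-\gamma^{T}}{-\log\gamma}, \qquad \int_{T}^{\infty} \gamma^t\,dt = \frac{\gamma^{T}}{-\log\gamma},$$
which is where the common factor $\frac{1}{-\log\gamma}$ comes from.)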
If we solve the inequality for this to be strictly positive, we get
$$\gamma > \left(\frac{(7 \times 10^9)\,\varepsilon}{1 + (7 \times 10^9)\,\varepsilon}\right)^{10^{-11}}$$
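(Filling in the algebra: the prefactor $\frac{1}{-\log\gamma}$ is strictly positive for $0 < \gamma < 1$, so the sign of $U(\text{press button})$ is the sign of the bracket, and
$$-\left(1-\gamma^{10^{11}}\right)(7\times 10^9)\,\varepsilon + \gamma^{10^{11}} > 0 \iff \gamma^{10^{11}}\left(1+(7\times 10^9)\,\varepsilon\right) > (7\times 10^9)\,\varepsilon;$$
taking the $10^{11}$-th root of both sides gives the threshold above.)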
Supposing $\varepsilon$ is, I don’t know, 0.001 for example, then $\gamma$ has to be greater than 0.999999999999999998. Well, you might say, isn’t that still less than 1? Sure, but:
- $\gamma$ should be bounded above not just by 1, but by $1 - P(\text{x-risk per year})$. In this case, your intuition for $\gamma$ seems to be pushing the probability of extinction per year down to at most $10^{-17}$ (see the quick check below). There’s an argument to be made that aligned AGI would get the extinction risk of Earth-originating life down to something like that kind of level, but it’s not trivial (extraterrestrial threats start to get plausible), and you don’t seem to be making it.
- The large numbers involved here, and their arbitrary nature, suggest that your intuition is guiding you along the lines of $\gamma = 1$, and there are all sorts of inconsistency paradoxes from that, because the expected value of a lot of things ends up being $\infty - \infty$, which is really bad; even measure theory can’t save you from that kind of divergence. I’m as long-termist as the next guy, but I think we have to set our discount factors a little lower than 0.999999999999999.
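(For concreteness, a quick numerical check of the threshold and of the implied per-year bound; this is my own back-of-the-envelope sketch, done in log space because these values of $\gamma$ are closer to 1 than double-precision floats can distinguish:)

```python
import math

eps = 0.001      # the example ε above
A = 7e9 * eps    # (7 × 10^9)·ε
T = 1e11         # length of the torture period

# Threshold: γ > (A / (1 + A))^(1/T).  Work in log space, because the
# threshold is within ~1e-18 of 1 and would round to exactly 1.0 as a float.
log_gamma_threshold = math.log(A / (1.0 + A)) / T
one_minus_gamma = -math.expm1(log_gamma_threshold)   # = 1 - γ_threshold

print(one_minus_gamma)            # ≈ 1.43e-18, i.e. γ must exceed ≈ 0.9999999999999999986
print(one_minus_gamma < 1e-17)    # True: the implied per-year extinction risk is below 10^-17
```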
> I’ll let you choose $\varepsilon$, but it has to be strictly greater than zero.
The original post is equivalent to choosing $\varepsilon = 0$. You could just say “I don’t accept this”, instead of going through a bunch of math based on a premise that you can deduce will be rejected.
I agree that one would reach the same conclusion (press the button) with $\varepsilon = 0$, but I’d be quite surprised if RomanS would actually choose $\varepsilon = 0$. Otherwise, he would have to consider it absolutely ethically neutral to torture, even if it didn’t save any lives or provide any benefits at all—and that’s at least one qualitative step more outrageous than what he’s actually saying.
Instead, I think RomanS believes $\gamma = 1$, that the distant and infinite future absolutely overwhelms any finite span of time, and that’s why he places so much emphasis on reversibility.
My apologies; my statement that it was equivalent to $\varepsilon = 0$ was incorrect.
The description given is that of lexicographic preferences, which in this case cannot be represented with real-valued utility functions at all. There are consistent ways to deal with such preferences, but they do tend to have unusual properties.
Such as preferring, for example, that everyone in the universe is tortured forever rather than accepting 0.00000000000001 extra probability that a single person somewhere might die.
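(To make that concrete: one standard way to formalize lexicographic preferences is to compare outcomes as tuples rather than as a single real number. A toy sketch, with attribute names and numbers of my own choosing, reproducing exactly this trade-off:)

```python
from typing import NamedTuple

class Outcome(NamedTuple):
    expected_deaths: float      # compared first: any difference here dominates everything else
    torture_disutility: float   # only consulted to break exact ties in expected deaths

def lexicographically_preferred(a: Outcome, b: Outcome) -> bool:
    """True if `a` is preferred to `b` under the lexicographic ordering above."""
    return (a.expected_deaths, a.torture_disutility) < (b.expected_deaths, b.torture_disutility)

# The trade-off described above: a 1e-14 reduction in expected deaths beats
# any amount of torture, however large.
status_quo   = Outcome(expected_deaths=1e-14, torture_disutility=0.0)
press_button = Outcome(expected_deaths=0.0,   torture_disutility=7e9)
print(lexicographically_preferred(press_button, status_quo))  # True
```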
I suspect one problem is that “death” really depends crucially upon “personal identity”, which is a fuzzy enough concept at the extreme boundaries that lexicographic preferences over it make no sense.