Value Is Binary
Epistemic status: rough ethical and empirical heuristic.
Assuming that value is roughly linear in resources available after we reach technological maturity,[1] my probability distribution of value is so bimodal that it is nearly binary. In particular, I assign substantial probability to near-optimal futures (at least 99% of the value of the optimal future), substantial probability to near-zero-value futures (between −1% and 1% of the value of the optimal future), and little probability to anything else.[2] To the extent that almost all of the probability mass fits into these two buckets, and everything within a bucket is almost as valuable as everything else in that bucket, the goal *maximize expected value* reduces to the goal *maximize the probability of the better bucket*.
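As a minimal worked version of that reduction (a sketch in my own notation; value is measured in units of the optimal future, so the near-optimal bucket sits near 1 and the near-zero bucket near 0): let $p$, $q$, and $r = 1 - p - q$ be the probabilities of the near-optimal bucket, the near-zero bucket, and everything else. Then

$$\mathbb{E}[V] = p\,\mathbb{E}[V \mid \text{near-optimal}] + q\,\mathbb{E}[V \mid \text{near-zero}] + r\,\mathbb{E}[V \mid \text{other}],$$

so, assuming the value of any future is bounded in magnitude by roughly that of the optimal future, $|\mathbb{E}[V] - p| \lesssim 0.01 + r$. When $r$ is small, an intervention's effect on expected value is just its effect on $p$, up to a small error.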
So rather than thinking about how to maximize expected value, I generally think about maximizing the probability of a great (i.e., near-optimal) future. This goal is easier for me to think about, particularly since I believe that the paths to a great future are rather homogeneous — alike not just in value but in high-level structure. In the rest of this shortform, I explain my belief that the future is likely to be near-optimal or near-zero.
Substantial probability to near-optimal futures.
I have substantial credence that the future is at least 99% as good as the optimal future.[3] I do not claim much certainty about what the optimal future looks like — my baseline assumption is that it involves increasing and improving consciousness in the universe, but I have little idea whether that would look like many very small minds or a few very big minds. Or perhaps the optimal future involves astronomical-scale acausal trade. Or perhaps future advances in ethics, decision theory, or physics will have unforeseeable implications for how a technologically mature civilization can do good.
But what unites almost all of my probability mass on near-optimal futures is how we get there, at a high level: we create superintelligence, achieve technological maturity, solve ethics, and then optimize. Without knowing what this looks like in detail, I assign substantial probability to the proposition that humanity successfully completes this process. And I think almost all futures in which we do complete this process look very similar: they have nearly identical technology, reach the same conclusions on ethics, have nearly identical resources available to them (mostly depending on how long it took them to reach maturity), and so produce nearly identical value.
Almost all of the remaining probability to near-zero futures.
This claim is bolder, I think. Even if it seems reasonable to expect a substantial fraction of possible futures to converge to near-optimal, it may seem odd to expect almost all of the rest to be near-zero. But I find it difficult to imagine any other futures.
For a future to not be near-zero, it must involve using a nontrivial fraction of the resources available in the optimal future (by my assumption that value is roughly linear in resources). More significantly, the future must involve using resources at a nontrivial fraction of the efficiency of their use in the optimal future. This seems unlikely to happen by accident. In particular, I claim:
If a future does not involve optimizing for the good, value is almost certainly near-zero.
Roughly, this holds if no (nontrivially efficient) way of promoting the good is also an efficient way of optimizing for anything else that we might optimize for. I strongly intuit that this is true; I expect that as technology improves, efficiently producing a unit of something will produce very little of almost all other things (where “thing” includes not just stuff but also minds, qualia, etc.).[4] If so, then value (or disvalue) is (in expectation) a negligible side effect of optimization for other things. And I cannot reasonably imagine a future optimized for disvalue, so I think almost all non-near-optimal futures are near-zero.
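One way to put the resources-and-efficiency point above slightly more formally (the notation is mine, purely for illustration): under the linearity assumption, a future's value factors roughly as

$$\frac{V}{V_{\text{opt}}} \approx \frac{R}{R_{\text{opt}}} \cdot \frac{e}{e_{\text{opt}}},$$

where $R$ is the quantity of resources a future puts to use, $e$ is the value produced per unit of resources, and the subscript marks the optimal future. Both ratios are at most roughly 1, so the product clears 1% only if neither factor is tiny; and the orthogonality claim says that if we are not optimizing for the good, $e/e_{\text{opt}}$ is tiny, which by itself lands the future in the near-zero bucket.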
So I believe that either we optimize for value and get a near-optimal future, or we do anything else and get a near-zero future.
Intuitively, it seems possible to optimize for more than one value. I think such scenarios are unlikely. Even if our utility function has multiple linear terms, unless there is some surprisingly good way to achieve them simultaneously, we optimize by pursuing one of them near-exclusively.[5] Optimizing a utility function that looks more like min(x,y) may be a plausible result of a grand bargain, but such a scenario requires that, after we have mature technology, multiple agents have nontrivial bargaining power and different values. I find this unlikely; I expect singleton-like scenarios and that powerful agents will either all converge to the same preferences or all have near-zero-value preferences.
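A toy version of the multiple-linear-terms point, assuming (as in footnote [5]) that the goods are substitutes in production and draw on a common resource budget $R$: the problem

$$\max_{x, y \ge 0} \; a x + b y \quad \text{s.t.} \quad c_x x + c_y y \le R$$

has a corner solution: put everything into whichever good yields more value per unit of resource ($a/c_x$ versus $b/c_y$), except in the knife-edge case where the two are exactly equal. Interior mixes only become optimal given complementarities in the utility function, e.g. something like $\min(x, y)$, which is the grand-bargain case above.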
I mostly see “value is binary” as a heuristic for reframing problems. It also has implications for what we should do: to the extent that value is binary (and to the extent that doing so is feasible), we should focus on increasing the probability of great futures. If a “catastrophic” future is one in which we realize no more than a small fraction of our value, then a great future is simply one that is not catastrophic, and we should focus on avoiding catastrophes. But of course, “value is binary” is an empirical approximation rather than an a priori truth. Even if value seems very nearly binary, we should not reject contrary proposed interventions[6] or possible futures out of hand.
I would appreciate suggestions on how to make these ideas more formal or precise (in addition to comments on what I got wrong or left out, of course). Also, this shortform relies on argument by “I struggle to imagine”; if you can imagine something I cannot, please explain your scenario and I will justify my skepticism or update.
[1] You would reject this if you believed that astronomical-scale goods are not astronomically better than Earth-scale goods or if you believed that some plausible Earth-scale bad would be worse than astronomical-scale goods are good.
[2] “Optimal” value is roughly defined as the expected value of the future in which we act as well as possible, from our current limited knowledge about what “acting well” looks like. “Zero” is roughly defined as any future in which we fail to do anything astronomically significant. I consider value relative to the optimal future, ignoring uncertainty about how good the optimal future is — we should theoretically act as if we’re in a universe with high variance in value between different possibilities, but I don’t see how this affects what we should choose before reaching technological maturity.*
*Except roughly that we should act with unrealistically low probability that we are in a kind of simulation in which our choices matter very little or have very differently-valued consequences than otherwise. The prospect of such simulations might undermine my conclusions—value might still be binary, but for the wrong reason—so it is useful to be able to almost-ignore such possibilities.
[3] That is, at least 99% of the way from the zero-value future to the optimal future.
[4] If we particularly believe that value is fragile, we have an additional reason to expect this orthogonality. But I claim that different goals tend to be orthogonal at high levels of technology independent of value’s fragility.
[5] This assumes that all goods are substitutes in production, which I expect to be nearly true with mature technology.
[6] That is, those that affect the probability of futures outside the binary or that affect how good the future is within the set of near-zero (or near-optimal) futures.
After reading only the first paragraph of your above comment, I want to note that:
> In particular, I assign substantial probability to near-optimal futures (at least 99% of the value of the optimal future), substantial probability to near-zero-value futures (between −1% and 1% of the value of the optimal future), and little probability to anything else.
I assign much lower probability to near-optimal futures than near-zero-value futures.
This is mainly because a lot of the “extremely good” possible worlds I imagine when reading Bostrom’s Letter from Utopia are <1% of what is optimal.
I also think the amount of probability I assign to 1%-99% futures is (~10x?) larger than the amount I assign to >99% futures.
(I’d like to read the rest of your comment later (but not right now due to time constraints) to see if it changes my view.)
I agree that near-optimal is unlikely. But I would be quite surprised by 1%-99% futures because (in short) I think we do better if we optimize for good and do worse if we don’t. If our final use of our cosmic endowment isn’t near-optimal, I think we failed to optimize for good and would be surprised if it’s >1%.
Agreed with this given how many orders of magnitude potential values span.
Rescinding my previous statement:
> I also think the amount of probability I assign to 1%-99% futures is (~10x?) larger than the amount I assign to >99% futures.
I’d now say that probably the probability of 1%-99% optimal futures is <10% of the probability of >99% optimal futures.
This is because 1% optimal is very close to being optimal (only 2 orders of magnitude away out of dozens of orders of magnitude of very good futures).
Related idea, off the cuff, rough. Not really important or interesting, but might lead to interesting insights. Mostly intended for my future selves, but comments are welcome.
Binaries Are Analytically Valuable
Suppose our probability distribution for alignment success is nearly binary. In particular, suppose that we have high credence that, by the time we can create an AI capable of triggering an intelligence explosion, we will have
really solved alignment (i.e., we can create an aligned AI capable of triggering an intelligence explosion at reasonable extra cost and delay) or
really not solved alignment (i.e., we cannot create a similarly powerful aligned AI, or doing so would require very unreasonable extra cost and delay)
(Whether this is actually true is irrelevant to my point.)
Why would this matter?
Stating the risk from an unaligned intelligence explosion is kind of awkward: it’s that the alignment tax is greater than what the leading AI project is able/willing to pay. Equivalently, **our goal is for the alignment tax to be less than what the leading AI project is able/willing to pay**. This gives rise to two nice, clean desiderata:
Decrease the alignment tax
Increase what the leading AI project is able/willing to pay for alignment
But unfortunately, we can’t similarly split the goal (or risk) into two goals (or risks). For example, a breakdown into the following two goals does not capture the risk from an unaligned intelligence explosion:
Make the alignment tax less than 6 months and a trillion dollars
Make the leading AI project able/willing to spend 6 months and a trillion dollars on aligning an AI
It would suffice to achieve both of these goals, but doing so is not necessary. If we fail to reduce the alignment tax this far, we can compensate by doing better on the willingness-to-pay front, and vice versa.
But if alignment success is binary, then we actually can decompose the goal (bolded above) into two necessary (and jointly sufficient) conditions:
Really solve alignment; i.e., reduce the alignment tax to [reasonable value]
Make the leading AI project able/willing to spend [reasonable value] on alignment
(Where [reasonable value] depends on what exactly our binary-ish probability distribution for alignment success looks like.)
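As a sanity check on this factorization, here is a minimal simulation sketch; all numbers, units, and distributions below are made up purely for illustration. When the tax distribution is binary-ish and the threshold sits at the “really solved” tax level, “tax ≤ budget” coincides with the conjunction of the two conditions above.

```python
import random

# Illustrative, made-up units and distributions (not empirical estimates).
SOLVED_TAX = 0.5         # tax if alignment is "really solved"
UNSOLVED_TAX = 1e9       # effectively unpayable tax if "really not solved"
P_SOLVED = 0.6           # credence that alignment gets really solved
REASONABLE = SOLVED_TAX  # the "[reasonable value]" threshold, at the solved-tax level

def sample_tax():
    # Binary-ish alignment-tax distribution.
    return SOLVED_TAX if random.random() < P_SOLVED else UNSOLVED_TAX

def sample_budget():
    # What the leading AI project is able/willing to pay, in the same units.
    return random.uniform(0.0, 3.0)

trials = [(sample_tax(), sample_budget()) for _ in range(100_000)]

p_goal = sum(tax <= budget for tax, budget in trials) / len(trials)
p_conjunction = sum(tax <= REASONABLE and budget >= REASONABLE
                    for tax, budget in trials) / len(trials)

# With a binary tax distribution these agree (up to sampling noise);
# with a smeared-out tax distribution they would come apart.
print(f"P(tax <= budget)              = {p_goal:.3f}")
print(f"P(tax small AND budget large) = {p_conjunction:.3f}")
```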
Breaking big goals down into smaller goals—in particular, into smaller necessary conditions—is valuable, analytically and pragmatically. Binaries help, when they exist. Sometimes weaker conditions on the probability distribution, those of the form “a certain important subset of possibilities has very low probability,” can be useful in the same way.