In 1948, von Neumann became a consultant for the RAND Corporation. RAND (Research ANd Development) was founded by defense contractors and the Air Force as a “think tank” to “think about the unthinkable.” Its main focus was exploring the possibility of nuclear war and the strategies that might be pursued in one.
Von Neumann was, at the time, a strong supporter of “preventive war.” Confident even during World War II that the Russian spy network had obtained many of the details of the atom bomb design, von Neumann knew that it was only a matter of time before the Soviet Union became a nuclear power. He predicted that, were Russia allowed to build a nuclear arsenal, a war against the U.S. would be inevitable. He therefore recommended that the U.S. launch a nuclear strike on Moscow, destroying its enemy and becoming a dominant world power, so as to avoid a more destructive nuclear war later on. “With the Russians it is not a question of whether but of when,” he would say. An oft-quoted remark of his is, “If you say why not bomb them tomorrow, I say why not today? If you say today at 5 o’clock, I say why not one o’clock?”
Just a few years after “preventive war” was first advocated, it became an impossibility. By 1953, the Soviets had 300-400 warheads, meaning that any nuclear strike would be met with effective retaliation.
I guess the U.S. didn’t launch a first strike because it would have been politically unacceptable to kill millions of people in a situation that couldn’t be viewed as self-defense. Tangentially, this seems relevant to a long-running disagreement between us, about how bad it is if AI can’t help us solve moral/philosophical problems, but only acquire resources and keep us in control. What counts as a decisive strategic advantage depends on one’s values and philosophical outlook in general, and this is an instance of moral/philosophical confusion potentially being very costly, if the right thing to do from the perspective of the “real values” (e.g., CEV) of the Americans was to do (or threaten) a first strike, in order to either take over more of the universe for themselves or to prevent greater existential risk in the long run.
I agree that “failure to sort out philosophy/values early” has some costs, and this is a reasonable example. The question is: what fraction of the value of the future is sacrificed each subjective year?
Off the top of my head my guess is something like 1-2% per doubling. It sounds like your number is much larger.
(It’s a little bit hard to say exactly what counts. I’m talking about something like “value destroyed due to deficiencies in state of the art understanding” rather than “value destroyed due to all philosophical errors by everyone,” and so am not counting e.g. the costs from a “better dead than red” mentality.)
I do agree that measured in wall-clock time this is going to become a problem fast. So if AI accelerates problems more than philosophy/values, then we pay an additional cost that depends on the difference between (cumulative additional philosophy/values challenges introduced by AI) and (cumulative additional philosophy/values progress due to AI). I’d eyeball that number at ~1 doubling by default, so I see this cost as a further 1-2% of the value of the future.
All of this stands against a 10-20% loss due to AI risk proper, and a 0.1-1% risk of extinction from non-AI technologies in the marginal pre-AGI years. So that’s where I’m coming from in not being super concerned about this problem.
(These numbers are very made up; my intention is to give a very rough sense of my intuitive model and quantitative intuitions. I could easily imagine that merely thinking about the overall structure of the problem would change my view, not to mention actually getting into details or empirical data.)
How big of a loss do you think the US sustained by not following von Neumann’s suggestion to pursue a decisive strategic advantage? (Or if von Neumann’s advice was actually wrong according to the Americans’ “real values”, how bad would it have been to follow it?)
What do you think is the state of the art understanding in how one should divide resources between saving/investing, personal/family consumption, and altruistic causes? How big of a loss from what’s “actually right” do you think that represents? (Would it be wrong for someone with substantial moral uncertainty to estimate that loss to be >10%?)
(It’s a little bit hard to say exactly what counts. I’m talking about something like “value destroyed due to deficiencies in state of the art understanding” rather than “value destroyed due to all philosophical errors by everyone,” and so am not counting e.g. the costs from a “better dead than red” mentality.)
But one of my concerns is that AI will exacerbate the problem that the vast majority of people do not have a state of the art understanding of philosophy, for example by causing a lot of damage based on their incorrect understandings, or freezing or extrapolating from a base of incorrect understandings, or otherwise preempting a future where most people eventually fix their philosophical errors. At the same time AI is an opportunity to help solve this problem, if AI designers are foresighted enough (and can coordinate with each other to avoid races, etc.). So I don’t understand why you deliberately exclude this.
Is it because you think AI is not a good opportunity to solve this problem? Can you give your sense of how big this problem is anyway?
So if AI accelerates problems more than philosophy/values, then we pay an additional cost that depends on the difference between (cumulative additional philosophy/values challenges introduced by AI) and (cumulative additional philosophy/values progress due to AI). I’d eyeball that number at ~1 doubling by default, so I see this cost as a further 1-2% of the value of the future.
I’m not sure that’s the relevant number to look at. Suppose AI doesn’t accelerate problems more than philosophy/values; we’d still want AIs that can accelerate philosophy/values even more, to reduce the “normal” losses associated with doublings, which would be substantial even at 1% per doubling if added up over tens of doublings.
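(A quick back-of-the-envelope illustration of the compounding claim above, as a minimal sketch: the per-doubling loss rates and horizons below are just illustrative stand-ins for “1% per doubling” and “tens of doublings”, not anyone’s considered estimates.)

```python
# Cumulative loss if each doubling independently destroys a fixed fraction of
# the remaining value (rates and horizons are illustrative, not estimates).
for rate in (0.01, 0.02):
    for doublings in (10, 30, 50):
        remaining = (1 - rate) ** doublings
        print(f"{rate:.0%} per doubling over {doublings} doublings "
              f"-> ~{1 - remaining:.0%} of the value lost")
```

At 1% per doubling, roughly a quarter of the value is gone after 30 doublings and nearly 40% after 50, which is the sense in which the “normal” losses add up.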
I agree that people failing to act on the basis of state-of-the-art understanding is a potentially large problem, and that it would be good to use AI as an opportunity to address that problem. I didn’t include it just because it seems like a separate thing (in this context I don’t see why philosophy should be distinguished from people acting on bad empirical views). I don’t have a strong view on this.
I agree that AI could fix the philosophy gap. But in terms of urgency of the problem, if AI accelerates everything equally, it still seems right to look at the cost per subjective unit of time.
10% doesn’t sound like a plausible estimate for the value destroyed from philosophy giving worse answers on saving/consumption/altruism. There are lots of inputs other than philosophical understanding into that question (empirical facts, internal bargaining, external bargaining, etc.), and that problem is itself one of a very large number of determinants of how well the future goes. If you are losing 10% EV on stuff like this, it seems like you are going to lose all the value with pretty high probability, so the less brittle parts of your mixture should dominate.
I didn’t include it just because it seems like a separate thing
I see. I view them as related because both can potentially have the same solution, namely a solution to meta-philosophy that lets AIs make philosophical progress and convinces people to trust the philosophy done by AIs. I suppose you could try to solve the people-acting-on-bad-philosophical-views problem separately, by convincing them to adopt better views, but it seems hard to change people’s minds this way on a large scale.
There are lots of inputs other than philosophical understanding into that question (empirical facts, internal bargaining, external bargaining, etc.)
I can easily imagine making >10% changes in my allocation based on changes in my philosophical understanding alone, so I don’t see why it matters that there are also other inputs.
that problem is itself one of a very large number of determinants of how well the future goes
On one plausible view, a crucial determinant of how well the future goes is how much of the universe is controlled by AIs/people who end up turning their piece of the universe over to the highest-value uses, which in turn is largely determined by how much they save/invest compared to AIs/people who end up with wrong values. That seems enough to show that 10% loss due to this input is plausible, regardless of other determinants.
so the less brittle parts of your mixture should dominate.
What does this mean and how is it relevant?
Would it be wrong for someone with substantial moral uncertainty to estimate that loss to be >10%?
I can easily imagine making >10% changes in my allocation
Is your argument that 10% is the expected loss, or that it’s plausible that you’d lose 10%?
I understood Paul to be arguing against 10% being the expected loss, in which case potentially making >10% changes in allocation doesn’t seem like a strong counterargument.
Is your argument that 10% is the expected loss, or that it’s plausible that you’d lose 10%?
I think >10% expected loss can probably be argued for, but giving a strong argument would involve going into the details of my state of moral/philosophical/empirical uncertainties and my resource allocations, and then considering various ways my uncertainties could be resolved (various possible combinations of philosophical and empirical outcomes), my expected loss in each scenario, and then averaging the losses. This is a lot of work, I’m a bit reluctant for privacy/signaling reasons, plus I don’t know if Paul would consider my understandings in this area to be state of the art (he didn’t answer my question as to what he thinks the state of the art is). So for now I’m pointing out that in at least some plausible scenarios the loss is at least 10%, and mostly just trying to understand why Paul thinks 10% expected loss is way too high rather than make a strong argument of my own.
I understood Paul to be arguing against 10% being the expected loss, in which case potentially making >10% changes in allocation doesn’t seem like a strong counterargument.
Does it help if I restated that as, I think that with high probability if I learned what my “true values” actually are, I’d make at least a 10% change in my resource allocations?
Does it help if I restated that as, I think that with high probability if I learned what my “true values” actually are, I’d make at least a 10% change in my resource allocations?
Yes, that’s clear.
So for now I’m pointing out that in at least some plausible scenarios the loss is at least 10%, and mostly just trying to understand why Paul thinks 10% expected loss is way too high rather than make a strong argument of my own.
I had an argument in mind that I thought Paul might be assuming, but on reflection I’m not sure it makes any sense (and so I update away from it being what Paul had in mind). But I’ll share it anyway in a child comment.
Potentially confused argument:
Suppose 1) you’re just choosing between spending and saving, 2) by default you’re going to allocate 50-50 to each, and 3) you know that there are X considerations, such that after you consider each one, you’ll adjust the ratio by 2:1 in one direction or the other.
If X is 1, then you expect to adjust the ratio by a factor of 2. If X is 10, the expected net adjustment is on the order of sqrt(10) doublings, i.e. a factor of roughly 2^sqrt(10).
So, the more considerations there are that might affect the ratio, the more likely it is that you’ll end up with allocations close to 0% or 100%. And so, depending on how the realized value is related to the allocation ratio, skipping one of the considerations might not change the EV that much.
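(A quick Monte Carlo sketch of that toy model, as my own illustration rather than anything from the original comments: each of X considerations multiplies a 1:1 spend/save ratio by 2 or by 1/2 at random, and we look at how far the ratio typically drifts and how often the final allocation ends up near 0% or 100%.)

```python
import math
import random

# Toy model: start at a 1:1 spend/save ratio; each of X considerations multiplies
# the ratio by 2 or by 1/2 with equal probability (an illustrative assumption).
def simulate(x, trials=100_000):
    extreme = 0      # runs whose final allocation is below 10% or above 90%
    sq_shift = 0.0   # accumulates (net doublings)^2, for the RMS drift
    for _ in range(trials):
        log2_ratio = sum(random.choice((1.0, -1.0)) for _ in range(x))
        spend_share = 1.0 / (1.0 + 2.0 ** (-log2_ratio))  # ratio -> allocation
        if spend_share < 0.1 or spend_share > 0.9:
            extreme += 1
        sq_shift += log2_ratio ** 2
    return math.sqrt(sq_shift / trials), extreme / trials

for x in (1, 4, 10, 25):
    rms_doublings, frac_extreme = simulate(x)
    # Typical drift is ~sqrt(X) net doublings (a factor of about 2**sqrt(X)),
    # and more runs end up near 0% or 100% as X grows.
    print(f"X={x:>2}: RMS net doublings = {rms_doublings:.2f} "
          f"(sqrt(X) = {math.sqrt(x):.2f}); "
          f"share ending outside 10%-90% = {frac_extreme:.3f}")
```

Under these assumptions the RMS drift tracks sqrt(X) and the share of runs ending near 0% or 100% grows with X, which is the distributional pattern the argument above appeals to.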
Tangentially, this seems relevant to a long-running disagreement between us, about how bad it is if AI can’t help us solve moral/philosophical problems, but only acquire resources and keep us in control.
This impression isn’t so much from the rationality community or academia, and I haven’t looked into the transhumanist/singularitarian literature as much, but my impression is that everyone presumes a successfully human-aligned superintelligence would be able to find solutions which peacefully satisfy as many parties as possible. One stereotypical example given is of how superintelligence may be able to achieve virtual post-scarcity not just for humanity now but for the whole galactic future. So the expectation is that a superintelligent AI (SAI) would be a principal actor in determining humanity’s future. My impression from the part of the AI alignment community coming from the rationalist side is that an SAI will inevitably be able to control the light cone no matter its goals, so the best we can do is align it with human interests. So while an SAI might acquire resources, it’s not clear an aligned SAI would keep humans in control, for different values of ‘aligned’ and ‘in control’.
So while an SAI might acquire resources, it’s not clear an aligned SAI would keep humans in control, for different values of ‘aligned’ and ‘in control’.
I was referring to Paul’s own approach to AI alignment, which does aim to keep humans in control. See this post where he mentions this, and perhaps this recent overview of Paul’s approach if you’re not familiar with it.
Thanks for clarifying. I didn’t know you were specifically referring to Paul’s approach. I’ve got to familiarize myself with it more.
Von Neumann apparently thought so: