Firstly, a bias towards choices which leave less up to chance.
Wouldn’t this imply a bias towards eliminating other agents? (Since that would make the world more predictable, and thereby leave less up to chance?)
And thirdly, a bias towards choices which afford more choices later on.
Wouldn’t this strongly imply biases towards both self-preservation and resource acquisition?
If the above two implications hold, then the conclusion
that the biases induced by instrumental rationality at best weakly support [...] that machine superintelligence is likely to lead to existential catastrophe
seems incorrect, no?
Could you briefly explain what is wrong with the reasoning above, or point me to the parts of the post that do so? (I only read the Abstract.)
Wouldn’t this imply a bias towards eliminating other agents? (Since that would make the world more predictable, and thereby leave less up to chance?)
A few things to note. Firstly, when I say that there’s a ‘bias’ towards a certain kind of choice, I just mean that the probability that a superintelligent agent with randomly sampled desires (Sia) would make that choice is greater than 1/N, where N is the number of choices available. So, just to emphasize the scale of the effect: even if you were right about that inference, you should still assign very low probability to Sia taking steps to eliminate other agents.
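To make that ‘greater than 1/N’ reading concrete, here is a minimal sketch. The toy two-option decision tree, the i.i.d. uniform utilities, and all the names in it are my own illustration, not the post’s construction: with two options, an unbiased choice would get probability 1/2, while an option that keeps two terminal outcomes reachable gets probability 2/3, and that sort of gap is all that ‘bias’ means here.

```python
import random

def sias_choice(rng=random):
    """One draw of Sia's randomly sampled desires in a toy two-option decision.

    Option A leads straight to terminal outcome a.
    Option B leads to a later choice between terminal outcomes b1 and b2.
    Utilities for the three terminal outcomes are sampled i.i.d. uniform.
    """
    u_a, u_b1, u_b2 = (rng.random() for _ in range(3))
    # Sia picks B whenever the best outcome reachable via B beats A's outcome.
    return "B" if max(u_b1, u_b2) > u_a else "A"

trials = 100_000
share_b = sum(sias_choice() == "B" for _ in range(trials)) / trials
# With N = 2 options, 'no bias' would mean P(B) = 1/2; here P(B) is about 2/3.
print(f"P(Sia chooses B) ~ {share_b:.3f}")
```

So a ‘bias’ in my sense is a matter of numbers like 2/3 versus 1/2, not anything approaching certainty.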
Secondly, when I say that a choice “leaves less up to chance”, I just mean that the sum total of history is more predictable, given that choice, than it is given other choices. (I mention this just because you didn’t read the post, and I want to make sure we’re not talking past each other.)
Thirdly, I would caution against the inference: without humans, things are more predictable; therefore, undertaking to eliminate other agents leaves less up to chance. Even if things are predictable after humans are eliminated, and even if Sia can cook up a foolproof contingency plan for eliminating all humans, that doesn’t mean that that contingency plan leaves less up to chance. Insofar as the contingency plan is sensitive to the human response at various stages, and insofar as that human response is unpredictable (or less predictable than humans are when you don’t try to kill them all), this bias wouldn’t lend any additional probability to Sia choosing that contingency plan.
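Here is one way to picture that point, with entropy standing in for ‘how much is left up to chance’ and with toy numbers I have made up purely to exhibit the structure of the argument (none of this is the post’s formalism): a plan can have a perfectly predictable endpoint while the history that gets you there is highly unpredictable.

```python
from math import log2

def history_entropy(history_probs):
    """Shannon entropy (in bits) of a distribution over complete histories.

    Lower entropy is used here as a stand-in for 'this plan leaves less
    up to chance'.
    """
    return -sum(p * log2(p) for p in history_probs if p > 0)

# Toy numbers only. LEAVE: humans stick around and add a little noise,
# but the history doesn't branch dramatically.
leave = [0.4, 0.3, 0.3]

# ELIMINATE: the end state is perfectly predictable, but the path there
# branches on hard-to-predict human responses at several stages.
eliminate = [0.15, 0.15, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.05, 0.05]

print(f"H(LEAVE)     = {history_entropy(leave):.2f} bits")      # ~1.57
print(f"H(ELIMINATE) = {history_entropy(eliminate):.2f} bits")  # ~3.25
# Despite its predictable endpoint, ELIMINATE leaves more up to chance
# along the way, so the bias gives it no extra probability.
```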
Fourthly, this bias interacts with the others. Futures without humanity might be futures which involve fewer choices—other deliberative agents tend to force more decisions. So contingency plans which involve human extinction may involve comparatively fewer choicepoints than contingency plans which keep humans around. Insofar as Sia is biased towards contingency plans with more choicepoints, that’s a reason to think she’s biased against eliminating other agents. I don’t have any sense of how these biases interact, or which one is going to be larger in real-world decisions.
Wouldn’t this strongly imply biases towards both self-preservation and resource acquisition?
In some decisions, it may. But I think here, too, we need to tread with caution. In many decisions, this bias makes it somewhat more likely that Sia will pursue self-destruction. To quote myself:
Sia is biased towards choices which allow for more choices—but this isn’t the same thing as being biased towards choices which guarantee more choices. Consider a resolute Sia who is equally likely to choose any contingency plan, and consider the following sequential decision. At stage 1, Sia can either take a ‘safe’ option which will certainly keep her alive or she can play Russian roulette, which has a 1-in-6 probability of killing her. If she takes the ‘safe’ option, the game ends. If she plays Russian roulette and survives, then she’ll once again be given a choice to either take a ‘safe’ option of definitely staying alive or else play Russian roulette. And so on. Whenever she survives a game of Russian roulette, she’s again given the same choice. All else equal, if her desires are sampled normally, a resolute Sia will be much more likely to play Russian roulette at stage 1 than she will be to take the ‘safe’ option.
See the post to understand what I mean by “resolute”—and note that the qualitative effect doesn’t depend upon whether Sia is a resolute chooser.
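For what it’s worth, the mechanics of that example are easy to reproduce in miniature. The sketch below truncates the game after a fixed number of stages (my simplification, so that the set of plans is finite and can be enumerated) and treats the resolute chooser exactly as the quote stipulates, as equally likely to pick any contingency plan; the post derives this from sampled desires, which I haven’t reproduced here. The one plan that stops immediately is swamped by the many plans that play roulette at stage 1.

```python
def contingency_plans(stages):
    """All contingency plans for a Russian-roulette game truncated at `stages`
    choice points (truncation is my simplification so the set is finite).

    At each choice point Sia either takes the 'safe' option (the game ends)
    or plays; she only reaches the next choice point if she survives.
    A plan specifies what she does at every choice point she could reach.
    """
    if stages == 0:
        return [[]]
    plans = [["safe"]]                       # stop now
    for rest in contingency_plans(stages - 1):
        plans.append(["play"] + rest)        # play now, then follow `rest`
    return plans

# A resolute chooser who is equally likely to pick any contingency plan:
for k in (1, 5, 20):
    plans = contingency_plans(k)
    p_play_first = sum(plan[0] == "play" for plan in plans) / len(plans)
    print(f"{k:>2} stages: {len(plans)} plans, "
          f"P(play roulette at stage 1) = {p_play_first:.2f}")
```

The count grows with the length of the game (stages + 1 plans, only one of which stops at once), which is why the probability of playing at stage 1 climbs toward 1 as the game gets longer.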
Thanks for the response. To the extent that I understand your models here, I suspect they don’t meaningfully bind/correspond to reality. (Of course, I don’t understand your models at all well, and I don’t have the energy to process the whole post, so this doesn’t really provide you with much evidence; sorry.)
I wonder how one could test whether or not the models bind to reality? E.g. maybe there are case examples (of agents/people behaving in instrumentally rational ways) one could look at, and see if the models postdict the actual outcomes in those examples?
There’s nothing unusual about my assumptions regarding instrumental rationality. It’s just standard expected utility theory.
The place I see to object is with my way of spreading probabilities over Sia’s desires. But if you object to that, I want to hear more about which probability distribution I should be using to understand the claim that Sia’s desires are likely to rationalise power-seeking, resource acquisition, and so on. I reached for the most natural way of distributing probabilities I could come up with—I was trying to be charitable to the thesis, & interpreting it in light of the orthogonality thesis. But if that’s not the right way to distribute probability over potential desires, if it’s not the right way of understanding the thesis, then I’d like to hear something about what the right way of understanding it is.
Other agents are not random though. Many agents act in predictable ways. I certainly don’t model the actions of people as random noise. In this sense I don’t think other agents are different from any other physical system that might be more-or-less chaotic, unpredictable or difficult to control.
I agree.
But AFAICT that doesn’t really change the conclusion that fewer agents would tend to make the world more predictable/controllable. As you say yourself:
I don’t think other agents are different from any other physical system that might be more-or-less chaotic, unpredictable or difficult to control.
And that was the weaker of the two apparent problems. What about the {implied self-preservation and resource acquisition} part?