I don’t see your examples contradicting my claim. Killing all humans may not increase future choices, so it isn’t an instrumental convergent goal in itself. But in any real-world scenario, self-preservation certainly is, and power-seeking—in the sense of expanding one’s ability to make decisions by taking control of as many decision-relevant resources as possible—is also a logical necessity. The Russian roulette example is misleading in my view because the “safe” option is de facto suicide—if “the game ends” and the AI can’t make any decisions anymore, it is already dead for all practical purposes. If that were the stakes, I’d vote for the gun as well.
Even granting that inference, once we consider how many choices there are, it still isn’t much evidence. Given that there are usually lots of options available, the inference does little to support the thesis that AI is an existential risk unless one already comes in with prior commitments to that conclusion.
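To make the “lots of choices” point concrete, here is a toy back-of-the-envelope sketch. The numbers, the uniform-choice baseline, and the bias factor are all my own illustrative assumptions, not anything from the paper: even if power-seeking considerations make catastrophic plans many times more likely than an average plan, the overall probability of a catastrophic choice stays small when the space of alternatives is large.

```python
def p_catastrophic_choice(n_plans: int, n_catastrophic: int, bias: float) -> float:
    """Probability that the chosen plan is catastrophic, assuming the agent
    picks uniformly over plans except that catastrophic plans get `bias`
    times the weight of an ordinary plan (purely illustrative model)."""
    weight_cat = n_catastrophic * bias
    weight_rest = n_plans - n_catastrophic
    return weight_cat / (weight_cat + weight_rest)

# With a million candidate plans and only 10 catastrophic ones, even a
# 100x bias toward the catastrophic plans leaves roughly a 0.1% chance
# of picking one of them.
for bias in (1, 10, 100):
    print(bias, p_catastrophic_choice(n_plans=1_000_000, n_catastrophic=10, bias=bias))
```

Of course, the real disagreement is over how large that bias factor is and how the option space should be carved up; the sketch only illustrates why the number of available choices matters to the inference.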
Also, this part of your comment, together with my (hopefully final) quotes below, explains why you can’t get from self-preservation and power-seeking, even if they happen, to existential risk without further assumptions.
Killing all humans may not increase future choices, so it isn’t an instrumental convergent goal in itself.
That’s exactly the problem: we have equally plausible, if not more plausible, reasons to believe there is no instrumental convergence toward existential risk, precisely for reasons related to future choices.
The quotes below also explain why instrumental convergence and self-preservation don’t imply AI risk without further assumptions.
Should a bias against leaving things up to chance lead us to think that existential catastrophe is the more likely outcome of creating a superintelligent agent like Sia? This is far from clear. We might think that a world without humans leaves less to chance, so that we should think Sia is more likely to take steps to eliminate humans. But we should be cautious about this inference. It’s unclear that a future without humanity would be more predictable. And even if the future course of history is more predictable after humans are eliminated, that doesn’t mean that the act of eliminating humans leaves less to chance, in the relevant sense. It might be that the contingency plan which results in human extinction depends sensitively upon humanity’s response; the unpredictability of this response could easily mean that that contingency plan leaves more to chance than the alternatives. At the least, if this bias means that human extinction is a somewhat more likely consequence of creating superintelligent machines, more needs to be said about why.
Should this lead us to think that existential catastrophe is the most likely outcome of a superintelligent agent like Sia? Again, it is far from clear. Insofar as Sia is likely to preserve her desires, she may be unlikely to allow us to shut her down in order to change those desires.[14] We might think that this makes it more likely that she will take steps to eliminate humanity, since humans constitute a persistent threat to the preservation of her desires. (Again, we should be careful to distinguish Sia being more likely to exterminate humanity from her being likely to exterminate humanity.) Again, I think this is far from clear. Even if humans constitute a threat to the satisfaction of Sia’s desires in some ways, they may be conducive towards her desires in others, depending upon what those desires are. In order to think about what Sia is likely to do with randomly selected desires, we need to think more carefully about the particulars of the decision she’s facing. It’s not clear that the bias towards desire preservation is going to overpower every other source of bias in the more complex real-world decision Sia would actually face. In any case, as with the other ‘convergent’ instrumental means, more needs to be said about the extent to which they indicate that Sia is an existential threat to humanity.