Is there a specific claim about instrumental convergence that you think is false or want to see arguments against?
I think there’s a relatively strong case that instrumental convergence isn’t a necessary property for a system to be dangerous. For example, viruses don’t exhibit instrumental convergence, or any real agency at all, yet they still manage to be plenty deadly to humans.
Instrumental convergence (and other scary properties, like utility maximization, deception, and reflectivity) seems less like a crux or key assumption in the case for AI x-risk, and more like a possible implication of extrapolating a relatively simple world model about what the most capable and general systems will probably look like.
These points seem like arguments in support of the case for AI as an x-risk, rather than against it, so perhaps they're not what you're looking for.
Re: specific claims to falsify, I generally buy the argument.
If I had to pick out specific aspects that seem weaker, I think they would mostly relate to our confusion around agent foundations. It isn't obvious to me that the way we describe "intelligence" or "goals" within the instrumental convergence argument is a good match for the way current systems operate (though it seems close enough, and we shouldn't expect to be wrong in a way that makes the situation better).
I would agree that instrumental convergence is probably not a necessary component of AI x-risk, so you’re correct that “crux” is a bit of a misnomer.
However, in my experience it is one of the primary arguments people rely on when explaining their concerns to others. The correlation between credence in instrumental convergence and AI x-risk concern seems very high. IMO it is also one of the most concerning legs of the overall argument.
If somebody made a compelling case that we should not expect instrumental convergence by default in the current ML paradigm, I think the overall argument for x-risk would have to look fairly different from the one that is usually put forward.