The main issue I have with this premise, which doesn't seem to be addressed very well, is: why would literally every competent AI have the exact same goals, goals that align only with other AIs and not with humans? If an AI's goals conflicted with human goals, wouldn't they likely conflict with the goals of other AIs too? I find it hard to imagine AIs being so uniform without a seriously large amount of centralisation.
The idea is that many different goals all share the useful subgoal of "acquire resources to build stuff", and humans/the Earth are made of atoms, which are resources:
https://arbital.com/p/instrumental_convergence/
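A minimal toy sketch of that idea (my own illustration, not from the linked page, with a made-up linear "achievement scales with resources" model): two agents with unrelated terminal goals both end up preferring the action that grabs more resources.

```python
# Toy illustration of instrumental convergence: agents with different
# terminal goals still share the subgoal "acquire more resources",
# under the (assumed) model that goal achievement scales with resources.

def expected_goal_achievement(resources: float, goal_efficiency: float) -> float:
    """Hypothetical model: achievement = per-goal efficiency * resources held."""
    return goal_efficiency * resources

# Arbitrary efficiency constants for two agents with unrelated terminal goals.
agents = {
    "paperclip-maximizer": 0.9,
    "stamp-collector": 0.4,
}

for name, eff in agents.items():
    status_quo = expected_goal_achievement(resources=10, goal_efficiency=eff)
    after_grab = expected_goal_achievement(resources=100, goal_efficiency=eff)
    # Despite having nothing in common at the terminal-goal level,
    # both agents rank "grab resources" above the status quo.
    print(f"{name}: prefers grabbing resources -> {after_grab > status_quo}")
```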
I don’t think that response makes sense. The classic instrumental convergence arguments are about a single agent; OP is asking why distinct AIs would coordinate with one another.
I think the AIs may well have goals that conflict with one another, just as humans’ goals do, but it’s plausible that they would form a coalition and work against humans’ interests because they expect a shared benefit, as humans sometimes do.
I agree with this, but also note that this topic is outside the scope of the post, which is just about what would happen if AIs were aimed at defeating humanity, for whatever reason. It's a separate question whether we should expect misaligned AIs to share enough goals, or have enough to gain from coordinating, to "team up." I'll say that if my main argument against catastrophe risk hinged on this (e.g., "We're creating a bunch of AIs that would be able to defeat humanity if they coordinated, and would each individually like to defeat humanity, but won't coordinate because they have different goals from each other"), I'd feel extremely nervous.
Not only that, but if your goal is to create a powerful army of AIs, the last thing you'd want to do is make them all identical. Any reason you might give for there being a huge number of AI instances in the first place (as this argument assumes) would also favor making those AIs diverse rather than identical, and that very diversity argues against "emergent convergence." You then have to fall back on the "independently emerging common sub-goals" argument, which is a significantly bigger stretch because of the many additional assumptions it makes.