Furthermore, we should take seriously the possibility that superintelligent AGIs might be even less focused than humans are on achieving large-scale goals. We can imagine them possessing final goals which don’t incentivise the pursuit of power, such as deontological goals, or small-scale goals.
...
My underlying argument is that agency is not just an emergent property of highly intelligent systems, but rather a set of capabilities which need to be developed during training, and which won't arise without selection for them.
Was this line of argument inspired by Ben Garfinkel's objection to the 'classic' formulation of instrumental convergence/orthogonality—that these are 'measure based' arguments that just identify that a majority of possible agents with some agentive properties and large-scale goals will optimize in malign ways, rather than establishing that we're actually likely to build such agents?

It seems like you're identifying the same additional step that Ben identified, and that I argued could be satisfied—that we need a plausible reason why we would build an agentive AI with large-scale goals.
And the same applies to 'instrumental convergence', the observation that most possible goals, especially simple goals, imply a tendency to produce extreme outcomes when ruthlessly maximised:
A system that is optimizing a function of n variables, where the objective depends on a subset of size k<n, will often set the remaining unconstrained variables to extreme values; if one of those unconstrained variables is actually something we care about, the solution found may be highly undesirable.
We could see this as marking out a potential danger—a large number of possible mind-designs produce very bad outcomes if implemented. The fact that such designs exist 'weakly suggest[s]' (Ben's words) that AGI poses an existential risk, since we might build them. If we add in other premises that imply we are likely to (accidentally or deliberately) build such systems, the argument becomes stronger. But usually the classic arguments simply note instrumental convergence and assume we're 'shooting into the dark' in the space of all possible minds, because they take the abstract statement about possible minds to be speaking directly about the physical world. There are specific reasons to think this might occur (e.g. mesa-optimisation, or sufficiently fast progress preventing us from course-correcting if there is even a small initial divergence), but those are the reasons that combine with instrumental convergence to produce a concrete risk, and they have to be argued for separately.
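To make the quoted observation concrete, here is a minimal sketch, assuming SciPy's linprog is available; the toy variables x1 and x2 and the bounds are made up purely for illustration. The objective depends only on x1, the two variables share a resource constraint, and x2 stands in for something we care about that the objective ignores:

```python
# Minimal illustration: maximise an objective that depends only on x1,
# subject to a shared resource constraint x1 + x2 <= 10, with x2 allowed
# to range over [-100, 100]. x2 represents a variable we care about but
# which the objective ignores.
from scipy.optimize import linprog

c = [-1, 0]                        # linprog minimises, so -x1 means "maximise x1"; x2 gets zero weight
A_ub = [[1, 1]]                    # resource constraint: x1 + x2 <= 10
b_ub = [10]
bounds = [(0, None), (-100, 100)]  # x1 >= 0; x2 only limited by wide bounds

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
print(res.x)  # approximately [110., -100.]: x2 is pushed to its extreme lower bound
```

Nothing in the objective penalises extreme values of x2, so any gain in x1 is worth sacrificing x2 for; that is the shape of the failure the quote is pointing at.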
I do like the link you’ve drawn between this argument and Ben’s one. I don’t think I was inspired by it, though; rather, I was mostly looking for a better definition of agency, so that we could describe what it might look like to have highly agentic agents without large-scale goals.