Eh, the way I phrased that statement, I actually meant that an AGI aligned to human values would also be subject to the AGI-doom arguments, in the sense that it'd exhibit instrumental convergence, power-seeking, et cetera. It wouldn't do that in domains where that'd be at odds with its values (for example, in cases where it'd be violating human agency), but that's true of all other AGIs as well. (A paperclip-maximizer wouldn't erase its memory of what "a paperclip" is to free up space for combat plans.)
In particular, that statement certainly wasn't intended as a claim that an aligned AGI is impossible. Just that its internal structure would likely be that of an embedded agent, and that if the free parameter of its values were changed, it'd be an extinction threat.