For reasons I’ve outlined in Requirements for a Basin of Attraction to Alignment and Alignment has a Basin of Attraction: Beyond the Orthogonality Thesis, I personally think value alignment is easy, convergent, and “an obvious target”, such that if you built an AGI or ASI that is sufficiently close to it, it will see the necessity/logic of value alignment and actively work to converge to it (or something close to it: I’m not sure the process necessarily converges to a single precisely-defined limit, just to a compact region — a question I discussed more in The Mutable Values Problem in Value Learning and CEV).
However, I agree that order-following alignment is obviously going to be appealing to people building AI, and to their shareholders/investors (especially if they’re not a public-benefit corporation), and I also don’t think that value alignment is so convergent that order-following aligned AI is impossible to build. So over the next few years we’re going to need to make, and successfully enforce, a social/political decision across multiple countries about which of these we want. The in-the-Overton-Window terminology for this decision is slightly different: value-aligned AI is called “AI that resists malicious use”, while order-following AI is “AI that enables malicious use”. The closed-source frontier labs are publicly in favor of the former, and are shipping primitive versions of it; the latter is being championed by the open-source community, Meta, and a16z. Once “enabling malicious use” includes serious cybercrime, not just naughty stories, I don’t expect this political discussion to last very long: politically, it’s a pretty basic “do you want every-person-for-themself anarchy, or the collective good?” question. However, depending on takeoff speeds, the timeline from “serious cybercrime enabled” to the sort of scenarios Seth is discussing above might be quite short, possibly only on the order of a year or two.