because it is more task-specific and therefore technically simpler to achieve than general intelligence, doesn’t require escaping its own creators’ controls
An argument for the danger of human-directed misuse doesn’t work as an argument against the dangers of AI-directed agentic activity. Both are real, though misuse only becomes an extinction-level problem once AIs are very powerful, at which point AI-directed activity that is not human misuse also becomes relevant. With extinction-level problems, it doesn’t matter for the allocation of attention which one is worse (after a critical failure there are no retries with a different allocation to reflect lessons learned), only that each is significant, and so both need to be addressed.
If alignment is very easy, misuse becomes the important problem. If it’s hard, the absence of misuse doesn’t help. There is also the problem of cultural value drift, where AIs change their own culture very quickly on human timescales, without anyone (including the AIs) individually steering the outcome, so that at the end of this process (which might take merely months to years) the AIs in charge of civilization no longer care about human welfare, with neither misuse nor prosaic misalignment (in individual principal-agent relationships) being the cause of this outcome.
An argument for the danger of human-directed misuse doesn’t work as an argument against the dangers of AI-directed agentic activity.
I agree. But I was not trying to argue against the dangers of AI-directed agentic activity. The thesis is not that “alignment risk” is overblown, nor is the comparison of the risks the point; it’s that those risks accumulate such that the technology is guaranteed to be lethal for the average person. This is significant because the risk of misalignment is typically accepted on the expectation of rewards that will be broadly shared. “You or your children are likely to be killed by this technology, whether it works as designed or not” is a very different story from “there is a chance this will go badly for everyone, but if it doesn’t, it will be really great for everyone.”
That’s an excellent summary sentence. It seems like it would be a useful statement when advocating for AI slowdown/shutdown.