I definitely agree that under the more common usage of safety, an AI that takes over the world or breaks laws because a human ordered it to would not be classified as safe; but in an AI safety context, alignment/safety does usually mean that these outcomes would be classified as safe.
My own view is that the technical problem is shaping up to be relatively easy, but I think the political problems of advanced AI will probably prove a lot harder, especially in a future where humans control AIs for a long time.
I’m not necessarily going to argue with your characterization of how the “AI safety” field views the world. I’ve noticed myself that people say “maintaining human control” pretty much interchangeably with “alignment”, and use both of those pretty much interchangeably with “safety”. And all of the above have their own definition problems.
I think that’s one of several reasons that the “AI safety” field has approximately zero chance of avoiding any of the truly catastrophic possible outcomes.
I agree that the conflation of maintaining human control, alignment, and safety is a problem, and to be clear, I'm not saying that a human-controlled AI taking over because someone ordered it to is an objectively safe outcome.
I agree that, at present, the AI safety field is poorly equipped to avoid catastrophic outcomes that don't involve extinction from uncontrolled AIs.