I’m not necessarily going to argue with your characterization of how the “AI safety” field views the world. I’ve noticed myself that people say “maintaining human control” pretty much interchangeably with “alignment”, and use both of those pretty much interchangeably with “safety”. And all of the above have their own definition problems.
I think that’s one of several reasons that the “AI safety” field has approximately zero chance of avoiding any of the truly catastrophic possible outcomes.
I agree that the conflation of maintaining human control, alignment, and safety is a problem, and to be clear, I’m not saying that human-controlled AI taking over because someone ordered it to do so is an objectively safe outcome.
I agree that, at present, the AI safety field is poorly equipped to avoid catastrophic outcomes that don’t involve extinction from uncontrolled AIs.