I agree that’s a possible way things could be. However, I don’t see how it’s compatible with accepting the arguments that say we should assume alignment is a hard problem. Absent such arguments, why expect that anything special beyond normal training is needed to solve alignment?
As I see the argumentative landscape, the high x-risk estimates depend on arguments that claim to give reason to believe alignment is just a generally hard problem. I don’t see anything in those arguments that distinguishes between the two cases: humans facing the alignment problem, and AIs facing it.
In other words, our arguments for alignment difficulty don’t depend on any specific assumptions about the capabilities of the agent attempting it, so we should currently assign the same probability to an AI being unable to solve its alignment problem as we do to us being unable to solve ours.
AIs have advantages such as thinking faster and being as good at everything as any other AI of the same kind. These advantages are what simultaneously make them dangerous and put them in a better position to figure out alignment, or coordination that protects against misaligned AIs and against human misuse of AIs. (Incidentally, see this comment on various relevant senses of “alignment”.) Being better at solving problems, and having effectively more time to solve them, improves the probability of solving any given problem.
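A toy model makes that last point concrete (my own illustration, not something in the parent comment): suppose each unit of subjective thinking time gives an independent chance $p$ of solving the problem. Then an agent with speedup factor $k$ working for wall-clock time $T$ succeeds with probability

$$P(\text{solve}) = 1 - (1 - p)^{kT},$$

which increases in both the speedup $k$ and the per-step competence $p$. Under these (strong) independence assumptions, faster and more capable agents are strictly more likely to crack the same problem in the same calendar time.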