It’s unclear if alignment is hard in the grand scheme of things. It could snowball quickly, with alignment for increasingly capable systems getting solved shortly after each level of capability is attained.
But at near-human level, which seems plausible for early LLM AGIs, this might matter a great deal: it would require them to figure out coordination to control existential risk while alignment remains unsolved, staying at a relatively low capability level in the meantime. And solving alignment might be easier for AIs with simple goals, allowing them to recursively self-improve quickly. As a result, AIs aligned with humanity would remain vulnerable to FOOMing of misaligned AIs with simple goals, and would be forced by this circumstance to comprehensively prevent any possibility of their construction rather than mitigate the consequences.
I agree that’s a possible way things could be. However, I don’t see how it’s compatible with accepting the arguments that say we should assume alignment is a hard problem. I mean, absent such arguments, why expect you’d have to do anything special beyond normal training to solve alignment?
As I see the argumentative landscape, the high x-risk estimates depend on arguments that claim to give us reason to believe that alignment is just a generally hard problem. I don’t see anything in those arguments that distinguishes between these two cases.
In other words, our arguments for alignment difficulty don’t depend on any specific assumptions about the capability of the intelligence involved, so we should currently assign the same probability to an AI being unable to solve its alignment problem as we do to us being unable to solve ours.
AIs have advantages such as thinking faster and being as good at everything as any other AI of the same kind. These advantages are what simultaneously make them dangerous and put them in a better position to figure out alignment, or coordination that protects from misaligned AIs and human misuse of AIs. (Incidentally, see this comment on various relevant senses of “alignment”.) Being better at solving problems and having effectively more time to solve them improves the probability of solving any given problem.