Also, AIs that understand clean mind-architectures seem deeper in the tech tree than AIs that can do some crazy stuff; why didn’t the world end five years before reaching this hypothetical?
Possible world: Alignment is too hard for a small group of people to cleanly understand, but not too far beyond that. In part because the profitability/researcher-status gradient doesn’t push AI research towards alignment, building an AI which is cleanly designed and aligned is a natural solution for a mid-level messy AI to find, even though that mid-level messy AI is still too dumb to help mainstream researchers gain a ton of power at the tasks they try it on, gaining power being hard due to adversarial pressures.
(Having written that out, I believe it less: first because it relies on several independent details holding at once, and second because I don’t see why mainstream researchers would have failed to elicit that capability when alignment researchers did.
Still, there’s an intuition here that I didn’t fully express above, so I’m not totally backing off my hunch that there’s some gap in the argument of yours which I quoted.)