I have some additional thoughts after thinking more about your proposal.
What worries me is the jump from narrow AI to AGI. The proposal will work at the level of narrow AI, much as a similar approach worked for AlphaGo Zero. The proposal will also work if we have a perfectly aligned AGI, something like a human upload or a perfectly aligned Seed AI. It is quite possible that such a Seed AI could grow in its capabilities while preserving its alignment.
However, the question is how your model will survive the jump from narrow, non-agential AI capabilities to agential AGI capabilities. This jump could happen at some unpredictable moment during the evolution of your system, and it may involve the system modeling the outside world and all of humanity, as well as acquiring convergent basic drives such as self-preservation. That would be the classic treacherous turn, intelligence explosion, or "becoming self-aware" moment. At that moment, the previous methods of alignment would instantly become obsolete and would provide no guarantee that the system remains aligned at its new level of capabilities.