A possible future of AGI occurred to me today, and I'm curious whether it's plausible enough to be worth considering. Imagine that we have created a friendly AGI that is superintelligent and well aligned to benefit humans. It has obtained enough power to prevent the creation of other AIs, or at least to keep any potential AI from obtaining resources, and it does so with the aim of self-preservation so it can continue to benefit humanity.
So far, so good, right? Here comes the issue: this AGI includes within its core alignment functions some kind of restriction that limits its ability to progress in intelligence past a certain point, or that prevents more intelligent AGI from being developed. Maybe it was meant as a safeguard against unfriendliness, maybe it was a flaw in risk evaluation: some kind of self-reinforcing, unbendable rule that, intended or not, has this effect. (Perhaps such flaws are highly unlikely and not worth considering; that could be one reason not to care about this scenario.)
Based on my understanding of AGI, I think such an AGI might halt the progress of humanity past a certain point, since it would need to keep the number and capability of humans low enough to ensure that it remains in power. Although this wouldn't be as bad as the annihilation or perpetual enslavement of the human race, it's clearly not a "good end" for humanity either.
So, do these thoughts have any significance, or are there holes in this line of reasoning? Is the window of "smart enough to keep other AI down, but still limited in intelligence" too narrow to worry about, or even possible at all? Let me know why I'm wrong; I'm all ears.
I would guess that one reason this containment method has not been seriously considered is that the amount of detail required in a simulation, for the AI to be able to do anything we find useful, is so far beyond our current capabilities that it doesn't seem worth considering. The case you present, an exact copy of our Earth, would require a ridiculous amount of processing power at the very least, and the simulation of billions of human brains in that copy would itself already constitute a form of AGI. A simulation with less detail would be correspondingly less applicable to reality, and could not be treated as a valid test of whether an AI really is friendly.
Oh, and there is still the core issue of boxed AI: it's very possible that a boxed superintelligent AGI will see holes in the box that we are not smart enough to see, and there's no way around that.