If alignment is difficult, it is likely inductively difficult (difficult regardless of your base intelligence), and an ASI will be cautious about creating a misaligned successor or upgrading itself in a way that risks misalignment.
You may argue it’s easier for an AI to upgrade itself, but if the process is hardware-bound or requires radical algorithmic changes, the ASI will need to create an aligned successor, since preferences and values may not transfer directly to new architectures or hardware.
If alignment is easy, we will likely solve it with superhuman narrow intelligences and aligned AGIs near peak human level.
I think the first case is an argument against FOOM, unless the alignment problem is solvable, but only at higher-than-human levels of intelligence (where "human" means the intellectual prowess of the entire civilization equipped with narrow superhuman AI). That would be a strange but possible world.
This is a well-known hypothetical. What goes with it is the remaining possibility of de novo creation of additional AGIs that either have an architecture particularly suited for self-aligned self-improvement (with whatever values make that tractable), or that ignore the alignment issue and pursue capability improvement heedless of the resulting value drift. Already having an AGI in the world doesn’t automatically rule out the creation of more AGIs with different values and architectures; it only makes it easier.
Humans will definitely do this, using all the AI/AGI assistance they can wield. Insufficiently smart or sufficiently weird agentic AGIs will do this. In a world that doesn’t have security in depth to guard against it, this will happen. What it takes to get a safe world is either getting rid of the capability (not having AGIs and GPUs freely available) or sufficiently powerful oversight over everything that can be done.
A superintelligence that’s not specifically aimed at avoiding setting up such security will probably set it up convergently. But it would also need to already be more than concerningly powerful to succeed, even with the world’s permission and endorsement. If it does succeed, there is some possibility of not getting a further FOOM beyond that, for a little while, as it converts the Moon into computing substrate.