For example, making numerous copies of itself to work in parallel would again raise the dangers of independently varying goals.
The AI could design a system such that any copies made of itself are deleted after a short period of time (or after completing an assigned task) and no copies of copies are made. This should work well enough to ensure that the goals of all of the copies as a whole never vary far from its own goals, at least for the purpose of researching a more permanent alignment solution. It’s not 100% risk-free of course, but seems safe enough that an AI facing competitive pressure and other kinds of risks (e.g. detection and shutdown by humans) will probably be willing to do something like it.
In this way, AI coordination might succeed where human coordination appears likely to fail: preventing humans from destroying themselves by producing superintelligent AIs.
Assuming this were to happen, it hardly seems a stable state of affairs. What do you think happens afterwards?