The best metaphor I have here is that in some sense, weak superintelligences already exist. Corporations and governments are generally smarter than individual humans.
Corporations and governments have vastly more resources than any single human, but it’s not at all clear to me that they’re smarter. On the contrary, we have numerous examples of corporations and governments acting less intelligently than any of their individual component humans, because they have to deal with things like principal-agent problems and inefficient communication, all of which limit the ability of the broader whole to act in a “coherent” fashion. “Coherent”, here, means having a common goal and coordinating efficiently to reach it. Corporations and governments give up this coherence in order to have more resources, on the reasoning that the loss of coherence in action can be more than compensated for by brute force. Often this is true, but sometimes (e.g. when a startup outmaneuvers a lumbering conglomerate) it’s not.
A superintelligent AI would be akin to a corporation that could act with the coherence and unity of purpose of a single human. We don’t have any experience with such a thing.
The core idea behind “AI alignment” is that superintelligent AI will be an agent maximizing some utility function, either explicitly or implicitly. Since it’s superintelligent, it will be really good at maximizing its utility function. So we, as humans, need to be sure that this utility function is “aligned” with something that humanity finds acceptable.
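To make that framing concrete, here’s a minimal toy sketch of an expected-utility maximizer. Everything in it is a hypothetical illustration (the world model, the two-action space, and the deliberately bad paperclip-counting utility function), not anyone’s actual proposal:

```python
# Toy sketch of the "agent maximizing a utility function" framing.
# All names here (predicted_outcome, utility, choose_action) are
# hypothetical stand-ins for illustration.

def predicted_outcome(state: dict, action: str) -> dict:
    # Hypothetical world model: each action deterministically changes state.
    effects = {
        "make_paperclips": {"paperclips": state["paperclips"] + 1},
        "do_nothing": {},
    }
    return {**state, **effects[action]}

def utility(state: dict) -> float:
    # The utility function the argument says must be "aligned".
    # This one is a deliberately bad proxy: it only values paperclips.
    return float(state["paperclips"])

def choose_action(state: dict, actions: list[str]) -> str:
    # The maximizer: pick whichever action leads to the highest-utility
    # predicted outcome. By premise, a superintelligence performs this
    # search extremely well over a far richer action space.
    return max(actions, key=lambda a: utility(predicted_outcome(state, a)))

print(choose_action({"paperclips": 0}, ["make_paperclips", "do_nothing"]))
# -> "make_paperclips"
```

In this framing, the alignment worry lives entirely in `utility`: `choose_action` will optimize whatever it’s handed, so the question is whether that function captures something humanity finds acceptable.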
The standard argument also requires the premise that the utility function is immutable (the AI is incorrigible), so you only get one shot at specifying it.
There’s reason to believe there’s a risk of utility functions freezing into a system in ways that don’t include cosmopolitanism and a values archipelago.
Reference, please?