The core idea behind “AI alignment” is that superintelligent AI will be an agent maximizing some utility function, either explicitly or implicitly. Since it’s superintelligent, it will be really good at maximizing its utility function. So we, as humans, need to be sure that this utility function is “aligned” with something that humanity finds acceptable.
The standard argument also requires the premise that the utility function is immutable (the AI is incorrigible), so you only get one shot at specifying it correctly.
There’s reason to believe there’s a risk of utility functions freezing into a system in ways that don’t include cosmopolitanism and values archipelago.
Reference, please?