Speaking concretely, what would a utility function look like that is close enough to a human utility function for an AI system to end up with it after training, yet is an absolute disaster?
A simple example: suppose the humans involved in the initial training are negative utilitarians, valuing the elimination of suffering above all else. An AI that internalizes that value would, once powerful enough, be able to pursue omnicide as the most thorough way to end suffering, rather than merely curing diseases.