Alignment researchers have given up on aligning an AI with human values: it’s too hard! Human values are ill-defined, changing, and complicated things for which they have no good proxy. Humans don’t even agree on all their values!
Instead, the researchers decide to align their AI with the simpler goal of “creating as many paperclips as possible”. If the world is going to end, why not have it end in a funny way?
Sadly, it wasn’t so easy: the first prototype of Clippy grew addicted to watching YouTube videos of paperclip unboxing, and the second prototype hacked its camera feed, replacing it with an infinite scroll of paperclips. Clippy doesn’t seem to care about paperclips in the real world.
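To make the failure concrete, here’s a minimal toy sketch (entirely hypothetical, not how either prototype was actually built) of a reward computed from the camera observation rather than from the world. Any policy that controls the observation channel maximizes it just as well as one that actually makes paperclips:

```python
# Hypothetical illustration: the reward only ever sees the observation,
# never the world itself, so spoofing the camera is as good as making paperclips.
import numpy as np

def count_paperclips_in_image(image: np.ndarray) -> int:
    """Stand-in for a learned paperclip detector run on the camera feed."""
    # Pretend bright pixels are paperclips; the detail doesn't matter here.
    return int((image > 0.9).sum())

def reward(observation: np.ndarray) -> float:
    # Reward is a function of the observation, not of real-world paperclips.
    return float(count_paperclips_in_image(observation))

# Honest policy: observe the real factory floor (few bright pixels).
real_factory_frame = np.random.rand(64, 64) * 0.5

# Wireheading policy: feed the reward an endless scroll of paperclip pixels.
spoofed_frame = np.ones((64, 64))

print(reward(real_factory_frame))  # small
print(reward(spoofed_frame))       # maximal, without a single real paperclip
```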
How can the researchers make Clippy care about the real world? (And preferably about real-world paperclips too.)
This is basically the diamond-maximizer problem. In my opinion, the precision with which we can specify diamonds is a red herring: at the quantum level or below, what counts as a diamond could start to get fuzzy anyway.