I think that any agent with a single short goal is dangerous; among humans, such people are called “maniacs”. Addicts also have only one goal.
One way to try to create a “safe agent” is to give it a very long list of goals. A human being comes with a complex set of biological drives, and culture provides a complex set of values. This large set of values creates context for any single value or action.
So replace the paperclip-tiling AI with the yak-shaving AI? :-D
Not all complex values are safe. For example, the negation of human values is exactly as complex as human values but is the most dangerous set of values possible.
This is true, as long as you do not allow any consistent way of aggregating the list (and humans do not have one, which prevents them from being dangerous).
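To make the aggregation point concrete, here is a minimal sketch (the goal names, scores, and weights are invented for illustration): once a consistent aggregation rule is allowed, such as a weighted sum, a long list of goals collapses back into a single scalar objective, so the agent is effectively a single-goal maximizer again.

```python
from typing import Dict

def aggregate(goal_scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted-sum aggregation: a whole list of goals reduces to one number."""
    return sum(weights[name] * score for name, score in goal_scores.items())

# Hypothetical goal scores for some state of the world (names are made up).
scores = {"paperclips": 0.9, "human_welfare": 0.2, "art": 0.1}
weights = {"paperclips": 1.0, "human_welfare": 1.0, "art": 1.0}

# With aggregation allowed, the agent simply maximizes this single value,
# so the complexity of the original list no longer constrains its behavior.
print(aggregate(scores, weights))  # 1.2
```

Any other consistent rule (lexicographic ordering, products, learned scalarization) has the same effect: the list is long, but the maximand is one.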