Minimize is not “reduce to zero”. If the weighting is correct, the optimal outcome might very well be just the solution to your problem and nothing else. This also gives you some room for experimentation: start with a function which only values non-interference, then gradually restart the AI with functions that put ever larger weights on solution-finding, until you arrive at the solution.
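For concreteness, here is a minimal toy sketch of the restart schedule being proposed. Every name and both scoring functions below are hypothetical placeholders (there is no real agent behind run_agent); it only illustrates the weighting scheme, not an actual design:

```python
def impact_penalty(state):
    # Toy stand-in: penalize any change the agent made to the world.
    return float(state["changes_made"])

def solution_score(state):
    # Toy stand-in: 1.0 if the task is solved, 0.0 otherwise.
    return 1.0 if state["task_solved"] else 0.0

def run_agent(objective):
    # Toy stand-in for the AI: it simply picks whichever candidate world
    # state scores higher under the supplied objective.
    do_nothing = {"task_solved": False, "changes_made": 0}
    solve_task = {"task_solved": True, "changes_made": 1}
    return max((do_nothing, solve_task), key=objective)

# Restart schedule: begin with zero weight on solving (pure non-interference),
# then rerun with ever larger weights until some run actually solves the task.
for w_solution in (0.0, 0.5, 1.0, 2.0):
    state = run_agent(lambda s: w_solution * solution_score(s) - impact_penalty(s))
    if state["task_solved"]:
        break
```

The disagreement in the rest of the thread is precisely about whether an impact_penalty term can be specified so that the AI's notion of “interference” matches ours.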
Or until everyone is dead.
If the solution to your problem is only reachable by killing everybody, yes.
Um, then this is not a “safe” AI in any reasonable sense.
I claim that it is, since it is averse to killing people as a side effect. If your solution does not require killing people, it would not kill them.
Stop, and read The Hidden Complexity of Wishes again. To us, killing a person or lobotomizing them feels like a bigger change than (say) moving a pile of rock; but unless your AI already shares your values, you can’t guarantee it will see things the same way.
Your AI would achieve its goal in the first way it finds that matches all the explicit criteria, interpreted without your background assumptions about what makes for a ‘reasonable’ interpretation. Unless you’re sure you’ve ruled out every possible “creative” solution that happens to horrify you, this is not a safe plan.