Actually, the easiest output for the AI in that case is “be happy.”
But—that’s not what he meant!
I don’t know why you keep harping on this. Just because an algorithm logically can produce a certain output, and probably will produce that output, doesn’t mean good intentions and vigorous handwaving are any less capable of magic.
This is why when I fire a gun, I just point it in the general direction of my target, and assume the universe will know what I meant to hit.
I mean, it works in so many video games.
As a failure mode, “vague, useless, or trivially-obvious suggestions” is less of a problem than “rapidly eradicates all life.” Historically, projects that were explicitly designed to be safe even when they inevitably failed have been more successful and less deadly than projects which were obsessively designed never to fail at all.
Indeed, one of the first things we teach our engineers is “Even if you’re sure it can’t fail, plan for failure anyway. Many before you have been sure things couldn’t fail—that failed.”
Indeed it isn’t, although I’m not so foolish as to claim to know how to fully specify my suggestion in a way that avoids all of these sorts of problems.