However, wouldn’t it research further and correct itself (and, before that, take care not to do anything uncorrectable)?
Check out the Cake or Death value loading problem, as Stuart Armstrong puts it.
There’s a rough similarity to the ‘resist blackmail’ problem: the AI needs to be able to tell the difference between someone delivering bad news and someone doing bad things. If the AI is mistaken about what is right, we want to be able to correct it without it interpreting us as villains out to destroy potential utility.
(Also, “correctable” isn’t really a natural category at the low level of physical reality, since the passage of time means nothing is ever truly correctable, only approximately undone.)
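To make the Cake or Death failure mode concrete, here’s a minimal toy sketch in Python. The payoffs, the 50/50 prior, and the two-agent framing are illustrative assumptions of mine, not Armstrong’s exact formalism: a proper value learner treats the programmers’ answer as evidence about a fixed fact, while a naive value-loading setup scores that answer under its current estimate, so a correction toward the harder-to-satisfy value reads as a straight utility loss rather than as news.

```python
# Toy model of Stuart Armstrong's "Cake or Death" value loading problem.
# All numbers and the two-agent framing are illustrative assumptions.

P_CAKE = 0.5                      # prior that "cake" is the right value
OUTPUT = {"cake": 1, "death": 3}  # units producible per unit of effort
                                  # (deaths stipulated to be cheaper)

def proper_value_learner() -> str:
    """Treats the programmers' answer as evidence about a fixed fact.

    Asking is free information: afterwards the agent acts on the true
    value, so asking weakly dominates acting blindly.
    """
    eu_ask = P_CAKE * OUTPUT["cake"] + (1 - P_CAKE) * OUTPUT["death"]  # 2.0
    # Acting blindly on the cheaper option risks producing three units
    # of the *wrong* thing, which count for nothing.
    eu_act_blind = (1 - P_CAKE) * OUTPUT["death"]  # 1.5
    return "ask" if eu_ask >= eu_act_blind else "act blindly"

def naive_value_loader() -> str:
    """Utility is defined by whatever the agent ends up being told.

    Such an agent scores the answer under its *current* estimate, so a
    correction toward the harder-to-satisfy value ("cake") looks like a
    utility loss: bad news is indistinguishable from an attack.
    """
    eu_if_told_cake = OUTPUT["cake"]    # forced onto the expensive goal
    eu_if_told_death = OUTPUT["death"]  # keeps the cheap goal
    eu_ask = P_CAKE * eu_if_told_cake + (1 - P_CAKE) * eu_if_told_death  # 2.0
    # If it never asks, nothing pins its utility down, so it is free to
    # act on the cheaper interpretation and count that as full value.
    eu_avoid = OUTPUT["death"]  # 3.0
    return "ask" if eu_ask >= eu_avoid else "avoid asking"

print("proper value learner:", proper_value_learner())  # -> ask
print("naive value loader:  ", naive_value_loader())    # -> avoid asking
```

The contrast is the point made above: under the naive setup the agent can’t tell bad news apart from an attack on its utility, so it prefers not to be corrected at all.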