I think you are mixing up the goals of the AI (paperclipping, for instance) and the model of reality it develops. I assume that any goal of an AI is best served by a highly realistic model of the world, and that the AI would detect any kind of tampering with that model.
Now I have no idea what happens if you hardcode part of its view of nature, but I can imagine it will not be pleasant.
It is crippling to limit thought in that way, and you may prevent it from discovering something important.
I think a big problem for FAI is that valuing humans and/or human values (however defined) may itself fall under superstition, even if it seems more attractive to us and less arbitrary than a red wire/thermite setup.
If an FAI must value people, and is programmed to not be able to think near a line of thought which would lead it to not valuing people, is it significantly crippled? Relative to what we want, there’s no obvious problem, but would it be so weakened that it would lose out to UFAIs?
What line of thought could lead an FAI not to value people, that it would have to avoid? An agent's goal system can't be 'incorrect', so what would it even mean for a value system to be superstitious? (see also: Ghosts in the Machine, the metaethics sequence)