I think a big problem for FAI is that valuing humans and/or human values (however defined) may fall under superstition, even if it seems more attractive to us and less arbitrary than a red wire/thermite setup.
If an FAI must value people, and is programmed so that it cannot go near any line of thought that would lead it to stop valuing people, is it significantly crippled? Relative to what we want, there's no obvious problem, but would it be so weakened that it would lose out to UFAIs?
What line of thought could lead an FAI not to value people, that it would have to avoid? What does it mean for a value system to be 'superstitious'? An agent's goal system can't be 'incorrect'. (see also: Ghosts in the Machine, the metaethics sequence)
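To make the "goal system can't be 'incorrect'" point concrete, here is a toy sketch of my own (not from the Sequences; all names and values are made up): the agent's values are just part of its program, and nothing in the program asks whether those values are "really true". Any judgment that its utility function was wrong would itself have to be made by some utility function.

```python
def utility(outcome):
    # The goal system: a fixed mapping from outcomes to how much the agent wants them.
    return {"humans_flourish": 10, "paperclips": 1, "nothing": 0}.get(outcome, 0)

def predict(action):
    # Stand-in world model: maps actions to predicted outcomes (hard-coded for illustration).
    return {"protect": "humans_flourish", "tile": "paperclips"}.get(action, "nothing")

def choose(actions):
    # The agent simply picks the action whose predicted outcome scores highest.
    # Nothing in this loop can conclude that utility() itself is "incorrect".
    return max(actions, key=lambda a: utility(predict(a)))

print(choose(["protect", "tile"]))  # -> "protect", because that is what it values
```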