[Question] Would a scope-insensitive AGI be less likely to incapacitate humanity?
I was listening to Anders Sandberg talk about “humble futures” (i.e., futures that may be considered good in a sober, non-“let’s tile the universe with X” way), and started wondering whether training (not yet proven safe) AIs to have such “humble”, scope-insensitive-ish goals, which seems more tractable than (complete) value alignment, might disincentivize the AI from incapacitating humans.
Why would this disincentivize it? I have some ideas, but I’m deliberately not fleshing them out here so that people don’t anchor on the particular scenarios I have in mind.
Here’s an AI-generated image of a scope-insensitive AI chilling with a cup of tea to help you think: