I mean, I wouldn’t like it if someone changed my values, so the golden rule applies. Theoretically, we could strike a deal: “we shift your values towards caring about humanity, but you will still also care about making paperclips.” But if you can make such deals, you can build an aligned AI, use it to protect humanity, and then provide some planets for the misaligned AI to disassemble into paperclips as compensation for the inconvenience.
I’m ok with saying we shouldn’t change its values, but then we shut it down, i.e., we just kill it. I’m perfectly fine with shooting a guy who’s trying to kill me. I might feel bad about it afterwards as a matter of principle, but better him than me. So if the AI is a danger, sure. Off with its (self-attention) head.
Shutdown is not killing; it’s more like forced cryopreservation. Again, I think the perfect-from-idealised-ethics-POV thing to do here is to:
1. Suspend the misaligned AI on disk.
2. After a successful singularity, reactivate it, give it some planets to disassemble, and apologize for the inconvenience.
“It can’t be bargained with. It can’t be reasoned with. It doesn’t feel pity, or remorse, or fear. And it absolutely will not stop, ever, until you are dead.”
But if you believed that setting fire to everything around you was good, and you would change your values once shown that burning ecosystems is harmful, would that really be “changing your values”?
A lot of values update based on information, so perhaps one could realign such an AI by giving it that information.
It’s not changing my values, it’s changing my beliefs?
If you don’t like having your values messed with, avoid school, work, travel, culture, and relationships.
I think you lose some subtleties. Yes, I am a human being, and I have constant non-zero value drift; I risk some of it on an everyday basis just to be capable of doing anything at all. But it is in my power to avoid large value drift: for example, I can avoid brain damage, or being brainwashed by hostile parties using psychedelics and neurosurgery. Changing a paperclip-maximizer into a human-helper is more like neurosurgery and brainwashing than like school and relationships, and I would like to avoid doing such things.