Let’s think about it another way. Consider the thought experiment where a single normal cell is removed from the body of any randomly selected human. Clearly they would still be human.
If you keep removing normal cells, though, they would eventually die. And if you keep plucking away cells, eventually the entire body would be gone and only cancerous cells would be left, i.e. only a ‘paperclip optimizer’ would remain of the original human, albeit an inefficient and parasitic one whose ‘paperclips’ need an organic host.
(This works because everyone has some small number of cancerous cells at any given time, which are normally kept in check by the body’s regular processes.)
At what point does the human stop being ‘human’ and start being a lump of flesh? And at what point does the lump of flesh become a latent ‘paperclip optimizer’?
Without a sharp cutoff, which I don’t think there is, there will inevitably be in-between cases where your proposed methods cannot be applied consistently.
The trouble is that if we, or the decision-makers of the future, accept even one idea that is not internally consistent, then it hardly seems like anyone will be able to refrain from accepting other internally contradictory ideas too. Nor will everyone err in the same way. There is no rational basis for accepting one over another, since a contradiction can imply anything at all (the principle of explosion), as we know from basic logic.
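For what it’s worth, that last step is the classical principle of explosion (ex falso quodlibet). Here is a minimal sketch in Lean (assuming Lean 4 syntax, with hypothetical propositions P and Q) showing that a contradiction proves any proposition whatsoever:

```lean
-- Principle of explosion: from a contradiction (P and not P),
-- any proposition Q follows.
example (P Q : Prop) (h : P ∧ ¬P) : Q :=
  absurd h.left h.right
```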
Then the end result will look much like monkey tribes fighting each other, agitating against one another based on which inconsistencies they do or do not accept, regardless of what they call each other: humans, aliens, AIs, machines, organisms, etc.
It does seem like alignment is, for all intents and purposes, impossible. Creating an AI truly beyond us, then, is really creating future, hopefully doting, parents to live under.
Those appear to be examples of arguments from consequences, a logical fallacy. How could similar reasoning be derived from axioms, if at all?