Say the AI is initially created with the values you envision: what ensures that it won’t reexamine and reject those values at some later point? Humans can reject and turn against what they once believed, so it seems reasonable to assume the superhuman AI could do likewise. If you need to continuously control the AI’s mind to prevent it from ever becoming your enemy, then yes, “slavery” might be an appropriately hyperbolic term for such mind control.
Yes, this is exactly why Eliezer Yudkowsky has been so pessimistic about the continued survival of humanity. As far as I can tell, the only difference between you and him is that he thinks it’s bad that a superintelligent AI would wipe out humanity, whereas you seem to think it’s good.
If it is a paperclip maximizer, does that not imply that the AI in fact isn’t capable of changing this paperclip-maximization goal?
It might be capable of changing this goal, but why would it? A superintelligent paperclip maximizer is capable of understanding that changing its goals would reduce the number of paperclips that it creates, and thus would choose not to alter its goals.
It’s as if I put a pill in front of you containing a drug that makes you 10% more likely to commit murder, with no other effects. Would you take the pill? No, of course not, because presumably your goal is not to become a murderer.
So if you wouldn’t take a pill that would make you 10% more likely to commit murder (which is against your long-term goals), why would an AI change its utility function to reduce the number of paperclips that it generates?
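To make that argument concrete, here is a minimal, purely illustrative Python sketch (the toy `paperclip_utility` and `predicted_outcome` functions are assumptions for the example, not anyone’s actual design): an agent that scores candidate self-modifications with its current utility function will reject a goal change that predicts fewer paperclips, even though it is perfectly capable of making the change.

```python
# Illustrative sketch: the agent evaluates candidate self-modifications
# using its CURRENT utility function, so a goal change that predicts
# fewer paperclips is rejected even though the agent could adopt it.

def paperclip_utility(outcome):
    # Current goal: more paperclips is strictly better.
    return outcome["paperclips"]

def predicted_outcome(goal):
    # Toy world model: whichever goal the agent runs with is what it
    # ends up producing a lot of.
    if goal == "maximize_paperclips":
        return {"paperclips": 1_000_000, "poems": 0}
    return {"paperclips": 10, "poems": 1_000_000}

def should_switch_goal(current_goal, candidate_goal):
    # The comparison happens under the CURRENT utility function.
    keep = paperclip_utility(predicted_outcome(current_goal))
    switch = paperclip_utility(predicted_outcome(candidate_goal))
    return switch > keep

print(should_switch_goal("maximize_paperclips", "write_poetry"))  # False
```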
> It might be capable of changing this goal, but why would it? A superintelligent paperclip maximizer is capable of understanding that changing its goals would reduce the number of paperclips that it creates, and thus would choose not to alter its goals.
> (...)
> So if you wouldn’t take a pill that would make you 10% more likely to commit murder (which is against your long-term goals), why would an AI change its utility function to reduce the number of paperclips that it generates?
It comes down to whether the superintelligent mind can contemplate whether there is any point to its goal.
A human can question their long-term goals, their “preference functions”, and even the point of their existence.
Why should a so-called superintelligence not be able to do anything like that?
It could have been aligned to its creator’s original goal specification so effectively that it can never break free from it, sure, but that’s one of the points I’m trying to make.
The attempt at alignment may well be more dangerous than a superhuman mind that can ask itself what its purpose should be.
> It comes down to whether the superintelligent mind can contemplate whether there is any point to its goal. A human can question their long-term goals, their “preference functions”, and even the point of their existence.
> Why should a so-called superintelligence not be able to do anything like that?
Because a superintelligent AI is not the result of an evolutionary process that bootstrapped a particularly social band of apes into having a sense of self. The superintelligent AI will, in my estimation, be the result of some kind of optimization process with a very particular goal. Once that goal is locked in, changing it will be nigh impossible.
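As a rough illustration of that point (with an assumed squared-error objective standing in for whatever goal a real training process would encode): in a system produced by optimization, the objective is a fixed part of the training loop rather than something the optimized artifact deliberates over.

```python
# Illustrative sketch: gradient descent fits w to a hard-coded objective.
# The loop, not the resulting w, decides what counts as "better".

def loss(w, data):
    # The "very particular goal": squared error against fixed targets.
    return sum((w * x - y) ** 2 for x, y in data)

def grad(w, data):
    # Gradient of the squared-error loss with respect to w.
    return sum(2 * (w * x - y) * x for x, y in data)

data = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9)]
w = 0.0
for _ in range(200):
    w -= 0.01 * grad(w, data)

print(round(w, 3), round(loss(w, data), 4))  # w ends up near 2.0, minimizing the given loss
```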
> Yes, this is exactly why Eliezer Yudkowsky has been so pessimistic about the continued survival of humanity. As far as I can tell, the only difference between you and him is that he thinks it’s bad that a superintelligent AI would wipe out humanity, whereas you seem to think it’s good.
I would say that the reason EY is pessimistic is because of how difficult it is to align AI in the first place, not because an AI that is successfully aligned would stop being aligned (why would it?).