Eliezer, this is the source of the objection. I have free will, i.e. I can consider two possible courses of action: I could kill myself, or I could go on with life. Until I make up my mind, I don't know which one I will choose. Of course, I have already decided to go on with life, so I know. But if I hadn't decided yet, I wouldn't know.
In the same way, an AI, before making its decision, does not know whether it will turn the universe into paperclips or into a nice place for human beings. But the AI is superintelligent, so if it does not know which one it will do, neither do we: predicting its choice would mean out-thinking it. So we don't know that it won't turn the universe into paperclips.
It seems to me that this argument is valid: you will not be able to come up with what you are looking for, namely a mathematical demonstration that your AI will not turn the universe into paperclips. But it may be easy enough to show that this outcome is unlikely, just as it is unlikely that I will kill myself.