I think it is difficult, but may be possible, to create a superintelligent program that will provably do some formally specified thing.
But the main problem is that we can't formally specify what "harming a human" means. Or we can, but we can't be sure that the definition is safe.
So it results in a kind of circularity: we could prove that the machine will do X, but we can't prove that X is actually good and safe.
We may try to shift the burden of proof to the machine: we must prove that it will prove that X is really good and safe. I have bad feelings about the computability of this task.
That is why I am generally skeptical of the idea of a mathematical proof of AI safety. It doesn't provide 100 percent safety, because the proof can have holes in it, and the task is too complex to be solved in time.
This is a real and important difficulty, but it isn't what the paper is about: they assume one can always readily tell whether people are being harmed.