“Superintelligence is a hypothetical agent that possesses intelligence far surpassing that of the brightest and most gifted human minds. In light of recent advances in machine intelligence, a number of scientists, philosophers and technologists have revived the discussion about the potential catastrophic risks entailed by such an entity. In this article, we trace the origins and development of the neo-fear of superintelligence, and some of the major proposals for its containment. We argue that such containment is, in principle, impossible, due to fundamental limits inherent to computing itself. Assuming that a superintelligence will contain a program that includes all the programs that can be executed by a universal Turing machine on input potentially as complex as the state of the world, strict containment requires simulations of such a program, something theoretically (and practically) infeasible.”
This paper frames the problem as “look at a program and figure out whether it will be harmful” and correctly observes that there is no way to solve that problem with perfect accuracy if the programs being analysed are arbitrary. But its arguments have nothing to say about, e.g., whether there’s some way of preventing harm as it’s about to happen; nor about whether it is possible to construct a program that provably does something useful without harming humans.
E.g., imagine a world where it is known that the only way to harm humans is to press a certain big red button labelled “Harm the Humans”. The arguments in this paper show that there is no general procedure for deciding whether a computer with the ability to press this button will do so. But they don’t rule out the possibility that you can make a useful machine with no access to the button, or a useful machine with a little bit of hardware in it that blows it up if it gets too close to the button.
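Purely as an illustration of that second possibility, here is a minimal sketch of a runtime tripwire in the toy button world. The function names and interface are invented for the example, and the whole thing only works under the toy assumption that harm is exactly one recognisable action:

```python
from typing import Callable, Iterable

FORBIDDEN_ACTION = "press_big_red_button"

def run_with_tripwire(propose_actions: Callable[[], Iterable[str]],
                      execute: Callable[[str], None]) -> None:
    """Run an agent's proposed actions, but shut it down the moment it
    proposes the one action known (by the toy assumption) to be harmful.

    This sidesteps the prediction problem: we never decide in advance whether
    the agent *would* press the button; we only inspect each concrete action
    as it is about to happen."""
    for action in propose_actions():
        if action == FORBIDDEN_ACTION:
            # The "little bit of hardware that blows it up": stop immediately.
            raise SystemExit("tripwire: forbidden action vetoed, agent halted")
        execute(action)
```

This only works because the toy world stipulates that harm is exactly one recognisable action, which is what the caveat about causal isolation below is getting at.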
(There are reasons to be concerned about such machines because in practice you probably can’t causally isolate them from the button in the way required. The paper’s introductory material discusses some such reasons. But they play no role in the technical argument of the paper, at least on the cursory reading I’ve given it.)
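For the other half of the picture, the impossibility claim itself can be sketched as the standard reduction, restated for the toy button world. This is my sketch, not the paper's construction; `would_press_button` is a hypothetical oracle passed in as a parameter:

```python
from typing import Callable

def halting_decider_from(
        would_press_button: Callable[[str], bool]) -> Callable[[str, str], bool]:
    """Turn a (hypothetical) perfect 'will this program ever press the button?'
    decider into a halting-problem decider. Since the latter cannot exist,
    neither can the former for arbitrary programs."""
    def halts(program_source: str, program_input: str) -> bool:
        # A program (as source text) that simulates the target to completion
        # and only then presses the button: it presses the button iff the
        # target halts on its input.
        wrapper_source = (
            f"simulate({program_source!r}, {program_input!r})\n"
            "press_big_red_button()\n"
        )
        return would_press_button(wrapper_source)
    return halts
```

Since a halting decider cannot exist, a perfectly accurate button-pressing decider for arbitrary programs cannot exist either; this is the flavour of argument the paper formalises.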
I think it is difficult, but may be possible, to create a superintelligent program that will provably do some formally specified thing.
But the main problem is that we can't formally specify what “harming humans” means. Or we can, but we can't be sure that the definition is safe.
So it results in a kind of circularity: we could prove that the machine will do X, but we can't prove that X is actually good and safe.
We may try to shift the burden of proof to the machine: we must prove that it will prove that X is really good and safe. I have doubts about the computability of this task.
That is why I am generally skeptical of the idea of a mathematical proof of AI safety. It doesn't provide 100 per cent safety, because the proof can have holes in it and the task is too complex to be solved in time.
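One rough way to write down the circularity (my own notation, not the commenter's or the paper's), with $M$ the machine, $X$ a formal specification, and $\mathrm{Safe}$ the informal property we actually care about:

```latex
% Two proof obligations, written loosely; Safe is the informal property
% "does not harm humans", which has no agreed formalisation.
\begin{align*}
\text{(1)}\quad & \vdash \; \mathrm{Behaviour}(M) \text{ satisfies } X
  && \text{(verification: hard but well-posed)}\\
\text{(2)}\quad & \vdash \; X \Rightarrow \mathrm{Safe}
  && \text{(cannot even be stated without formalising } \mathrm{Safe}\text{)}
\end{align*}
```

Obligation (1) is the kind of thing a proof, by us or by the machine, could in principle discharge; obligation (2) is where the circularity lives, and handing it to the machine only relocates the gap, since we would still have to trust the formalisation of $\mathrm{Safe}$ it works from.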
“Assuming that a superintelligence will contain a program that includes all the programs that can be executed by a universal Turing machine on input potentially as complex as the state of the world”
What is the notion of “includes” here? Edit: from pp 4-5:
This means that a superintelligent machine could simulate the behavior of an arbitrary Turing machine on arbitrary input, and hence for our purpose the superintelligent machine is a (possibly identical) super-set of the Turing machines. Indeed, quoting Turing, “a man provided with paper, pencil, and rubber, and subject to strict discipline, is in effect a universal machine”
“Superintelligence cannot be contained: Lessons from Computability Theory” http://arxiv.org/pdf/1607.00913.pdf
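On the “includes” question, the quoted passage is asserting universality: the superintelligence can simulate any Turing machine on any input. A minimal single-tape simulator gives an idea of what such a simulation involves; the encoding and names here are my own, not the paper's:

```python
from collections import defaultdict
from typing import Dict, Tuple

# A Turing machine as a transition table:
#   (state, symbol) -> (new_state, new_symbol, move)  with move in {-1, 0, +1}
Transitions = Dict[Tuple[str, str], Tuple[str, str, int]]

def simulate(delta: Transitions, tape_input: str, start: str = "q0",
             halt: str = "halt", blank: str = "_", max_steps: int = 10_000) -> str:
    """Run an arbitrary machine `delta` on `tape_input`; return the tape contents.

    This is the sense in which a universal machine 'includes' every other
    machine: one fixed program that, given a description of any machine and
    any input, reproduces its behaviour step by step."""
    tape = defaultdict(lambda: blank, enumerate(tape_input))
    state, head = start, 0
    for _ in range(max_steps):          # cap steps; halting in general is undecidable
        if state == halt:
            break
        state, tape[head], move = delta[(state, tape[head])]
        head += move
    return "".join(tape[i] for i in sorted(tape)).strip(blank)

# Example: a machine that flips every bit and halts at the first blank.
flip: Transitions = {
    ("q0", "0"): ("q0", "1", +1),
    ("q0", "1"): ("q0", "0", +1),
    ("q0", "_"): ("halt", "_", 0),
}
assert simulate(flip, "0110") == "1001"
```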
The difficulty of formally specifying “harming humans” is a real and important one, but it isn't what the paper is about; the authors assume one can always readily tell whether people are being harmed.