FWIW, the idea that we’ll be able to “prove” much of interest about the long-term consequences for humanity of the behaviour of superintelligent machines strikes me as wrong-headed.
The idea is to prove properties of actions, not consequences. From Knowability of FAI:
> You can try to prove a theorem along the lines of: “Providing that the transistors in this computer chip behave the way they’re supposed to, the AI that runs on this chip will always try to be Friendly.” You’re going to prove a statement about the search the AI carries out to find its actions. Metaphorically speaking, you’re going to prove that the AI will always, to the best of its knowledge, seek to move little old ladies to the other side of the street and avoid the deaths of nuns. To prove this formally, you would have to precisely define “try to be Friendly”: the complete criterion that the AI uses to choose among its actions—including how the AI learns a model of reality from experience, and how the AI identifies the goal-valent aspects of the reality it learns to model.
>
> Once you’ve formulated this precise definition, you still can’t prove an absolute certainty that the AI will be Friendly in the real world, because a series of cosmic rays could still hit all of the transistors at exactly the wrong time to overwrite the entire program with an evil AI. Or Descartes’s infinitely powerful deceiving demon could have fooled you into thinking that there was a computer in front of you, when in fact it’s a hydrogen bomb. Or the Dark Lords of the Matrix could reach into the computer simulation that is our world, and replace the AI with Cthulhu. What you can prove with mathematical certitude is that if all the transistors in the chip work correctly, the AI “will always try to be Friendly”—after you’ve given “try to be Friendly” a precise definition in terms of how the AI learns a model of the world, identifies the important things in it, and chooses between actions, these all being events that happen inside the computer chip.
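To make the shape of that claim concrete, here is a minimal sketch in Python. Everything in it is invented for illustration: `is_friendly_by_criterion` is a stand-in for the AI’s complete decision criterion, and the point is only that the guarantee attaches to the selection procedure (it never returns an action its own criterion disapproves of), not to the real-world consequences of whatever it returns.

```python
# Toy illustration of "prove properties of actions, not consequences".
# is_friendly_by_criterion is a placeholder predicate, not a real
# Friendliness criterion; only the shape of the guarantee matters here.

def is_friendly_by_criterion(action: str) -> bool:
    # Stand-in criterion. The real thing would be the AI's complete
    # decision criterion, defined over its learned model of the world.
    return not action.startswith("harm")

def choose_action(candidates: list[str]) -> str | None:
    # The provable property lives in this loop: by construction, any
    # action returned satisfies is_friendly_by_criterion. Nothing is
    # claimed about what that action actually causes in the world.
    for action in candidates:
        if is_friendly_by_criterion(action):
            return action
    return None

assert choose_action(["harm_nun", "help_lady"]) == "help_lady"
assert choose_action(["harm_nun"]) is None
```

A formal version would state the same invariant as a machine-checked theorem about the search procedure, which is exactly the kind of statement the quoted passage says can be proved.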
Proving properties of actions rather than consequences doesn’t make the problem any less ridiculous or intractable. I see no good reason to think such an approach is called for: no machine intelligence project I have heard of has ever done anything remotely like this, and yet such projects typically produce results that are helpful and benign.
Proofs can get complicated—and can be wrong. Extensive unit testing is a more practical and realistic approach for most programming problems. The virtues of such testing have become widely appreciated in recent years.
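For contrast, here is what the testing alternative looks like on the same toy selector, using Python’s standard `unittest` module; the function under test is the illustrative `choose_action` from the sketch above, repeated so the example runs on its own. Tests check the invariant case by case, where a proof would cover all inputs at once.

```python
import unittest

def is_friendly_by_criterion(action: str) -> bool:
    # Same placeholder criterion as in the earlier sketch.
    return not action.startswith("harm")

def choose_action(candidates: list[str]) -> str | None:
    # Same toy selector as in the earlier sketch.
    for action in candidates:
        if is_friendly_by_criterion(action):
            return action
    return None

class TestChooseAction(unittest.TestCase):
    # Each test pins down one case; unlike a theorem, the suite only
    # covers the inputs somebody thought to write down.

    def test_skips_disapproved_action(self):
        self.assertEqual(choose_action(["harm_nun", "help_lady"]), "help_lady")

    def test_returns_none_when_nothing_approved(self):
        self.assertIsNone(choose_action(["harm_nun"]))

if __name__ == "__main__":
    unittest.main()
```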