We humans in fact tend to write code for which it is very hard to tell what it does. We do so through incompetence and through error, and it takes a great deal of training and effort to avoid doing so.
The proof techniques work by developing a proof of the relevant properties along with the program, not by writing whatever code you like and then magically proving things about it. Proving is a fundamentally different approach from running some AI with unknown properties in a box and trying to analyze it. (Forget about those C++ checkers like Valgrind, Purify, etc.; they just catch common mistakes that humans rarely make deliberately, and they don’t prove that the code accomplishes anything. They are only possible because C++ makes it very easy to shoot yourself in the foot in simple ways, and catching such slips is about the extent of their use.)
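To make that concrete, here is a minimal C++ example (written for illustration, not taken from any real codebase) of the sort of slip Valgrind’s memcheck will flag at runtime. A clean Valgrind run on a program like this tells you nothing about whether it computes anything you actually wanted:

```cpp
#include <iostream>

// The kind of error Valgrind catches: use of memory after it has been freed.
// Valgrind can report the invalid read below, but it cannot prove that the
// program accomplishes anything meaningful.
int main() {
    int* reading = new int(42);
    delete reading;
    std::cout << *reading << "\n";  // use-after-free: flagged by memcheck
    return 0;
}
```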
The issue with automatic proving is that you need to express “AI is sane and friendly” in a way that permits proving it. We haven’t got the slightest clue how to do that. Even for something as simple as an airplane autopilot, the proving is restricted to properties like the code not hanging and meeting its deadlines (as in, updating the controls every tenth of a second or so). We can’t prove that the code will never crash a virtual plane in a flight simulator under conditions where a crash is avoidable. In fact I’m pretty sure every single autopilot can and will crash the airplane under some conditions in which a crash is avoidable. I’m not sure we can even prove non-destruction of the airplane for any interesting subset of conditions, such as those where a crash is easily avoidable (easily meaning, for example, by deflecting no control surface by more than 50%); as far as I know we can’t.
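As a rough sketch of the kind of property that is tractable, consider an ordinary C++ rendering of a control loop (this is illustrative pseudo-avionics, not real autopilot code, and the 100 ms period and function names are made up). Tools can establish that each iteration terminates and that the controls are updated once per period; they cannot establish that the commands written keep the plane in the air:

```cpp
#include <chrono>
#include <thread>

void read_sensors() { /* placeholder: read gyros, airspeed, attitude */ }
void update_control_surfaces() { /* placeholder: write actuator commands */ }

// What can typically be shown about code shaped like this: the loop body has
// no unbounded loops or blocking calls, so every iteration finishes and the
// controls are refreshed once per 100 ms period. What cannot be shown this
// way: that the refreshed values avoid crashing the plane.
void autopilot_loop() {
    using namespace std::chrono;
    const auto period = milliseconds(100);             // update controls every 0.1 s
    auto next_deadline = steady_clock::now() + period;
    while (true) {
        read_sensors();
        update_control_surfaces();
        std::this_thread::sleep_until(next_deadline);  // wait for the next deadline
        next_deadline += period;
    }
}

int main() { autopilot_loop(); }
```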
My mental model was “The AI will be written by careful and skilled humans who want to ensure various properties”—so it seemed reasonable to assume that the programmers are trying to write code that things can be proved about.
I agree that “sane and friendly” is something we don’t know how to formalize. But developers might be able to prove some useful but weaker property. For instance, that data structure X determines the AI’s credence in various claims and that debugging interface Y lets you examine that structure.
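X and Y are placeholders, so purely as a hypothetical sketch (the names and the assumption that an AI’s beliefs would live in anything this legible are mine, for illustration only), the pair might look like a table mapping claims to credences plus a read-only inspection hook:

```cpp
#include <iostream>
#include <string>
#include <unordered_map>

// Hypothetical "data structure X": a table mapping claims to credences.
struct CredenceTable {
    std::unordered_map<std::string, double> credence;  // claim -> probability in [0, 1]
};

// Hypothetical "debugging interface Y": lets an auditor examine, not modify, the table.
void dump_credences(const CredenceTable& table) {
    for (const auto& [claim, p] : table.credence) {
        std::cout << claim << " : " << p << "\n";
    }
}

int main() {
    CredenceTable beliefs;
    beliefs.credence["it will rain tomorrow"] = 0.2;
    beliefs.credence["2 + 2 = 4"] = 1.0;
    dump_credences(beliefs);
    return 0;
}
```

The obvious gap, raised below, is that nothing forces whatever the AI actually does to be determined by such a table.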
There’s a common assumption around here that the only meaningful way to deal with an AI is a black-box approach. And I’m increasingly bothered by this. If we can look inside an AI and see what it’s thinking, that ought to give us additional ability to check and enforce properties.
You don’t write some clean code that things can be proved about… you start from the things you want to be provable about the code, then develop the code along with a proof that those things remain true. Want to prove something not originally envisioned? Want to change the definition of what you’re proving? Start over again.
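To illustrate that workflow (in plain C++ with the proof obligations as comments and asserts; a real proof-carrying development would use a deductive verifier or proof assistant to machine-check them), the specification and loop invariant come first and the code is shaped so they can be discharged:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Specification written before the code: for a non-empty vector, the result
// is >= every element and equals some element. In a verified development the
// precondition, invariant, and postcondition below would be machine-checked
// annotations developed together with the loop, not just comments.
int max_element_spec(const std::vector<int>& xs) {
    assert(!xs.empty());  // precondition
    int best = xs[0];
    for (std::size_t i = 1; i < xs.size(); ++i) {
        // Loop invariant: best == max(xs[0], ..., xs[i-1])
        if (xs[i] > best) best = xs[i];
    }
    // Postcondition: best >= xs[k] for every k, and best equals some xs[k].
    return best;
}

int main() {
    std::vector<int> v = {3, 9, 2};
    assert(max_element_spec(v) == 9);
    return 0;
}
```

Change the specification (say, to returning the index of the maximum) and the invariant and the proof have to be reworked along with the code, which is the point above about starting over.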
So, you have that data structure which determines credence. What is credence, exactly? The AI should have a bit set here when it’s telling a lie? What’s a lie? The AI outputs a sequence of characters, which affects your actions after you read it. Whether it is a lie is as much a fact about you as about the AI.
The black-box assumption wasn’t the first. It came after many failed attempts to make AI using normal, very direct programming techniques, and after the growing understanding that if we make an AI, it will likely be some kind of self-developing seed which will do God only knows what after it has read the internet (and which won’t be able to talk to you otherwise). Consider this site. A lot of the material here tells you how to modify your thought processes. You want the AI not to read and apply any techniques it comes across here. That’s some restricted AI for sure. A really stubborn AI.
Furthermore, there is enormous data overload even for human-level intelligence; the human brain has on the order of 100 trillion synaptic connections.