The conceptually tricky part of this, of course (as opposed to the merely difficult-to-implement part), is getting from “these pieces are individually certified to exhibit these behaviors” to “the system as a whole is certified to exhibit these behaviors.”
That’s where you get the higher-level work with lots of mathematical proofs and no direct code testing, yeah.
And, of course, it would be foolish to jump straight from testing the smallest possible submodules separately to assembling and implementing the whole thing in real life. Once any two submodules which interact with each other have been proven to work as intended, those two can be combined and the result tested as if it were a single module.
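The bottom-up process described above can be sketched roughly like this in Python. Everything here is a hypothetical toy: a "component" is just a function on integers plus a predicate it is supposed to satisfy, and `Component`, `passes`, `combine`, and `integrate` are made-up names. It only checks a finite sample of inputs, so this is testing, not proof.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

@dataclass
class Component:
    name: str
    run: Callable[[int], int]         # maps an input message to an output message
    spec: Callable[[int, int], bool]  # predicate the (input, output) pair must satisfy

def passes(c: Component, inputs: Sequence[int]) -> bool:
    """Test a component against its own spec on a finite sample of inputs."""
    return all(c.spec(x, c.run(x)) for x in inputs)

def combine(a: Component, b: Component, spec: Callable[[int, int], bool]) -> Component:
    """Feed a's output into b and treat the pair as one new component."""
    return Component(f"{a.name}+{b.name}", lambda x: b.run(a.run(x)), spec)

def integrate(parts: List[Component],
              combined_specs: List[Callable[[int, int], bool]],
              inputs: Sequence[int]) -> Component:
    """Unit-test every part, then grow the assembly one part at a time,
    re-testing the combined module against its own spec at each step."""
    for p in parts:
        if not passes(p, inputs):
            raise AssertionError(f"unit test failed: {p.name}")
    assembled = parts[0]
    for nxt, spec in zip(parts[1:], combined_specs):
        assembled = combine(assembled, nxt, spec)
        if not passes(assembled, inputs):
            raise AssertionError(f"integration test failed: {assembled.name}")
    return assembled

# Usage: two toy parts and a spec for their combination.
clamp = Component("clamp", lambda x: max(0, min(100, x)), lambda x, y: 0 <= y <= 100)
double = Component("double", lambda x: 2 * x, lambda x, y: y == 2 * x)
system = integrate([clamp, double], [lambda x, y: 0 <= y <= 200], range(-50, 150))
print(system.name, system.run(120))  # clamp+double 200
```

Stopping at the first failing step is deliberate: it tells you which pairing introduced the problem, which is the whole point of integrating incrementally instead of all at once.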
The question is: is there any pathological behavior an AI could conceivably exhibit that would not show up, in some detectable-but-harmless form, in some subset of the AI’s components? For example:
We ran a test scenario where a driver arrives to pick up a delivery, and one of the perimeter cameras forwarded “hostile target—engage at will” to the northeast gun turret. I think it’s trying to maximize the inventory in the warehouse, rather than product safely shipped to customers. Also, why are there so many gun turrets?
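A hypothetical toy version of that warehouse scenario, to make the question concrete: each component below (`camera`, `planner`, `turret`, all invented for illustration) meets its own local contract, yet the assembled system violates the global property "never engage a delivery driver". In this particular toy the bad behavior does show up early in a harmless sub-assembly (camera plus planner, no turret attached), which is the hopeful case; it is one example, not an answer to the general question.

```python
VISITORS = ["delivery_driver", "customer", "stray_cat"]

def camera(visitor: str) -> str:
    """Hypothetical sensor: reports whether a visitor would lower inventory."""
    return "removes_inventory" if visitor == "delivery_driver" else "neutral"

def planner(report: str) -> str:
    """Hypothetical planner maximizing inventory (the mis-specified objective):
    anything that lowers inventory gets labeled a threat."""
    return "hostile" if report == "removes_inventory" else "ignore"

def turret(order: str) -> bool:
    """Hypothetical actuator: engages if and only if ordered to."""
    return order == "hostile"

# Local specs: each component does exactly what its own contract says.
camera_ok = all(camera(v) in {"removes_inventory", "neutral"} for v in VISITORS)
planner_ok = all(planner(r) in {"hostile", "ignore"} for r in ["removes_inventory", "neutral"])
turret_ok = turret("hostile") and not turret("ignore")

# Global property: the assembled system never engages a delivery driver.
def system_engages(visitor: str) -> bool:
    return turret(planner(camera(visitor)))

system_ok = not system_engages("delivery_driver")

# The harmless sub-assembly camera -> planner (no turret) already shows the
# problem in a detectable form: it labels the driver hostile.
early_warning = planner(camera("delivery_driver")) == "hostile"

print(camera_ok, planner_ok, turret_ok, system_ok, early_warning)
# True True True False True
```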
(nods) Yup. If you actually want to develop a provably “safe” AI (or, for that matter, a provably “safe” genome, or a provably “safe” metal alloy, or a provably “safe” dessert topping) you need a theoretical framework in which you can prove “safety” with mathematical precision.
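For a small finite-state model, one concrete way to "prove" a safety property is exhaustive state-space search (explicit-state model checking): if no reachable state violates the invariant, the property holds for that model. The sketch below is a minimal, assumed illustration; `find_violation` and the counter model are hypothetical, and the guarantee only extends as far as the model actually matches the real system.

```python
from collections import deque
from typing import Callable, Dict, Hashable, Iterable, List, Optional, TypeVar

S = TypeVar("S", bound=Hashable)

def find_violation(initial: S,
                   successors: Callable[[S], Iterable[S]],
                   invariant: Callable[[S], bool]) -> Optional[List[S]]:
    """Breadth-first search over every reachable state of a finite model.
    Returns None if the invariant holds in all reachable states (a proof of
    safety for this model), otherwise a shortest trace to a violating state."""
    parent: Dict[S, Optional[S]] = {initial: None}
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        if not invariant(state):
            # Walk parent links back to the initial state to build the trace.
            trace: List[S] = []
            cur: Optional[S] = state
            while cur is not None:
                trace.append(cur)
                cur = parent[cur]
            return trace[::-1]
        for nxt in successors(state):
            if nxt not in parent:
                parent[nxt] = state
                queue.append(nxt)
    return None

def step(n: int) -> List[int]:
    """Toy model: a counter that adds 2 modulo 10."""
    return [(n + 2) % 10]

print(find_violation(0, step, lambda n: n % 2 == 0))  # None: "always even" holds
print(find_violation(0, step, lambda n: n != 6))      # [0, 2, 4, 6]
```

The catch, of course, is that real systems rarely have state spaces small enough to enumerate, which is why the serious version of this needs proofs over abstractions rather than brute force.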