Challenge 2 – Most of the AI threats of greatest concern have too much complexity to physically model.
Setting aside for a moment the question of whether we can develop precise rules-based models of physics, GS-based approaches to safety would still need to determine how to formally model the specific AI threats of interest as well. For example, consider the problem of determining whether a given RNA or DNA sequence could cause harm to individuals or to the human species. This is a well-known area of concern in synthetic biology, where experts expect that risks, especially around the synthesis of novel viruses, will dramatically increase as more end-users gain access to powerful AI systems. This threat is specifically discussed in [3] as an area in which the authors believe that formal verification-based approaches can help:
I certainly agree that there are complex situations where we can’t currently tell what’s safe and what isn’t. In that case we should clearly be incredibly conservative and not expose humans to systems with unclear safety properties! The burden is on the creator of a system to demonstrate safety!
Regarding DNA and RNA synthesis, I see the need for provable technologies at a much simpler and lower level than what you are talking about. Right now you can buy DNA and RNA synthesis machines, without any kind of controls on what they will synthesize, from many suppliers for just a few thousand dollars. In a world in which the complete DNA sequences of smallpox and other extremely harmful pathogens are publicly known, this is completely insane!
It would be nice if we could automatically tell whether a DNA sequence was harmful or not. But, as you say, we are not at the point where AI can do that. But that doesn’t mean we throw up our hands and say “Ok, go ahead and synthesize whatever you want! Here’s the smallpox DNA in case you’re interested!”
Our biohazard labs have strict rules and guidelines for what they can synthesize and how. Human committees must sign off on work that might inadvertently lead to human harm. We need at least that level of control on synthesis devices available in the open market. For example, sequences might need to be cryptographically signed by a governing body stating they are safe before they can be synthesized.
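To make that signing scheme concrete, here is a minimal sketch in Python using the cryptography package. The choice of Ed25519 signatures over a SHA-256 digest of the sequence, and all of the names below, are illustrative assumptions rather than anything specified in the proposal.

```python
# Minimal sketch of signature-gated synthesis (illustrative only).
# Assumes the governing body signs a SHA-256 digest of an approved
# sequence with an Ed25519 key; the synthesizer firmware holds the
# corresponding public key and refuses unsigned sequences.
import hashlib
from cryptography.hazmat.primitives.asymmetric.ed25519 import (
    Ed25519PrivateKey,
    Ed25519PublicKey,
)
from cryptography.exceptions import InvalidSignature

# --- Governing body's side: approve and sign a sequence ---
authority_key = Ed25519PrivateKey.generate()
AUTHORITY_PUBLIC_KEY = authority_key.public_key()

def sign_sequence(sequence: str) -> bytes:
    """Sign the digest of a reviewed, approved sequence."""
    digest = hashlib.sha256(sequence.encode()).digest()
    return authority_key.sign(digest)

# --- Device firmware's side: verify before synthesizing ---
def synthesize_if_signed(sequence: str, signature: bytes,
                         authority: Ed25519PublicKey) -> bool:
    """Only run synthesis when the authority's signature checks out."""
    digest = hashlib.sha256(sequence.encode()).digest()
    try:
        authority.verify(signature, digest)
    except InvalidSignature:
        return False  # refuse unsigned or forged sequences
    # ... drive the synthesis hardware here ...
    return True

sig = sign_sequence("ATGGCC...")
assert synthesize_if_signed("ATGGCC...", sig, AUTHORITY_PUBLIC_KEY)
assert not synthesize_if_signed("ATGCCC...", sig, AUTHORITY_PUBLIC_KEY)
```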
But how can we be sure that the hardware will require those digital signatures? That’s where the “provable contract” hardened cryptographic technologies that Max and I describe come in. It needs to be impossible for adversaries to use synthesis machines to create unsigned, potentially harmful sequences.
There is a whole large story about how that works. Fundamentally it involves secure digital hardware similar to Apple’s Secure Enclave, but with provably effective tamper sensing that implements “zeroization” (deletion of cryptographic keys on detection of tampering). This needs to be integrated into the core operation of the device.
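One way to picture the zeroization behavior is a key store that wipes its key material the moment the tamper sensor fires; once the keys are gone, no signature can verify and the device can no longer operate. This is only a software analogy of what would have to live in hardened hardware, and the class and method names are invented for illustration.

```python
# Software analogy of hardware "zeroization" (illustrative only):
# key material is overwritten the moment tampering is detected,
# after which no signature checks (and hence no synthesis) can succeed.

class ZeroizingKeyStore:
    def __init__(self, key_material: bytes):
        # Mutable buffer so the bytes can be overwritten in place.
        self._key = bytearray(key_material)
        self._tampered = False

    def on_tamper_detected(self) -> None:
        """Called by the tamper-sensing circuitry."""
        for i in range(len(self._key)):
            self._key[i] = 0          # overwrite every byte of the key
        self._tampered = True

    def get_key(self) -> bytes:
        if self._tampered:
            raise RuntimeError("device tampered: keys zeroized")
        return bytes(self._key)

store = ZeroizingKeyStore(b"\x13\x37" * 16)
store.on_tamper_detected()
try:
    store.get_key()
except RuntimeError as err:
    print(err)   # device tampered: keys zeroized
```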
Similar technology needs to be incorporated into robots and other physical devices which can cause harm. The mathematical proof guarantee is that the device will only operate under conditions specified by formal “contract rules”. Those rules can be chosen to be whatever is appropriate for the domain of interest. For biohazard DNA synthesis, it might involve a digital signature from a secure government database of safe sequences or from a regulatory committee overseeing new research.
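The “contract rules” idea generalizes the signature check: before operating, the device evaluates a small policy chosen for the domain. A hypothetical sketch of such a policy for a synthesizer might look like the following; the particular rules, signer names, and threshold are assumptions for illustration, not anything from the proposal.

```python
# Hypothetical sketch of formal "contract rules" evaluated by a device
# before it operates. Each rule is a predicate over the requested action;
# the device runs only if every rule in the domain's contract holds.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class SynthesisRequest:
    sequence: str
    signature_valid: bool      # e.g. result of a check like the one sketched above
    signer: str                # identity asserted by the signature

Rule = Callable[[SynthesisRequest], bool]

def signed_by_trusted_authority(req: SynthesisRequest) -> bool:
    return req.signature_valid and req.signer in {"regulatory_committee",
                                                  "safe_sequence_db"}

def within_length_limit(req: SynthesisRequest) -> bool:
    return len(req.sequence) <= 10_000   # illustrative threshold

DNA_SYNTHESIS_CONTRACT: List[Rule] = [
    signed_by_trusted_authority,
    within_length_limit,
]

def device_may_operate(req: SynthesisRequest, contract: List[Rule]) -> bool:
    return all(rule(req) for rule in contract)

req = SynthesisRequest("ATGGCC...", signature_valid=True,
                       signer="regulatory_committee")
print(device_may_operate(req, DNA_SYNTHESIS_CONTRACT))   # True
```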