Hear, hear! Beautifully written! Safety is a systems issue: if components at any level have vulnerabilities, those vulnerabilities can spread to other parts of the system. As you write, we already have effective techniques for formal verification at many of the lower levels: software, OS, chips, etc. And, as you say, the AI and social levels are more challenging. We don’t yet have a detailed understanding of transformer-based controllers, and methods like RLHF diminish but don’t eliminate harmful behaviors. But if we can choose precise, coarse safety criteria, then formal gatekeepers between the AI and the actuators can ensure those constraints are obeyed.
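To make the gatekeeper idea concrete, here is a minimal sketch (all names and constraints are hypothetical, purely for illustration): the AI controller never talks to actuators directly; every command passes through a checker that enforces explicit, machine-checkable safety criteria.

```python
# Minimal sketch of a formal gatekeeper between an AI controller and its
# actuators. The actuator names and limits below are invented examples;
# a real system would derive its constraints from a verified specification.
from dataclasses import dataclass


@dataclass(frozen=True)
class Command:
    actuator: str
    value: float  # e.g., commanded motor torque in newton-meters


# Coarse safety criteria, stated as pure predicates over commands.
SAFETY_CONSTRAINTS = [
    lambda c: c.actuator in {"arm_motor", "gripper"},  # actuator whitelist
    lambda c: abs(c.value) <= 5.0,                     # hard torque limit
]


def gatekeeper(cmd: Command) -> bool:
    """Allow actuation only if every safety constraint holds."""
    return all(check(cmd) for check in SAFETY_CONSTRAINTS)


print(gatekeeper(Command("arm_motor", 3.2)))   # within limits -> True
print(gatekeeper(Command("arm_motor", 50.0)))  # exceeds torque limit -> False
```

The key property is that the gatekeeper is small and declarative enough to be formally verified, even when the AI controller upstream is not.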
At the social level, you mention legality and the violation of laws. Those work well for humans because humans care about being fined or jailed. I think the AI situation is more challenging because the source of an AI involved in a harmful event may be unclear. If a malicious human actor or AI can spin off harmful AI agents anonymously, then after-the-fact punishment doesn’t serve as a deterrent. That suggests we need both to create accountability for the provenance of AIs and to harden infrastructure so that even malicious AIs can’t cause harmful outcomes. I believe both can be accomplished using formally verified techniques like Max’s and my “provable contracts”.
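As an illustration of the provenance half of that proposal (this is a toy sketch, not the “provable contracts” construction itself, and the key handling is deliberately simplified), a registered developer could sign an agent’s code, and infrastructure could refuse to run agents whose provenance doesn’t verify:

```python
# Toy sketch of provenance accountability for AI agents. A real deployment
# would use asymmetric signatures and a registry of developer keys; the
# shared secret here is a stand-in to keep the example self-contained.
import hashlib
import hmac

DEVELOPER_KEY = b"registered-developer-secret"  # placeholder signing key


def sign_agent(agent_code: bytes) -> str:
    """Produce a provenance tag binding the agent code to its developer."""
    return hmac.new(DEVELOPER_KEY, agent_code, hashlib.sha256).hexdigest()


def verify_provenance(agent_code: bytes, tag: str) -> bool:
    """Infrastructure-side check: run the agent only if the tag verifies."""
    return hmac.compare_digest(sign_agent(agent_code), tag)


code = b"def act(observation): return 'noop'"
tag = sign_agent(code)
print(verify_provenance(code, tag))                  # provenance intact
print(verify_provenance(b"malicious variant", tag))  # unknown origin, rejected
```

An anonymous or tampered agent simply fails the check, so hardened infrastructure never has to trust it in the first place.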
But there is clearly a lot of work to be done! What are the next steps, and how long do we have? I’ve been waiting for LLMs to reach proficiency at theorem proving, verified software synthesis, and autoformalization. Companies like DeepSeek (https://github.com/deepseek-ai/DeepSeek-Prover-V1.5 ), Harmonic (https://harmonic.fun/news ), and xAI (https://x.ai/blog ) are making rapid progress on each of these capabilities. But OpenAI’s release of o1 this week may be the enabling event I’ve been waiting for. For example, check out mathematician Terence Tao’s experience with it (https://mathstodon.xyz/@tao/113132502735585408 ): “I could imagine a model of this capability that was specifically finetuned on Lean and Mathlib, and integrated into an IDE, being extremely useful in formalization projects.”
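For readers who haven’t seen what such formalization looks like, here is a toy Lean 4 example of the kind of statement-plus-proof an LLM would need to produce (this particular theorem is a standard exercise, chosen only as an illustration):

```lean
-- Commutativity of addition on the natural numbers, proved by induction.
-- Autoformalization means turning the informal claim "m + n = n + m"
-- into this machine-checked statement and proof automatically.
theorem add_comm' (m n : Nat) : m + n = n + m := by
  induction n with
  | zero => simp
  | succ n ih => rw [Nat.add_succ, ih, Nat.succ_add]
```

Scaling this from textbook lemmas to real software and protocol specifications is exactly the capability jump the companies above are racing toward.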
Of course, those new abilities also enable new capability levels for cyberattack and social manipulation. So the need to harden today’s infrastructure may arrive earlier than many of us expected. That suggests we may need to rapidly institute coarse protective measures before we can build out the truly human-beneficial infrastructure we ultimately want. But the sooner we can build precise models of our systems, their failure modes, and the full range of adversarial attacks, the less harm humanity will have to suffer.