Okay! Thanks for the clarification. That’s what I took from your paper with Tegmark, but in your more recent writing it sounded like you might be extending the goal to actually verifying safe behavior from the AGI itself. This is what I was referring to as a potentially useful supplement to alignment. I agree that it’s possible with improved verification methods, given the caveats from this post.
An unaligned AGI could take action outside of all of our infrastructure, so protecting that infrastructure would be a partial solution at best.
If I were an AGI, or controlled one, and wanted to take over, I’d set up my own infrastructure in a hidden location, underground or off-planet, and let the magic of self-replicating manufacturing develop whatever I needed for the takeover. You’d need decent robotics to jumpstart this process, but progress in robotics looks to be speeding up alongside progress in AI.
I think we definitely would like a potentially unsafe AI to be able to generate control actions, code, hardware, or systems designs together with proofs that those designs meet specified goals. Our trusted systems can then cheaply and reliably check those proofs and, if they pass, safely use the designs or actions from an untrusted AI. I think that’s a hugely important pattern, and it can be extended in all sorts of ways: markets of untrusted agents, for example, can still solve problems and take actions that obey desired constraints.
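To make the pattern concrete, here is a minimal sketch in Python, with SAT standing in for a hard design problem: the untrusted side does the expensive search, and the certificate it returns (a satisfying assignment) can be checked cheaply by trusted code before anything is used. The `untrusted_solver` and `check_certificate` names are illustrative assumptions for this sketch, not any particular system’s API.

```python
# Sketch of the "untrusted generator, trusted checker" pattern.
# SAT stands in for a hard design problem: finding a solution may be
# expensive and untrusted, but checking the certificate is cheap.

from typing import Dict, List

Clause = List[int]           # e.g. [1, -2] means (x1 OR NOT x2)
Assignment = Dict[int, bool]

def check_certificate(clauses: List[Clause], assignment: Assignment) -> bool:
    """Trusted, cheap verification: every clause must contain a true literal."""
    for clause in clauses:
        if not any(assignment.get(abs(lit), False) == (lit > 0) for lit in clause):
            return False
    return True

def untrusted_solver(clauses: List[Clause]) -> Assignment:
    """Stand-in for the untrusted AI; here just a toy brute-force search."""
    variables = sorted({abs(lit) for clause in clauses for lit in clause})
    for bits in range(2 ** len(variables)):
        candidate = {v: bool((bits >> i) & 1) for i, v in enumerate(variables)}
        if check_certificate(clauses, candidate):
            return candidate
    raise ValueError("no satisfying assignment found")

if __name__ == "__main__":
    spec = [[1, 2], [-1, 3], [-2, -3]]       # the "specified goal"
    proposal = untrusted_solver(spec)        # untrusted work, possibly adversarial
    if check_certificate(spec, proposal):    # trusted check before use
        print("accepted:", proposal)
    else:
        print("rejected: certificate does not verify")
```

The asymmetry is the point: the trusted checker never needs to understand how the proposal was produced, only that the accompanying certificate verifies against the specification.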
The issue of unaligned AGI hiding itself is potentially huge! I have end-state designs that would guarantee peace and abundance for humanity, but they require that all AIs operate under a single proven infrastructure. The intermediate period between now and then is the highest-risk stretch, I think.
And, of course, an adversarial AI will do everything it can to hide and garner resources! One of the great uses of provable hardware is the ability to create controlled privacy. You can have extensive networks of sensors where all parties are convinced, by proofs, that the sensors won’t transmit information about what they are sensing unless a specified situation is detected. That kind of technology might allow mutual treaties which meet all parties’ needs but prevent “hidden rogue AIs” buried in the desert. I don’t understand the dynamics very well yet, though.
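As a rough illustration of the control flow only (the actual guarantee would have to come from attested, provably correct hardware or zero-knowledge proofs, not ordinary software), here is a sketch of a sensor node that publishes just a salted commitment of each reading and discloses the reading solely when an agreed trigger predicate fires. The `trigger` threshold and the `SensorNode` class are hypothetical choices for the example.

```python
# Sketch of "controlled privacy": the node always publishes a commitment
# (which reveals nothing about the reading), and reveals the reading itself
# only when the treaty-specified trigger condition is met.

import hashlib
import os
from typing import Callable

def trigger(reading: float) -> bool:
    """The treaty-specified situation, e.g. activity above a threshold."""
    return reading > 100.0

class SensorNode:
    def __init__(self, predicate: Callable[[float], bool]):
        self.predicate = predicate

    def observe(self, reading: float) -> dict:
        """Return what the node is allowed to transmit for this reading."""
        salt = os.urandom(16)
        commitment = hashlib.sha256(salt + str(reading).encode()).hexdigest()
        message = {"commitment": commitment}     # always public, reveals nothing
        if self.predicate(reading):              # only then reveal the evidence
            message["reading"] = reading
            message["salt"] = salt.hex()         # lets others verify the commitment
        return message

if __name__ == "__main__":
    node = SensorNode(trigger)
    print(node.observe(42.0))    # normal activity: commitment only
    print(node.observe(180.0))   # specified situation detected: reading disclosed
```

In a real deployment the interesting part is what the sketch cannot show: all parties would need a proof that the deployed hardware can only ever run this disclosure policy.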