The problem with their chosen domain mostly boils down to either underestimating how hard it is to quantify over all the possible higher-order behaviors the program doesn't have, or somehow having a solution and not telling us about it.
I like this comment as an articulation of the problem; it also makes some good points about what formal proof systems can and can't do:
https://www.lesswrong.com/posts/B2bg677TaS4cmDPzL/limitations-on-formal-verification-for-ai-safety#kPRnieFrEEifZjksa
If they knew of a path to quantifying over all possible higher-order behaviors in the proof system, I'd be more optimistic that their example would actually work IRL, and far more optimistic about software and hardware security overall, but I don't think this will happen.
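To make the gap concrete, here is a minimal hypothetical sketch in Lean 4 (my own illustration, not from the linked post): a formal proof certifies exactly the property stated in the theorem, and any behavior outside that spec, i.e. the higher-order behaviors above, is left completely unconstrained.

```lean
-- Hypothetical illustration: a verified function and its proof.
-- The theorem below certifies only the stated arithmetic property.

def addOne (n : Nat) : Nat := n + 1

-- We can prove the positive property we wrote down...
theorem addOne_gt (n : Nat) : addOne n > n := by
  unfold addOne
  exact Nat.lt_succ_self n

-- ...but the theorem says nothing about behaviors outside the spec:
-- timing, memory-access patterns, side channels, or anything else the
-- compiled program might do at a lower level of abstraction. Formally
-- quantifying over everything the program does *not* do is exactly the
-- part that has no expression here.
```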
I also like some of the hardware security discussion here, as these techniques might well be combined with formal proofs to make things even more secure and encrypted (though the linked discussion itself doesn't involve formal proofs):
https://www.lesswrong.com/posts/nP5FFYFjtY8LgWymt/#e5uSAJYmbcgpa9sAv
https://www.lesswrong.com/posts/nP5FFYFjtY8LgWymt/#TFmNy5MfkrvKTZGiA
https://www.lesswrong.com/posts/nP5FFYFjtY8LgWymt/#3Jnpurgrdip7rrK8v
I agree that physical attacks mean it's probably not possible to get formal proofs alone to state-level security, but I do think they might allow a jump of several levels in security against non-state actors: from attackers being essentially able to take control of the AI through exfiltration, to being unable to penetrate the code-base at all, at least until the world is entirely transformed.
I am of course assuming heavy use of AI labor here.