This seems dubious as a general rule. (What inspires the statement? Nuclear weapons?)
Cryptography is an important example where sophisticated defenders have the edge over sophisticated attackers. I suspect that’s true of computer security more generally as well, because of formal verification.
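To make the asymmetry concrete, here’s a toy back-of-the-envelope sketch (the operation counts are illustrative assumptions, not measurements): the defender pays a roughly fixed cost per encryption, while a brute-force attacker pays a cost exponential in the key length.

```python
# Toy illustration of the work asymmetry in symmetric cryptography:
# the defender's cost per message is small and fixed, while a brute-force
# attacker's expected cost grows exponentially with the key length.

KEY_BITS = 128                      # e.g. AES-128
defender_ops = 10_000               # assumed rough cost of one encryption (order of magnitude)
attacker_ops = 2 ** (KEY_BITS - 1)  # expected trial decryptions to guess the key

print(f"defender: ~{defender_ops:.0e} operations per message")
print(f"attacker: ~{attacker_ops:.0e} operations expected")
print(f"ratio:    ~{attacker_ops / defender_ops:.0e}")
```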
Oh, and on formal verification: I don’t have the ref, but someone working in network security commented that formally verified systems aren’t widely used because they don’t generally work in practice; the verification doesn’t carry over to complex real-world situations. I wish I remembered where I’d seen that comment.
I think it is a general principle, but I wouldn’t trust it to do much reasoning for me. So, good point. I’m thinking specifically of the fact that there’s never been a known case of anyone actually securing a software system against hostile intrusion, and of the way that political processes and human belief formation seem disturbingly easy to fool: briefly at the level of individuals, and enough to cause chaos at the level of the public. I’m actually not going to spell out the easiest ways I see to try to destroy the world with next-gen AI. I see some decent opportunities but no really easy ones. But I’ve never really tried to think about it.
The other principle is that for a stable long-term equilibrium, every single world-ending attack must be stopped. Defense has to win 100% of the time, and that’s a big asymmetry.
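One way to see why this is such a big asymmetry (a minimal sketch, assuming independent attack attempts with a small per-attempt success probability; both numbers are made up for illustration): even a tiny per-attack success rate compounds, so the probability that defense holds every single time falls toward zero as opportunities accumulate.

```python
# Minimal sketch of the "defense must win every time" asymmetry, assuming
# independent attempts that each succeed with a small probability p.
p_success = 0.01  # assumed per-attempt chance a world-ending attack gets through

for n_attempts in (10, 100, 1_000, 10_000):
    p_survive = (1 - p_success) ** n_attempts  # defense must win all n times
    print(f"{n_attempts:>6} attempts: P(defense holds every time) = {p_survive:.4f}")
```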
The post I linked on this topic does a much more thorough job.