I think it is a general principle, but I wouldn’t trust it to do much reasoning for me, so, good point. I’m thinking specifically of the way there’s never been a known case of someone actually securing a software system against hostile intrusion, and the way political processes and human belief formation seem disturbingly easy to fool: briefly for individuals, and enough to cause chaos at the level of the public. I’m not going to spell out the easiest ways I see to try to destroy the world with next-gen AI. I see some decent opportunities but no really easy ones, though I’ve never really tried to think it through.
The other principle is that, for a stable long-term equilibrium, every single world-ending attack must be stopped. Defense has to win 100% of the time, and that’s a big asymmetry.
The post I linked on this topic does a much more thorough job.