One class of problem comes about if GPT-N starts thinking about “what would a UFAI do in situation X”:
- Inspired by AI-box experiments, GPT-N writes a post about the danger posed by ultra-persuasive AI-generated arguments for bad conclusions, and provides a concrete example of such an argument.
- GPT-N writes a post giving a detailed explanation of how a UFAI could take over the world. Terrorists read the post and notice that a UFAI isn’t a hard requirement for the plan to work.
- GPT-N begins writing a post about mesa-optimizers and starts simulating a mesa-optimizer midway through.