One class of problem comes about if GPT-N starts thinking about “what would a UFAI do in situation X”:
- Inspired by AI-box experiments, GPT-N writes a post about the danger posed by ultra-persuasive AI-generated arguments for bad conclusions, and provides a concrete example of such an argument.
- GPT-N writes a post giving a detailed explanation of how a UFAI could take over the world. Terrorists read the post and notice that a UFAI isn’t a hard requirement for the plan to work.
- GPT-N begins writing a post about mesa-optimizers and starts simulating a mesa-optimizer midway through.