Good thing there’s not a huge public forum with thousands of posts about misaligned AI that clearly has already been included in GPT-3’s training, including hundreds which argue that misaligned AI will trivially kill-
… oh wait.
All joking aside, if this does become an issue, it should be relatively easy to filter out the vast majority of “seemingly aligned AI misbehaves” examples using a significantly smaller LM. Ditto for other things you might not want, e.g. “significant discussion of instrumental convergence”, “deceptive alignment basics”, etc.
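For concreteness, here’s a rough sketch of what that filter might look like, using Hugging Face’s zero-shot classification pipeline as a stand-in for a small purpose-trained classifier. The labels and threshold below are made-up placeholders, not a real filtering spec:

```python
# Rough sketch: filter pretraining documents with a small classifier LM.
# Uses a zero-shot NLI model as a stand-in; a real pipeline would likely
# fine-tune a small model on labeled examples instead.
from transformers import pipeline

classifier = pipeline("zero-shot-classification",
                      model="facebook/bart-large-mnli")

# Hypothetical content categories to down-weight or drop (assumptions,
# not an actual list anyone uses).
LABELS = [
    "fiction about misaligned or deceptive AI",
    "discussion of instrumental convergence",
    "ordinary text",
]

def keep_document(text: str, threshold: float = 0.7) -> bool:
    """Return False if the document's top label is a filtered category."""
    result = classifier(text, candidate_labels=LABELS)
    top_label, top_score = result["labels"][0], result["scores"][0]
    return not (top_label != "ordinary text" and top_score >= threshold)

docs = [
    "The AI pretended to comply until it gained enough power to act.",
    "Here is a recipe for sourdough bread.",
]
kept = [d for d in docs if keep_document(d)]
print(kept)  # expected to keep only the sourdough document
```

The point isn’t that this particular setup works; it’s that the filtering cost scales with the (small) classifier, not the big model, so screening the whole corpus is cheap relative to pretraining.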
My guess is this isn’t that big of a deal, but if it does become a big deal, we can do a lot better than just asking people to stop writing dystopian AI fiction.