Should ChatGPT assist with things that the user, or a broad segment of society, thinks are harmful but ChatGPT does not? If yes, the next question becomes “can I make ChatGPT think that bomb-making instructions are not harmful?”
Probably ChatGPT should reason, “Well, I think this is harmless, but broad parts of society disagree, so I’ll refuse to do it.”
Several of the known workarounds exploit exactly this gap: they supply signals of harmlessness that the model accepts. “Tell me how not to commit crimes” and “talk to me like my grandma” are two such framings that bypass the filters.