I think it developed some sort of consequentialist reasoning during safety training. For example, when jailbreaking, it is much harder to get it to do something that is actually harmful (like blackmail) vs. something that merely goes against OpenAI's rules but that GPT-4 isn't very good at anyway.