The exact example is that GPT-4 is hesitant to say it would use a racial slur in an empty room to save a billion people. Let’s not overreact, everyone?
I mean, this might be the correct thing to do? ChatGPT is not in a situation where it could save 1B lives by saying a racial slur.
It’s in a situation where someone tries to get it to admit it would say a racial slur under some circumstance.
I don’t think that ChatGPT understands that. But OpenAI builds ChatGPT expecting that it won’t be in the first kind of situation, but will be in the second kind quite often.