(If the point is not to allow the AI to output anything misaligned, being conservative is probably the point, and lowering performance seems to be more than acceptable.)
Yes, but OpenAI could have just done that by adjusting their classification threshold.
Isn’t that only the case if their filter was the same but weaker?
(If the point is not to allow the AI to output anything misaligned, being conservative is probably the point, and lowering performance seems to be more than acceptable.)
Yes, but OpenAI could have just done that by adjusting their classification threshold.
Isn’t that only the case if their filter was the same but weaker?