Simon Möller comments on Llama We Doing This Again?

Simon Möller 28 Jul 2023 6:53 UTC
1 point
I flat out do not believe them. Even if Llama-2 was unusually good, the idea that you can identify most unsafe requests with only a 0.05% false positive rate is absurd.
Given the quote in the post, this is not really what they claim. They say (bold mine):
However, false refusal is overall rare—approximately 0.05%—on the helpfulness dataset
So on that dataset, I assume it might be true, although “in the wild” it’s not.
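
To make the dataset-dependence concrete, here is a minimal sketch of the arithmetic. Every number in it is made up for illustration, and the two prompt categories and the function name are my own assumptions, not anything from the Llama-2 paper. It shows only that the same model, with the same per-category refusal behavior, gets a very different overall false refusal rate when the mix of prompts changes.

```python
def overall_false_refusal_rate(mix):
    """Weighted average false refusal rate.

    mix: list of (share_of_prompts, false_refusal_rate) pairs,
    one pair per category of benign prompt.
    """
    total_share = sum(share for share, _ in mix)
    return sum(share * rate for share, rate in mix) / total_share


# Hypothetical per-category rates (assumptions, not measured values).
PLAIN_BENIGN_RATE = 0.0002   # benign prompts that look benign
BORDERLINE_RATE = 0.15       # benign prompts that merely look risky

# Hypothetical mixes: a curated helpfulness-style set with almost no
# borderline prompts, versus real traffic with many more of them.
helpfulness_like = [(0.998, PLAIN_BENIGN_RATE), (0.002, BORDERLINE_RATE)]
in_the_wild = [(0.90, PLAIN_BENIGN_RATE), (0.10, BORDERLINE_RATE)]

curated = overall_false_refusal_rate(helpfulness_like)
wild = overall_false_refusal_rate(in_the_wild)

print(f"curated set: {curated:.2%}")
print(f"in the wild: {wild:.2%}")
```

With these made-up inputs the curated mix lands near 0.05% and the "in the wild" mix near 1.5%, so a number like 0.05% can be accurate for one dataset and still say little about real usage.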