One can think of this as cases where auto-interp exhibits a precision-recall trade-off. At one extreme, you can generate super broad annotations like “all English text” to capture a a lot, which would overkill; and at the other end, you can generate very specific ones like “Slurs targeting sexual orientation” which would risk mislabeling, say, racial slurs.
One can think of this as cases where auto-interp exhibits a precision-recall trade-off. At one extreme, you can generate super broad annotations like “all English text” to capture a a lot, which would overkill; and at the other end, you can generate very specific ones like “Slurs targeting sexual orientation” which would risk mislabeling, say, racial slurs.
Section 4.3 of the OpenAI SEA paper also discusses this point.