Yeah, recent Claude does relatively well. Though I assume it also depends on how disinterested and analytical the phrasing of the prompt is (e.g. explicitly mentioning the slur in question). I also wouldn’t rule out that Claude was specifically optimized for this somewhat notorious example.
Yeah, recent Claude does relatively well. Though I assume it also depends on how disinterested and analytical the phrasing of the prompt is (e.g. explicitly mentioning the slur in question). I also wouldn’t rule out that Claude was specifically optimized for this somewhat notorious example.