Yeah, that seems quite plausible to me. Among (many) other things, I expect that trying to fine-tune away hallucinations stunts RLHF'd models' capabilities in places where certain answers pattern-match as speculative, even when the model itself should be quite confident in them.