Some informal experimentation on my part also suggests that the RLHFed models are much less willing to make guesses about the user than they are about “an author”, although of course you can get around that by taking user text from one context & presenting it in another as a separate author. I also wouldn’t be surprised if there were differences on the RLHFed models between their willingness to speculate about someone who’s well represented in the training data (ie in some sense a public figure) vs someone who isn’t (eg a typical user).
Yeah, that seems quite plausible to me. Among (many) other things, I expect that trying to fine-tune away hallucinations stunts RLHF'd model capabilities in places where certain answers pattern-match as speculative, even when the model itself should be quite confident in its answer.
Absolutely! I just thought it would be another interesting data point, didn’t mean to suggest that RLHF has no effect on this.
That makes sense, and definitely is very interesting in its own right!