Quintin Pope comments on If influence functions are not approximating leave-one-out, how are they supposed to help?

Quintin Pope 23 Sep 2023 3:08 UTC
6 points
2
Re empirical evidence for influence functions:
Didn’t the Anthropic influence functions work pick up on LLMs not generalising across lexical ordering? E.g., training on “A is B” doesn’t raise the model’s credence in “Bs include A”?
Which is apparently true: https://x.com/owainevans_uk/status/1705285631520407821?s=46
- Fabien Roger 25 Sep 2023 14:11 UTC
  2 points
  0
  That’s an exciting experimental confirmation! I’m looking forward to more predictions like those. (I’ll edit the post to add it, as well as future external validation results.)