Seems like this could be addressed by filtering any comments that use evidence or personal examples out of your dataset.
If that’s too intense, filtering responses to remove personal examples and checking their cited sources shouldn’t be too bad? But maybe you’d just end up with a model that tries to subvert the filter, or draws misleading conclusions from sources, instead of actually being helpful…
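For concreteness, here’s a rough sketch of the dataset-filtering idea, assuming crude regex heuristics stand in for a real classifier (the patterns, function names, and sample comments are all just illustrative, not a tested filter):

```python
import re

# Hypothetical heuristics for "cites evidence" -- illustrative only.
CITATION_PATTERN = re.compile(
    r"\bet al\b\.?|\(\d{4}\)|\bdoi:|arxiv\.org",
    re.IGNORECASE,
)

# Hypothetical heuristics for "personal example" -- illustrative only.
PERSONAL_EXAMPLE_PATTERN = re.compile(
    r"\bin my experience\b|\bI once\b|\ba friend of mine\b|\bpersonally\b",
    re.IGNORECASE,
)


def uses_evidence_or_personal_example(comment: str) -> bool:
    """Crude check for citation-like strings or personal anecdotes."""
    return bool(
        CITATION_PATTERN.search(comment)
        or PERSONAL_EXAMPLE_PATTERN.search(comment)
    )


def filter_dataset(comments: list[str]) -> list[str]:
    """Drop comments the heuristic flags; keep the rest for training."""
    return [c for c in comments if not uses_evidence_or_personal_example(c)]


if __name__ == "__main__":
    sample = [
        "This was predicted by Foo et al 1990.",
        "In my experience this fails on edge cases.",
        "The argument stands on its own without citations.",
    ]
    print(filter_dataset(sample))  # keeps only the last comment
```

Obviously a real version would need something smarter than regexes, but the same caveat below applies either way.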
A hack like that would just have other EDT failure modes: instead of confabulating evidence from my dataset or personal examples, it might just confabulate references. “Yes, this was predicted by Foo et al. 1990, and it makes perfect sense.”