jefftk comments on The Waluigi Effect (mega-post)

jefftk 6 Mar 2023 21:20 UTC
21 points
12
The model in this post is that in picking out Luigi from the sea of possible simulacra you’ve also gone most of the way to picking out Waluigi. This seems testable: do we see more Waluigi-like behavior from RLHF-trained GPT than from raw GPT?
- lemonhope 13 Apr 2023 8:12 UTC
  3 points
  0
  Yeah would love to see experiments/evidence outside of Bing