johnswentworth comments on Predicting Alignment Award Winners Using ChatGPT 4

johnswentworth 8 Feb 2024 17:04 UTC
4 points
1
Possibly relevant to your results: I was one of the people who judged the Alignment Award competition, and if I remember correctly the Shutdown winner (roughly this post) was head-and-shoulders better than any other submission in any category. So it’s not too surprising that GPT had a harder time predicting the Goal Misgeneralization winner; there wasn’t as clear a winner in that category.
- Shoshannah Tekofsky 8 Feb 2024 17:36 UTC
  1 point
  0
  Oh, that does help to know, thank you!