gw comments on [Paper] AI Sandbagging: Language Models can Strategically Underperform on Evaluations

gw 13 Jun 2024 11:43 UTC
2 points
0
Your GitHub link is broken: it includes the trailing period in the URL.
- Ollie J 13 Jun 2024 12:15 UTC
  2 points
  0
  Fixed, thanks for flagging.