gw comments on [Paper] AI Sandbagging: Language Models can Strategically Underperform on Evaluations