Neel Nanda comments on [Paper] AI Sandbagging: Language Models can Strategically Underperform on Evaluations