Neel Nanda comments on Auditing language models for hidden objectives