Bogdan Ionut Cirstea comments on johnswentworth’s Shortform

Bogdan Ionut Cirstea 2 Nov 2024 9:08 UTC
5 points
how recent reports of OpenAI’s o1 being deceptive have been questioned.
This seems to confuse a dangerous-capability eval (testing whether the model is able to ‘deceive’ in a visible scratchpad) with an assessment of alignment, and that distinction seems to be exactly what the ‘questioning’ was about.