how recent reports of OpenAI’s o1 being deceptive have been questioned.
This seems to confuse a dangerous-capability eval (of whether the model is able to ‘deceive’ in a visible scratchpad) with an assessment of alignment, which seems to be exactly what the ‘questioning’ was about.