Olli Järviniemi comments on Alignment Faking in Large Language Models

Olli Järviniemi 19 Dec 2024 15:34 UTC
13 points
Just want to say: this is among the best AI safety work I’ve seen, and I’m happy you did it!