Thanks for answering.
I’d personally put the probability that the AI is scheming for preferences that aren’t that bad (i.e., roughly value-aligned preferences) closer to 1/9–1/12 at minimum, mostly because I’m more skeptical that human control is automatically much better than AI control, assuming rough value alignment works out and generalizes.