Thanks for answering.
I’d personally put the probability that the AI is scheming for preferences that aren’t that bad (i.e., roughly value-aligned preferences) closer to 1/9–1/12 at minimum, mostly because I’m more skeptical that human control is automatically much better than AI control, assuming rough value alignment works out and generalizes.