I think in a lot of people’s models, “10% chance of alignment by default” means “if you make a bunch of AIs, 10% chance that all of them are aligned, 90% chance that none of them are aligned”, not “if you make a bunch of AIs, 10% of them will be aligned and 90% of them won’t be”.
And the 10% estimate just represents our ignorance about the true nature of reality; it’s already true either that alignment happens by default or that it doesn’t, we just don’t know yet.
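To make the difference between the two readings concrete, here is a minimal sketch. The function names, the value of `p`, and the number of AIs are all hypothetical, chosen only for illustration. It assumes the first reading means a single unknown fact settles every AI at once, and the second means each AI is an independent draw.

```python
def p_all_aligned_shared(p: float, n_ais: int) -> float:
    """Reading 1: one unknown fact about reality decides every AI at once.

    Either alignment happens by default (probability p, given our ignorance)
    and all n_ais are aligned, or it doesn't and none are. The number of AIs
    therefore does not matter.
    """
    return p


def p_none_aligned_shared(p: float, n_ais: int) -> float:
    """Reading 1: the complementary outcome, where no AI is aligned."""
    return 1.0 - p


def p_all_aligned_independent(p: float, n_ais: int) -> float:
    """Reading 2: each AI is a separate draw, aligned with probability p."""
    return p ** n_ais


def p_none_aligned_independent(p: float, n_ais: int) -> float:
    """Reading 2: every one of the n_ais independent draws comes up unaligned."""
    return (1.0 - p) ** n_ais


if __name__ == "__main__":
    p, n_ais = 0.1, 5  # illustrative values only
    print("shared fact:  all aligned =", p_all_aligned_shared(p, n_ais),
          " none aligned =", p_none_aligned_shared(p, n_ais))
    print("independent:  all aligned =", p_all_aligned_independent(p, n_ais),
          " none aligned =", p_none_aligned_independent(p, n_ais))
```

Under both readings the expected fraction of aligned AIs is the same 10%, but the outcomes differ sharply: with five AIs, the shared-fact reading leaves a 10% chance that all are aligned and a 90% chance that none are, while the independent reading makes "all aligned" vanishingly unlikely (0.1^5) and "none aligned" only about 59% likely (0.9^5).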