In other words, I now believe there is a significant probability, on the order of 50-70%, that alignment is solved by default.
Let’s suppose that you are entirely right about deceptive alignment being unlikely. (So we’ll set aside things like “what specific arguments caused you to update?” and tricky questions about modest epistemology/outside views).
I don’t see how “alignment is solved by default with 50-70% probability” justifies claims like “capabilities progress is net positive” or “AI alignment should change its purpose to something else.”
If a doctor told me I had a disease with a 50-70% chance of resolving on its own, and that it would otherwise kill me, I wouldn’t go “oh okay, I should stop trying to fight the disease.”
The stakes are also not symmetrical. Getting (aligned) AGI one year sooner is great, but it only buys one extra year of flourishing, whereas getting unaligned AGI means a significant loss over the entire far future.
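To make the asymmetry concrete, here is a minimal Python sketch with purely illustrative payoff numbers of my own (the one-year upside and the far-future downside are stand-ins, not figures from the discussion):

```python
# Illustrative sketch: even a 70% chance of the good outcome yields a negative
# expected value once the downside dwarfs the bounded upside.

p_aligned = 0.7          # assumed probability that alignment works out by default
upside = 1               # one extra year of flourishing from getting aligned AGI sooner
downside = -1_000_000    # illustrative stand-in for losing the entire far future

expected_value = p_aligned * upside + (1 - p_aligned) * downside
print(expected_value)    # ≈ -299999.3: negative despite the 70% success probability
```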
So even if we have a 50-70% chance of alignment by default, I don’t see how your central conclusions follow.
Here’s another version of the thought experiment: suppose you can take a genetic upgrade that gives you +1000 utils with 70% probability, or −1000 utils with 30% probability.
Should you take it?
The answer is yes: in expectation, the upgrade gives you 0.7 × 1000 − 0.3 × 1000 = +400 utils.
This is an instance of a general principle: as long as the probability of the positive outcome is over 50% and the costs and benefits are symmetric in magnitude, the activity has positive expected value and is worth doing.
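To spell out the arithmetic, here is a minimal sketch of the same expected-value calculation (the variable names are just for illustration):

```python
# Expected value of the genetic-upgrade thought experiment above.

p_good = 0.7          # probability of the good outcome
payoff_good = 1000    # utils gained if it goes well
payoff_bad = -1000    # utils lost if it goes badly (symmetric in magnitude)

expected_utils = p_good * payoff_good + (1 - p_good) * payoff_bad
print(expected_utils)  # ≈ 400 utils (modulo floating-point rounding)

# General form of the principle: with symmetric payoffs of ±v,
# EV = p*v - (1 - p)*v = (2p - 1)*v, which is positive exactly when p > 0.5.
```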
And my contention is that AGI/ASI is just a larger version of the thought experiment above. AGI/ASI is a symmetric technology with respect to good and bad outcomes, which is why it’s okay to increase capabilities.