When you’re choosing between a project that gives us a boost in worlds where P(doom) was 50% and one that helps out in worlds where P(doom) was 1% or 99%, you should probably pick the first project, because the derivative of P(doom) with respect to alignment progress is maximized at 50%.
Many prominent alignment researchers estimate P(doom) as substantially less than 50%. Those people often focus on scenarios that are surprisingly bad from their perspective, basically for this reason.
And conversely, people who think P(doom) > 50% should aim their efforts at worlds that are better than they expected.
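A minimal numeric sketch of the claim above, assuming P(doom) is logistic in alignment progress, so that a project shifts the log-odds of doom by some fixed amount (the helper name and the 0.1 log-odds effect size are purely illustrative):

```python
import math

def p_doom_after_progress(p_before: float, delta: float) -> float:
    """P(doom) after alignment progress shifts the log-odds of doom down by `delta`,
    under the (assumed) logistic model."""
    log_odds = math.log(p_before / (1 - p_before))
    return 1 / (1 + math.exp(-(log_odds - delta)))

delta = 0.1  # hypothetical log-odds improvement from one project
for p in (0.01, 0.50, 0.99):
    reduction = p - p_doom_after_progress(p, delta)
    print(f"baseline P(doom) = {p:.2f}: absolute reduction ≈ {reduction:.4f}")

# baseline P(doom) = 0.01: absolute reduction ≈ 0.0009
# baseline P(doom) = 0.50: absolute reduction ≈ 0.0250
# baseline P(doom) = 0.99: absolute reduction ≈ 0.0010
```

Under this (assumed) model, the same log-odds shift buys roughly 25x more absolute reduction in P(doom) at a 50% baseline than at 1% or 99%, which is just the statement that the logistic derivative p(1−p) peaks at p = 0.5.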
This section seems reversed to me, unless I’m misunderstanding it. If “things go as I expect” means P(doom) 99%, and “I’m pleasantly wrong about the usefulness of natural abstractions” means P(doom) 50%, then the first paragraph suggests I should work on the “better than expected” / “surprisingly good” world, because the marginal impact of effort is higher in that world.
[Another way to think about it: a surprise in the direction you already expect is extremizing, but a logistic success curve has its highest derivative in the middle, i.e. it is a moderating force.]