These disagreements mainly concern the relative power of future AIs, the polarity of takeoff, takeoff speed, and, in general, the shape of future AIs. Do you also have detailed disagreements about the difficulty of alignment? If anything, the fact that the future unfolds differently in your view should impact future alignment efforts (but you also might have other considerations informing your view on alignment).
You partially answer this in the last point, saying: “But, equally, one could view these theses pessimistically.” But what do you personally think? Are you more pessimistic, more optimistic, or equally pessimistic about humanity’s chances of surviving AI progress? And why?
Part of what makes it difficult for me to talk about alignment difficulty is that the concept doesn’t fit easily into my paradigm of thinking about the future of AI. If I am correct, for example, that AI services will be modular, marginally more powerful than what comes before, and numerous as opposed to monolithic, then there will not be one alignment problem, but many.
I could talk about potential AI safety principles, healthy cultural norms, and specific engineering issues, but not “a problem” called “aligning the AI” — a soft prerequisite for explaining how difficult “the problem” will be. Put another way, my understanding is that future AI alignment will be continuous with ordinary engineering, like cars and skyscrapers. We don’t ordinarily talk about how hard the problem of building a car is, in some sort of absolute sense, though there are many ways of operationalizing what that could mean.
One question is how costly it is to build a car. We could then compare that cost to the overall consumer benefit that people get from cars, and from that, deduce whether and how many cars will be built. Similarly, we could ask about the size of the “alignment tax” (the cost of aligning an AI above the cost of building AI), and compare it to the benefits we get from aligning AI at all.
My starting point in answering this question is to first emphasize the large size of the benefits: what someone gets if they build AI correctly. We should expect this benefit to be extremely large, and thus, we should also expect people to pay very large amounts to align their AIs, including through government regulation and other social costs.
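To make that cost-benefit framing concrete, here is a minimal sketch of the comparison I have in mind. The function name and all the figures are purely hypothetical, chosen only to illustrate the shape of the argument, not to estimate real magnitudes:

```python
# A toy sketch of the alignment-tax framing above (hypothetical numbers only).

def pays_alignment_tax(benefit_if_aligned: float,
                       benefit_if_misaligned: float,
                       alignment_tax: float) -> bool:
    """Return True if paying the alignment tax leaves the developer better off."""
    return benefit_if_aligned - alignment_tax > benefit_if_misaligned

# If the benefit from well-aligned AI dwarfs the tax, developers will pay it:
print(pays_alignment_tax(benefit_if_aligned=1_000, benefit_if_misaligned=100, alignment_tax=50))    # True
# Only if the tax eats most of the benefit does skimping on alignment pay off:
print(pays_alignment_tax(benefit_if_aligned=1_000, benefit_if_misaligned=100, alignment_tax=1_200)) # False
```

The point of the sketch is just that, on my view, the first case looks far more typical than the second, which is why I expect large voluntary (and regulatory) spending on alignment.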
Will people still fail to align AI services in various ways, due to numerous issues such as mesa-misalignment or outer misalignment arising from a lack of oversight and transparency? Sure — and I’m uncertain how much this will occur — but because of the points I gave in my original comment, these seem unlikely to be fatal issues at a civilizational level. It is perhaps less analogous to nukes than to how car safety sometimes fails (though I do not want to lean heavily on this comparison, as there are real differences too).
Now, there is a real risk in misunderstanding me here. AI values and culture could drift very far from human values over time. And eventually, this could culminate in an existential risk. This is all very vague, but if I were forced to guess the probability of this happening — as in, it’s all game over and we lose as humans — I’d maybe go with 25%.
Btw, your top-level comment is one of the best comments I’ve come across ever. Probably. Top 5? Idk, I’ll check how I feel tomorrow. Aspiring to read everything you’ve ever written rn.
Incidentally, you mention that “the concept doesn’t fit easily into my paradigm of thinking about the future of AI.”
And I’ve been thinking lately about how important it is to prioritise original thinking before you’ve consumed all the established literature in an active field of research.[1] If you manage to diverge early, the novelty of your perspective compounds over time (feel free to ask about my model) and you’re more likely to end up with a productively different paradigm from what’s already out there.
Did you ever feel embarrassed trying to think for yourself when you didn’t feel like you had read enough? Or, did you feel like other people might have expected you to feel embarrassed for how seriously you took your original thoughts, given how early you were in your learning arc?
I’m not saying you haven’t. I’m just guessing that you acquired your paradigm by doing original thinking early, and thus had the opportunity to diverge early, rather than greedily over-prioritising the consumption of existing literature in order to “reach the frontier”. Once you’ve hastily consumed someone else’s paradigm, it’s much harder to find its flaws and build something else from the ground up.
Thanks a lot for writing this.