Why is the built-in assumption for almost every single post on this site that alignment is impossible and we need a 100-year international ban to survive? This does not seem particularly intellectually honest to me. It is very possible that no international agreement is needed. Alignment may turn out to be quite tractable.
A mere 5% chance that the plane will crash during your flight is consistent with considering this extremely concerning and doing anything in your power to avoid getting on it. “Alignment is impossible” is not necessary for great concern, and isn’t implied by great concern.
I don’t think this line of argument is a good one. If there’s a 5% chance of x-risk and, say, a 50% chance that AGI makes the world generally very chaotic and high-stakes over the next few decades, then it seems very plausible that you should mostly be optimizing for making the 50% go well rather than the 5%.
That is still consistent with great concern. I’m pointing out that O O’s point isn’t locally valid: observing concern shouldn’t translate into observing a belief that alignment is impossible.
Yudkowsky has a pinned tweet that states the problem quite well: it’s not so much that alignment is necessarily infinitely difficult, but that it certainly doesn’t seem anywhere near as easy as advancing capabilities, and that’s a problem when what matters is whether the first powerful AI is aligned:
Safely aligning a powerful AI will be said to be ‘difficult’ if that work takes two years longer or 50% more serial time, whichever is less, compared to the work of building a powerful AI without trying to safely align it.
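To make the threshold in that definition concrete, here is a minimal sketch of how it could be computed. The function name, the parameter names, and the reading of "takes ... longer" as "at least this much longer" are my assumptions, not part of the tweet.

```python
def is_difficult(unaligned_years: float, aligned_years: float) -> bool:
    """Apply the quoted definition of 'difficult' (one interpretation).

    unaligned_years: serial time to build a powerful AI without trying
        to safely align it (hypothetical input).
    aligned_years: serial time to build it and safely align it
        (hypothetical input).
    """
    extra = aligned_years - unaligned_years
    # "two years longer or 50% more serial time, whichever is less"
    threshold = min(2.0, 0.5 * unaligned_years)
    # Assumption: "takes X longer" is read as "at least X longer".
    return extra >= threshold


# With a 2-year baseline, the 50% clause binds (threshold = 1 year).
print(is_difficult(2.0, 3.0))
# With a 10-year baseline, the 2-year clause binds (threshold = 2 years).
print(is_difficult(10.0, 11.5))
```

Under this reading, short capability timelines make the bar for "difficult" quite low: with a two-year baseline, one extra year of alignment work already counts.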
Another frame: If alignment turns out to be easy, then the default trajectory seems fine (at least from an alignment POV; you might still be worried about, e.g., concentration of power).
If alignment turns out to be hard, then the policy decisions we make to affect the default trajectory matter a lot more.
This means that even if misalignment risks are relatively low, a lot of value still comes from thinking about worlds where alignment is hard (or perhaps “somewhat hard but not intractably hard”).
It’s not every post, but there are still a lot of people who think that alignment is very hard.
The more common assumption is that alignment isn’t trivial, because an intellectually honest assessment of the range of opinions suggests that we collectively do not yet know how hard alignment will be.