Yudkowsky has a pinned tweet that states the problem quite well: it’s not so much that alignment is necessarily infinitely difficult, but that it certainly doesn’t seem anywhere near as easy as advancing capabilities, and that’s a problem when what matters is whether the first powerful AI is aligned:
Safely aligning a powerful AI will be said to be ‘difficult’ if that work takes two years longer or 50% more serial time, whichever is less, compared to the work of building a powerful AI without trying to safely align it.
Another frame: if alignment turns out to be easy, then the default trajectory seems fine (at least from an alignment POV; you might still be worried about, e.g., concentration of power).
If alignment turns out to be hard, then the policy decisions we make to affect the default trajectory matter a lot more.
This means that even if misalignment risks are relatively low, a lot of value still comes from thinking about worlds where alignment is hard (or perhaps “somewhat hard but not intractably hard”).