This post engages substantively and clearly with IMO the first or second most important thing we could be accomplishing on LW: making better estimates of how difficult alignment will be.
It analyzes how people who know a good deal about alignment theory could say something like "AI is easy to control" in good faith, and why that claim is wrong in both senses.
What Belrose and Pope were mostly arguing, without being explicit about it, is that current AI is easy to control; they then extrapolate from there, essentially assuming we won't make any changes to AI in the future that might dramatically change this situation.
This post addresses that point and several more subtle ones, clarifying the discussion.