I’m not sure exactly which clusters you’re referring to, but I’ll just assume that you’re pointing to something like “people who aren’t very into the sharp left turn and think that iterative, carefully bootstrapped alignment is a plausible strategy.” If this isn’t what you were trying to highlight, I apologize. The rest of this comment might not be very relevant in that case.
To me, the views you listed here feel like a straw man or weak man of this perspective.
Furthermore, I think the actual crux is more often “prior to having to align systems that are collectively much more powerful than humans, we’ll only have to align systems that are somewhat more powerful than humans.” This is essentially the crux you highlight in A Case for the Least Forgiving Take On Alignment. I believe disagreements about hands-on experience are quite downstream of this crux: I don’t think people with reasonable views (not weak men) believe that “without prior access to powerful AIs, humans will need to align AIs that are vastly, vastly superhuman, but this will be fine because these AIs will need lots of slow, hands-on experience in the world to do powerful stuff (like nanotech).”
So, discussing how well superintelligent AIs can operate from first principles seems mostly irrelevant to this discussion (if by superintelligent AI, you mean something much, much smarter than the human range).
I would be more sympathetic if you made a move like, “I’ll accept continuity through the human range of intelligence, and that we’ll only have to align systems as collectively powerful as humans, but I still think that hands-on experience is only...” In particular, I think there is a real disagreement about the relative value of experimenting on future dangerous systems versus working on theory or carefully constructing analogous situations today by thinking in detail about future alignment difficulties.