This sounds like the shortest, most succinct description of what I consider one central problem in the entire AI alignment community, a problem that distorts a lot of their other thinking on AI risk, so thank you for stating the issue so clearly.
I don’t think this is the only problem, and I’ll mention another one later in this comment, but this statement captures why I tend to be more optimistic about data-based alignment strategies than many others:
It is also part of the genre of arguments claiming that AI systems should act in particular ways regardless of their domain, training data, and training procedure, against which I think we should by now have extremely strong priors, given that for literally all AI systems, and I mean literally all, including MCTS-based self-play systems, the data the NNs learn from is enormously important for what those NNs learn.
My post about the assumption that instrumental convergence is far more unconstrained than it is in reality addresses another central issue in the AI alignment community:
https://www.lesswrong.com/posts/HHkYEyFaigRpczhHy/ai-88-thanks-for-the-memos#EZLm32pKskw8F5xiF