Doesn’t this push more of the capabilities work that continues unheeded toward researchers who don’t expect problems to arise with alignment, or who even assume that capabilities is equivalent to alignment? (I.e., among researchers somewhere in between, who might be on the fence about whether to continue, the ones who choose to slow down capabilities research are the ones less likely to believe that capabilities = alignment.)
Thus, we more likely enter a world in which the fastest-developing AI systems are built by teams who prefer sharing, tend to believe that capabilities solve alignment automatically, and (following my reasoning so far) probably also believe that the “universe abstracts well”, in the sense that interpretability should be fairly straightforward and the most capable AI models will also be the most interpretable, and vice versa.
This last thought might be kind of interesting: it suggests that this latter category of researchers will tend to develop their AI on exactly that assumption, so how correct their overall models of AI turn out to be should also be reflected in how successful their capabilities progress actually proves.