But—I still suspect, without being able to quantify, that alignment is worse than the other sciences in that the standards by-which-people-agree-what-good-work-is are just uncertain.
People in alignment sometimes say that alignment is pre-paradigmatic. I think that’s a good frame—I take it to mean that the standards of what qualifies as good work are themselves not yet ascertained, among many other things. I think that if paradigmaticity is a line with math on the left and, like… pre-atomic chemistry all the way on the right, alignment is pretty far to the right. Modern RL is further to the left, and modern supervised learning with transformers much further to the left, followed by things for which we actually have textbooks which don’t go out of date every 12 months.
I don’t think this would be disputed?
Noting that I don’t dispute this.
An important reason this is true is that existential risk prevention can’t be an experimental field. Some existential risks—such as asteroid impacts—can be understood with strong theory (like, settled physics). AI risk isn’t one of those (and any path by which it could become one of those depends on an inferential leap which is itself uncertain, namely, extrapolating results from near-term AI experiments to much more powerful AI systems).