It might be because of something more universal. Sometimes time is just necessary for science to progress. And the right view of debate is definitely that it changes the views of onlookers, not of the interlocutors.
But I still suspect, without being able to quantify it, that alignment is worse off than the other sciences in that the standards by which people agree on what counts as good work are just uncertain.
People in alignment sometimes say that alignment is pre-paradigmatic. I think that's a good frame: I take it to mean, among many other things, that the standards for what qualifies as good work are themselves not yet settled. If paradigmaticity is a line with math on the far left and, say, pre-atomic chemistry all the way on the right, alignment is pretty far to the right. Modern RL is further to the left, and modern supervised learning with transformers much further still, followed by fields for which we actually have textbooks that don't go out of date every 12 months.
I don't think this would be disputed? But it really means that it's almost certain that more than 80% of alignment-related intellectual output will be tossed out at some point in the future, because that's what pre-paradigmaticity means. (80% is arguably a best-case scenario for pre-paradigmatic fields!) Which in turn makes engaging with the field a deeply unattractive prospect.
I guess what I'm saying is that I agree the situation for alignment is not at all bad for a pre-paradigmatic field, but calling your field pre-paradigmatic still seems like a pretty bad place to be, in terms of how much credibility well-calibrated observers should accord you.
Edit: And to the degree that arguments that p(doom) is high are entirely separate from the field of alignment, this is actually a reason for ML engineers to care deeply about alignment, as a way of preventing doom, even if the field is pre-paradigmatic! But I'm quite uncertain that this separation holds.
Noting that I don't dispute any of this, in particular the claim that alignment is pre-paradigmatic in this sense.
An important reason this is true is that existential risk prevention can't be an experimental field. Some existential risks, such as asteroid impacts, can be understood with strong theory (i.e., settled physics). AI risk isn't one of those, and any path by which it could become one depends on an inferential leap that is itself uncertain: extrapolating results from near-term AI experiments to much more powerful AI systems.
Yeah, I agree with that.