I don’t think that’s it, because I don’t think the situation in AI Alignment is all that unusual. “Science progresses one funeral at a time” is an adage applied near-universally. There’s also an even more general “common wisdom” that people in a debate ~never succeed in changing each other’s minds, and that the only point of having a debate is to sway the on-the-fence onlookers; that debates are a performance art.
There’s something fundamentally off in the human psyche that invalidates Aumann’s agreement theorem for us.
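(For reference, a rough, informal statement of the result being invoked here, Aumann’s agreement theorem, with notation chosen just for illustration: if two agents share a common prior $P$, and the values of their posteriors for an event $A$ given their respective private information $\mathcal{I}_1$ and $\mathcal{I}_2$,

$$q_1 = P(A \mid \mathcal{I}_1), \qquad q_2 = P(A \mid \mathcal{I}_2),$$

are common knowledge between them, then $q_1 = q_2$; ideally rational agents with a common prior cannot agree to disagree.)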
And put like this, I think the capabilities researchers will forever disagree with the alignment researchers for this exact same reason.
It might be the case that this is because of a more universal thing. Like, sometimes time is just necessary for science to progress. And definitely the right way to view debate is as changing the POV of onlookers, not of the interlocutors.
But I still suspect, without being able to quantify it, that alignment is worse than other sciences in that the standards by which people agree on what counts as good work are just uncertain.
People in alignment sometimes say that alignment is pre-paradigmatic. I think that’s a good frame; I take it to mean that the standards for what qualifies as good work are themselves not yet settled, among many other things. I think that if paradigmaticity is a line with math on the left and, like… pre-atomic chemistry all the way on the right, alignment is pretty far to the right. Modern RL is further to the left, and modern supervised learning with transformers much further to the left, followed by things for which we actually have textbooks that don’t go out of date every 12 months.
I don’t think this would be disputed? But this really means that it’s almost certain that > 80% of alignment-related intellectual output will be tossed out at some point in the future, because that’s what pre-paradigmaticity means. (Like, 80% is arguably a best-case scenario for pre-paradigmatic fields!) Which means in turn that engaging with it is a deeply unattractive prospect.
I guess what I’m saying is that I agree that the situation for alignment is not at all bad for a pre-paradigmatic field, but if you call your field pre-paradigmatic, that seems like a pretty bad place to be in terms of what kind of credibility well-calibrated observers should accord you.
Edit: And like, to the degree that the arguments that p(doom) is high are entirely separate from the field of alignment, this is actually a reason for ML engineers to care deeply about alignment, as a way of preventing doom, even if it is pre-paradigmatic! But I’m quite uncertain that this is true.
But I still suspect, without being able to quantify it, that alignment is worse than other sciences in that the standards by which people agree on what counts as good work are just uncertain.
People in alignment sometimes say that alignment is pre-paradigmatic. I think that’s a good frame; I take it to mean that the standards for what qualifies as good work are themselves not yet settled, among many other things. I think that if paradigmaticity is a line with math on the left and, like… pre-atomic chemistry all the way on the right, alignment is pretty far to the right. Modern RL is further to the left, and modern supervised learning with transformers much further to the left, followed by things for which we actually have textbooks that don’t go out of date every 12 months.
I don’t think this would be disputed?
Noting that I don’t dispute this.
An important reason this is true is that existential risk prevention can’t be an experimental field. Some existential risks, such as asteroid impacts, can be understood with strong theory (like, settled physics). AI risk isn’t one of those (and any path by which it could become one of those depends on an inferential leap that is itself uncertain, namely extrapolating results from near-term AI experiments to much more powerful AI systems).
I do want to argue against the theory that science progresses one funeral at a time:
“Science advances one funeral at a time” → this seems to be both generally untrue and a harmful meme (because it is commonly used to argue against life extension research).
Noting that I don’t dispute this.
An important reason this is true is that existential risk prevention can’t be an experimental field. Some existential risks, such as asteroid impacts, can be understood with strong theory (like, settled physics). AI risk isn’t one of those (and any path by which it could become one of those depends on an inferential leap that is itself uncertain, namely extrapolating results from near-term AI experiments to much more powerful AI systems).
Yeah, I agree with that.
I do want to argue against the theory that science progresses one funeral at a time:
Fair enough, I suppose I don’t actually have a vast body of rigorous evidence in favour of that phrase.