If we aren’t good at assessing alignment research, there’s a risk that people replace the goal of “doing good alignment research” with the goal of “doing research that’s recognized as good alignment research”. This could create a feedback loop in which a particular notion of “good research” gets entrenched: research is considered good if high-status researchers think it’s good, and the way to become a high-status researcher is to do research that’s considered good by the current definition and to hold beliefs that conform to those of high-status researchers.
A number of TurnTrout’s points were related to this (emphasis mine):
I think we’ve kinda patted ourselves on the back for being awesome and ahead of the curve, even though, in terms of alignment, I think we really didn’t get anything done until 2022 or so, and a lot of the meaningful progress happened elsewhere. [MY NOTE: I suspect more could have been done prior to 2022 if our notion of “good research” had been better calibrated, or even just broader]
(Medium confidence) It seems possible to me that “taking ideas seriously” has generally meant something like “being willing to change your life to further the goals and vision of powerful people in the community, or to better accord with socially popular trends”, and less “taking unconventional but meaningful bets on your idiosyncratic beliefs.”
Somewhat relatedly, there have been a good number of times where it seems like I’ve persuaded someone of A and of A⟹B, and they still don’t believe B, and coincidentally B is unpopular.
...
(Medium-high confidence) I think that alignment “theorizing” is often a bunch of philosophizing and vibing in a way that protects itself from falsification (or even proof-of-work) via words like “pre-paradigmatic” and “deconfusion.” I think it’s not a coincidence that many of the “canonical alignment ideas” somehow don’t make any testable predictions until AI takeoff has begun. 🤔
I’d like to see more competitions related to alignment research. I think it would help keep assessors honest if they were e.g. looking at two anonymized alignment proposals, comparing them point by point, and figuring out which proposal has a better story for each possible safety problem. If competition winners subsequently become high-status, that could bring more honesty to the entire ecosystem, and teach people to focus on merit rather than politics.