I think this concern is only relevant if your strategy is to do RL on human evaluations of alignment research. If instead you just imitate the distribution of current alignment research, I don’t think you get this problem, at least not any more than we already have it, and I think you can still substantially accelerate alignment research with imitation alone. Of course, you still have inner alignment issues, but from an outer alignment perspective I think imitating human alignment research is a pretty good thing to try.