That being said, I think alignment work ending up in training data is usually good, since it can help our AI systems be differentially better at AI alignment research (e.g. relative to how good they are at AI capabilities research), which I think is pretty important.
That consideration seems relevant only for language models that will be doing/supporting alignment work.