scasper comments on Eight Strategies for Tackling the Hard Part of the Alignment Problem

scasper 21 Jul 2023 16:58 UTC
LW: 2 AF: 2
Thanks. I agree that this seems like an approach worth pursuing. I think there is at least a little related work at CHAI and/or Redwood, but don’t quote me on that. In general, it seems like if you have a model and a smaller distilled or otherwise compressed version of it, there is a lot you can do with the pair from an alignment perspective. I am not sure how much work in the anomaly detection literature involves distillation or compression.