Thanks for the feedback and references!
On catastrophic forgetting: our appendix includes a “control” version of ERA that doesn’t use gradient routing but is otherwise the same (appendix C, figure 12). This shows that the effect of retain-set fine-tuning is negligible in the absence of gradient routing.
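For readers unfamiliar with the mechanism being ablated: the core of gradient routing is that gradients from forget-set batches are masked so they only update a designated subnetwork, while retain-set batches update the whole model. Here is a minimal, hedged sketch of that idea in PyTorch; the module shapes, the choice of routed layer, and all names are illustrative assumptions, not the paper's actual ERA setup.

```python
import torch
import torch.nn as nn

# Illustrative sketch of gradient routing (assumed setup, not the paper's code):
# forget-set batches only update a designated "routed" subnetwork; retain-set
# batches update the full model as usual.

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
# Route forget-set gradients into the final layer only (arbitrary choice here).
routed_ids = {id(p) for p in model[2].parameters()}
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

def step(x, y, is_forget):
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    if is_forget:
        # Mask gradients outside the routed subnetwork before the update.
        for p in model.parameters():
            if id(p) not in routed_ids and p.grad is not None:
                p.grad.zero_()
    opt.step()
    return loss.item()

x, y = torch.randn(16, 4), torch.randint(0, 2, (16,))
w0 = model[0].weight.clone()
step(x, y, is_forget=True)        # forget batch: unrouted layers stay frozen
assert torch.equal(model[0].weight, w0)
step(x, y, is_forget=False)       # retain batch: whole model updates
assert not torch.equal(model[0].weight, w0)
```

In the ERA recipe the routed subnetwork can then be ablated, which is what the "control" run without routing is isolating: with routing disabled, retain-set fine-tuning alone does essentially nothing.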
On gradient ascent or similar methods: there are many unlearning methods that don’t target or achieve the kind of robust localization and removal that we care about, as mentioned in our discussion of related work and, e.g., in this post. We included RMU as a stand-in for this class, and I personally don’t see much value in doing more extensive comparisons there.
On Corrective Unlearning: we weren’t aware of other unlearning approaches that consider imperfect labeling, so this is a very helpful reference—thanks! It would be interesting to compare ERA-type methods to these. My concern with fine-tuning methods is that they might not be suitable for robustly removing broad capabilities (like “virology”), as opposed to correcting for small perturbations to the training data.