Thanks for sharing these interesting results!
I am a big fan of reporting unlearning results across identified forget set fractions! That said, I think the unlearning results lack comparisons to important ablations/baselines that would really test whether gradient routing is adding value. For example:
1. CF (catastrophic forgetting): remove most components of ERA and keep only the finetuning on the retain set.
2. Ascent + CF: apply a light touch of gradient ascent (maximizing the loss) on the forget set, with simultaneous finetuning on the retain set. See [1] or AC↯DC in [2] for good implementations, and the sketch below.
3. Methods that combine these concepts specifically for LLMs, like LLMU [3].
Without these, it is difficult to know if gradient routing is actually adding any value on top of what can be achieved with traditional finetuning.
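To make (1) and (2) concrete, here is a minimal sketch of the kind of baseline I have in mind, assuming a standard PyTorch setup; model, retain_loader, forget_loader, and loss_fn are placeholders, and setting ascent_coef=0 recovers the plain CF baseline while a small positive value gives Ascent + CF:

```python
import torch

def cf_ascent_baseline(model, retain_loader, forget_loader, loss_fn,
                       ascent_coef=0.1, lr=1e-4, epochs=1):
    """Finetune on the retain set, optionally with light gradient ascent on the
    forget set. ascent_coef=0 gives the plain catastrophic-forgetting baseline."""
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for (xr, yr), (xf, yf) in zip(retain_loader, forget_loader):
            opt.zero_grad()
            retain_loss = loss_fn(model(xr), yr)   # descend on retain data
            forget_loss = loss_fn(model(xf), yf)   # ascend (maximize loss) on forget data
            (retain_loss - ascent_coef * forget_loss).backward()
            opt.step()
    return model
```

Nothing here touches the architecture or routing, so any gap between this and ERA would be attributable to gradient routing itself.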
Also, the SSD method has been shown to perform well in the setting of partial deletion sets [4], so another thing to check would be Potion (a follow-up to SSD) [5] plus finetuning on the retain set, which would stress-test the hypothesis of “we need gradient routing through a new subnetwork instead of just finding the relevant parts of the existing network”; a rough sketch of what I mean is below.
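For context, my rough mental model of the SSD/Potion family is selective dampening of parameters that are disproportionately important for the forget set. The following is a paraphrase of that idea rather than the reference implementations of [4] or [5]; the alpha/lam constants are illustrative, and it reuses the same placeholder loaders and loss as the sketch above. The dampened model would then be finetuned on the retain set exactly as in the CF baseline.

```python
import torch

def diag_importance(model, loader, loss_fn):
    """Squared-gradient (diagonal-Fisher-style) importance of each parameter."""
    imp = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    for x, y in loader:
        model.zero_grad()
        loss_fn(model(x), y).backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                imp[n] += p.grad.detach() ** 2
    return imp

def ssd_style_dampen(model, forget_loader, retain_loader, loss_fn,
                     alpha=10.0, lam=1.0):
    """Shrink weights that matter far more to the forget set than to the retain
    set, leaving the rest of the existing network untouched."""
    imp_f = diag_importance(model, forget_loader, loss_fn)
    imp_r = diag_importance(model, retain_loader, loss_fn)
    with torch.no_grad():
        for n, p in model.named_parameters():
            mask = imp_f[n] > alpha * imp_r[n]                      # forget-specific weights
            beta = (lam * imp_r[n] / (imp_f[n] + 1e-12)).clamp(max=1.0)
            p.mul_(torch.where(mask, beta, torch.ones_like(beta)))  # dampen, never amplify
    return model
```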
[1] Trippa, Daniel, et al. “$\nabla\tau$: Gradient-based and Task-Agnostic machine Unlearning.” CoRR, 2024.
[2] Kolipaka, Varshita, et al. “A Cognac shot to forget bad memories: Corrective Unlearning in GNNs.” arXiv preprint arXiv:2412.00789, 2024.
[3] Yao, Yuanshun, Xiaojun Xu, and Yang Liu. “Large language model unlearning.” arXiv preprint arXiv:2310.10683, 2023.
[4] Goel, Shashwat, et al. “Corrective machine unlearning.” TMLR, 2024.
[5] Schoepf, Stefan, Jack Foster, and Alexandra Brintrup. “Potion: Towards Poison Unlearning.” DMLR, 2024.
Thanks for the feedback and references!
On catastrophic forgetting: our appendix includes a “control” version of ERA that doesn’t use gradient routing but is otherwise the same (appendix C, figure 12). This shows that the effect of retain-set fine-tuning is negligible in the absence of gradient routing.
On gradient ascent or similar methods: there are many unlearning methods that don’t target or achieve the kind of robust localization and removal that we care about, as mentioned in our discussion of related work and, e.g., in this post. We included RMU as a stand-in for this class, and I personally don’t see much value in doing more extensive comparisons there.
On Corrective Unlearning: we weren’t aware of other unlearning approaches that consider imperfect labeling, so this is a very helpful reference, thanks! It would be interesting to compare ERA-type methods to these. My concern with fine-tuning methods is that they might not be suitable for robustly removing broader capabilities (like “virology”) as opposed to correcting for small perturbations to the training data.