These are mostly unlearning techniques, right? I am not familiar enough with diffusion models to make an informed guess about how good the unlearning of the meta-learning paper is. In general, though, I am skeptical: in LLMs, all attempts at meta-unlearning have failed, providing robustness only against specific parameterizations. So, absent evaluations under reparametrization, I wouldn't put much weight on the results.
I think the interp-based unlearning techniques look cool! I would be cautious about using them for evaluation, though, especially when they are being optimized against, as in the ConceptVector paper. I put more faith in SGD making the loss go down when the knowledge is present.
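The "SGD makes the loss go down" test can be sketched as a minimal relearning probe: fine-tune on forget-set examples and watch whether the loss drops quickly. Everything below is a toy illustration (the linear model, synthetic data, and hyperparameters are all hypothetical), not any of the papers' actual setups:

```python
import numpy as np

def relearning_probe(w, X, y, lr=0.5, steps=50):
    """Fine-tune a linear probe on 'forget' data and record the loss curve.

    A fast loss drop suggests the knowledge is still latent and easily
    recoverable, even if the model no longer outputs it directly.
    """
    w = w.copy()
    losses = []
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))  # sigmoid predictions
        losses.append(float(-np.mean(
            y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9))))
        grad = X.T @ (p - y) / len(y)       # logistic-loss gradient
        w -= lr * grad                      # plain SGD step
    return losses

# Toy forget-set: linearly separable, so SGD should recover it quickly.
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 8))
y = (X[:, 0] > 0).astype(float)
losses = relearning_probe(np.zeros(8), X, y)
print(losses[0] > losses[-1])  # True: loss drops, i.e. the concept is relearnable
```

In a real evaluation, `w` would be the unlearned model's parameters and `X`/`y` held-out forget-set data; a model from which the knowledge was genuinely removed should relearn no faster than a model that never saw it.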
Here’s a recent paper which might provide [inspiration for] another approach: Meta-Unlearning on Diffusion Models: Preventing Relearning Unlearned Concepts (though it seems at least somewhat related to the tamper-resistant paper mentioned in another comment).
Edit: I’d also be curious to see whether editing-based methods, potentially combined with interp techniques (e.g. those in Mechanistic Unlearning: Robust Knowledge Unlearning and Editing via Mechanistic Localization and in Intrinsic Evaluation of Unlearning Using Parametric Knowledge Traces), might fare better; there might also be room for cross-pollination of methodologies.