
Machine Unlearning

Last edit: Oct 23, 2023, 5:15 PM by NickyP

In machine unlearning, the aim is to reduce a model's performance on some "unlearned" (or forget) tasks while preserving its performance on some "retained" tasks. Although traditionally motivated by privacy preservation and GDPR compliance, some of this research is relevant to the field of AI interpretability. The machine unlearning literature shares a common terminology, though note that there can be minor differences in usage between papers.
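The retain/forget objective described above is often operationalized as a "gradient difference": descend on the retain-set loss while ascending on the forget-set loss. The sketch below is a hypothetical toy illustration of that idea (the model, data, and the `alpha` weight are all made up for the example, not taken from any of the posts listed here):

```python
# Minimal sketch of gradient-difference unlearning, assuming a toy
# linear classifier and random data. Descending on
# (retain_loss - alpha * forget_loss) pushes performance on the
# forget set down while keeping the retain set fit.
import torch

torch.manual_seed(0)

model = torch.nn.Linear(4, 2)                      # toy classifier
opt = torch.optim.SGD(model.parameters(), lr=0.05)
ce = torch.nn.CrossEntropyLoss()

# Illustrative stand-ins for the "retained" and "unlearned" tasks.
x_retain = torch.randn(32, 4)
y_retain = torch.randint(0, 2, (32,))
x_forget = torch.randn(32, 4)
y_forget = torch.randint(0, 2, (32,))

alpha = 1.0  # hypothetical weight on the forgetting (ascent) term

def forget_loss() -> float:
    with torch.no_grad():
        return ce(model(x_forget), y_forget).item()

before = forget_loss()
for _ in range(100):
    opt.zero_grad()
    loss = ce(model(x_retain), y_retain) - alpha * ce(model(x_forget), y_forget)
    loss.backward()
    opt.step()
after = forget_loss()
# `after` should exceed `before`: the forget-set loss has been driven up.
```

This naive ascent objective is only a baseline; as several of the posts below argue, it can leave the "unlearned" information recoverable from the weights.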


For an overview, one can look at "A Survey of Machine Unlearning".

Machine Unlearning Evaluations as Interpretability Benchmarks

Oct 23, 2023, 4:33 PM
33 points
2 comments, 11 min read, LW link

Deep Forgetting & Unlearning for Safely-Scoped LLMs

scasper, Dec 5, 2023, 4:48 PM
126 points
30 comments, 13 min read, LW link

Gradient Routing: Masking Gradients to Localize Computation in Neural Networks

Dec 6, 2024, 10:19 PM
165 points
12 comments, 11 min read, LW link
(arxiv.org)

The case for unlearning that removes information from LLM weights

Fabien Roger, Oct 14, 2024, 2:08 PM
96 points
18 comments, 6 min read, LW link

Breaking Circuit Breakers

Jul 14, 2024, 6:57 PM
53 points
13 comments, 1 min read, LW link
(confirmlabs.org)

Unlearning via RMU is mostly shallow

Jul 23, 2024, 4:07 PM
54 points
4 comments, 6 min read, LW link

Machine Unlearning in Large Language Models: A Comprehensive Survey with Empirical Insights from the Qwen 1.5 1.8B Model

Saketh Baddam, Feb 1, 2025, 9:26 PM
9 points
2 comments, 11 min read, LW link