An ML paper on data stealing provides a construction for “gradient hacking”
The paper “Privacy Backdoors: Stealing Data with Corrupted Pretrained Models” introduces “data traps” as a way of making a neural network remember a chosen training example, even after further training. The idea is to store the chosen example in the weights and then ensure those weights are not subsequently updated.
I have not read the paper, but it seems like it might be relevant to gradient hacking: https://www.lesswrong.com/posts/uXH4r6MmKPedk8rMA/gradient-hacking
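
To make the general idea concrete, here is a toy PyTorch sketch of a single "trap" unit. This is my own illustration, not the paper's construction (which I haven't read): a ReLU neuron with a large negative bias stays dormant on ordinary inputs, so its incoming weights get zero gradient and never move; an input that does activate it gets written into those weights by the next SGD step, and the attacker can later read the example back from the weight delta. The detector (input mean), bias value, learning rate, and dimensions are all arbitrary choices for the sketch.

```python
import torch

torch.manual_seed(0)
d = 8
lr = 1.0

# Toy "trap" unit (my illustration, not the paper's construction):
# a ReLU neuron whose large negative bias keeps it dormant on ordinary inputs,
# so its incoming weights receive zero gradient and never change.
w = (torch.ones(d) / d).requires_grad_(True)  # detector weights (here: input mean)
b = torch.tensor(-5.0, requires_grad=True)    # large negative bias -> usually inactive
w_init = w.detach().clone()

def trap(x):
    return torch.relu(w @ x + b)

opt = torch.optim.SGD([w, b], lr=lr)

# Ordinary examples never activate the unit, so its weights stay frozen.
for _ in range(10):
    x = torch.randn(d)
    loss = trap(x)                # toy loss: just the unit's activation
    opt.zero_grad()
    loss.backward()
    opt.step()
assert torch.equal(w.detach(), w_init)

# An input that does activate the unit is written into the weights:
# d(loss)/dw = x while the ReLU is active, so one SGD step stores -lr * x in w.
secret = torch.randn(d) + 10.0    # the training example the trap captures
loss = trap(secret)
opt.zero_grad()
loss.backward()
opt.step()

# Recover the example from the weight delta.
recovered = (w_init - w.detach()) / lr
print("max recovery error:", (recovered - secret).abs().max().item())
```

In this toy version the only thing "protecting" the stored example is that the unit stays dormant afterwards; nothing stops it from firing again and overwriting the weights. Reliably ensuring the trapped weights are not updated by further training is the part that looks gradient-hacking-flavored to me.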