Well, let’s just create a convergent sequence of people who have each read more of the paper :P I read the introduction and skimmed the rest, and the paper seems cool and nontrivial—the result is that you can engineer a base model that remembers the first input sent to it during finetuning (and perhaps also some more averaged quantity, usable for classification, whose stability I didn’t understand).

I don’t really see how it’s relevant to part of a model hacking its own gradient flow during training, though. From my skimming, the mechanism seems to rely on a numerically unstable “trapdoor”, and, as with other gradient-control mechanisms one can build into NNs by hand, there doesn’t seem to be a path by which this would arise gradually during training.
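To illustrate what I mean by that last point, here is a toy, hypothetical sketch (not the paper’s actual construction): an engineered “trapdoor” term whose first-order gradient vanishes at the protected value but whose curvature is enormous. The constants `BIG` and `w0` are my own made-up assumptions. The point is that the mechanism only works because the huge constant was put there by hand—it’s invisible to SGD at the fixed point, so there’s no incremental gradient signal that would grow it.

```python
import torch

# Hypothetical toy "gradient trapdoor" (assumed construction, not the paper's):
# a parameter w is pinned to w0 by a penalty that is zero, with zero gradient,
# exactly at w0, but has enormous curvature around it.
BIG = 1e8   # assumed scale; the instability has to be engineered in
w0 = 0.5    # the value the trapdoor protects

w = torch.tensor(w0, requires_grad=True)
x, target = torch.tensor(2.0), torch.tensor(0.9)

def forward(x):
    trapdoor = BIG * (w - w0) ** 2   # zero (and zero-gradient) exactly at w0
    return w * x + trapdoor

loss = (forward(x) - target) ** 2
loss.backward()
print(w.grad)  # modest gradient at w0 ...

with torch.no_grad():
    w -= 0.1 * w.grad  # ... but a single SGD step off w0 ...
print((forward(x) - target) ** 2)  # ... makes the loss explode
```

At `w0` the trapdoor contributes nothing to the loss or the gradient, so nothing in training would ever push `BIG` from 0 up to 10^8; it only functions if it is fully formed from the start, which is why I don’t see a gradual path to it.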