Dropout layers in a Transformer leak the phase bit (train vs. eval); a small example is sketched below. So an LLM may be able to determine whether it is currently being trained and whether a backward pass will follow. Intuitively clear, but good to see concretely, and interesting to think through the repercussions of.
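A minimal PyTorch sketch of the idea (my own toy illustration, not taken from any particular notebook): feed a constant vector through a dropout layer. In eval mode dropout is the identity, so the constant comes back unchanged; in train mode entries are zeroed with probability p and the survivors are scaled by 1/(1-p), so the output always differs from the input. Any downstream computation can therefore read off the phase bit.

```python
import torch
import torch.nn as nn

class PhaseProbe(nn.Module):
    """Toy module: uses a dropout layer to expose the train/eval phase bit."""
    def __init__(self, p: float = 0.1):
        super().__init__()
        self.dropout = nn.Dropout(p)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Push a constant through dropout. In eval mode the output equals
        # the constant exactly; in train mode kept entries are scaled by
        # 1/(1-p) and the rest are zeroed, so the output always differs.
        probe = torch.ones_like(x)
        probed = self.dropout(probe)
        # "Phase bit": 1.0 if training (dropout active), 0.0 if eval.
        # A real network could compute something equivalent with ordinary
        # attention/MLP machinery instead of this explicit comparison.
        return (probed != probe).any().float()

probe = PhaseProbe(p=0.1)
x = torch.randn(4, 16)

probe.train()
print(probe(x))  # tensor(1.) -- dropout is active, phase bit leaks "training"

probe.eval()
print(probe(x))  # tensor(0.) -- dropout is the identity, phase bit says "eval"
```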
Somewhat related:
- Andrej Karpathy, who has a Colab notebook demonstrating this.
Also, how complex would such a detector circuit be? On the order of a couple of hundred bits? And how would we know whether any model had actually implemented it?