Problem: we want to make it hard for ML systems (trained via SGD) to perform naive gradient hacking. By naive gradient hacking, I mean “being able to keep some weights of the network constant for an arbitrary step of SGD”.
Solution: do “stochastic” regularisation, e.g. randomly sample the regularisation strength at each step (could use a quantum source if we want true randomness). This seems like it should make naive gradient hacking almost impossible: in order to keep some target weights unchanged, the gradient from their positive contribution to the loss would have to exactly cancel the gradient from the regulariser. If the regularisation strength is stochastic, that loss contribution must also be stochastic, which is not possible with a deterministic NN!
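A toy sketch of the argument, under loud assumptions: a single scalar weight, L2 regularisation, and a hypothetical fixed "counter-gradient" standing in for whatever a deterministic network might do to cancel the regulariser. The names `sgd_step`, `max_drift`, and `counter_grad` are made up for illustration, and none of this is a model of a real gradient hacker.

```python
import random

def sgd_step(w, task_grad, reg_strength, lr=0.1):
    """One SGD step on loss = task_loss + 0.5 * reg_strength * w**2."""
    return w - lr * (task_grad + reg_strength * w)

def max_drift(w0, steps, sample_reg_strength, expected_strength):
    """Largest distance the 'protected' weight moves away from w0.

    Assumption: the network contributes a fixed task gradient tuned to
    cancel the regulariser at w0 when the strength equals expected_strength.
    """
    counter_grad = -expected_strength * w0  # hypothetical deterministic cancellation
    w = w0
    worst = 0.0
    for _ in range(steps):
        w = sgd_step(w, counter_grad, sample_reg_strength())
        worst = max(worst, abs(w - w0))
    return worst

rng = random.Random(0)
mean_strength = 0.01

# Fixed regularisation: the cancellation is exact, so the weight never moves.
fixed_drift = max_drift(1.0, 100, lambda: mean_strength, mean_strength)

# Stochastic regularisation: strength resampled every step with the same mean.
stochastic_drift = max_drift(
    1.0, 100, lambda: rng.uniform(0.0, 2 * mean_strength), mean_strength
)

print("fixed drift:", fixed_drift)
print("stochastic drift:", stochastic_drift)
```

In this toy the weight only fails to stay exactly constant. Because the sampled strength has the same mean as the one the counter-gradient was tuned for, the weight still hovers near its original value, so the sketch says nothing about less naive forms of gradient hacking.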
The question of how useful this is for preventing any gradient hacking (not just the naive kind) depends on how stable the loss landscape is around some “deceptive / gradient hacking” minimum. It seems possible that the surrounding loss landscape could be pretty unstable to random perturbations?