I suspect telling selective lies that can later be found to be wrong is not the right analogy for adding noise, because this seems to me to be adding noise to the labels rather than to the gradients. I’d assume if you add noise to the labels, that could be counterproductive because the resulting gradients would be propagated disproportionately to all of the most functional and predictive parts of the network, destroying them in favor of flexibility.
I think equation 3 in the paper maybe suggests one way to view understand it: When practicing, rather than going with the way you have learned to do things, you should make random temporary changes to the process in which you do them.
When practicing, rather than going with the way you have learned to do things, you should make random temporary changes to the process in which you do them.
That sounds a lot like Deliberate Play (or Deliberate Practice?) and kids do it a lot (both).
Good question, I don’t know the answer.
I suspect telling selective lies that can later be found to be wrong is not the right analogy for adding noise, because this seems to me to be adding noise to the labels rather than to the gradients. I’d assume if you add noise to the labels, that could be counterproductive because the resulting gradients would be propagated disproportionately to all of the most functional and predictive parts of the network, destroying them in favor of flexibility.
I think equation 3 in the paper maybe suggests one way to view understand it: When practicing, rather than going with the way you have learned to do things, you should make random temporary changes to the process in which you do them.
That sounds a lot like Deliberate Play (or Deliberate Practice?) and kids do it a lot (both).