I really cannot say what that means, but I am also told “learning rate” is itself something of a misnomer and involves as much forgetting as learning.
Maybe temperature is a better word.
I’m curious about the retraction. Is it because of the later comments in the story, about how people change afterwards?
No, I just thought about it some more, and I realized that increasing the learning rate of a model (assuming the optimizer is something like SGD) would inject more randomness, since a larger step amplifies the noise in each minibatch gradient, just like increasing the temperature in simulated annealing would.
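For what it's worth, here's a toy sketch of the analogy (my own illustration, nothing from the story): noisy SGD on a 1-D quadratic loss next to a Metropolis-style sampler of the same loss. The loss, the Gaussian gradient noise, and the helper names are all made up for demonstration; the point is just that the stationary spread of the SGD iterates grows with the learning rate the same way the sampler's spread grows with temperature.

```python
import math
import random
import statistics

def sgd_spread(lr, steps=20_000, noise=1.0):
    """Noisy SGD on the toy loss f(w) = w**2 / 2 (true gradient: w)."""
    w, trace = 0.0, []
    for _ in range(steps):
        grad = w + random.gauss(0.0, noise)  # minibatch gradient estimate
        w -= lr * grad                       # lr scales the injected noise
        trace.append(w)
    return statistics.stdev(trace[steps // 2:])  # spread after burn-in

def annealing_spread(temp, steps=20_000):
    """Metropolis sampling of the same loss at a fixed temperature."""
    x, trace = 0.0, []
    for _ in range(steps):
        cand = x + random.gauss(0.0, 0.5)
        delta = cand**2 / 2 - x**2 / 2
        # hotter chains accept uphill moves more often
        if delta <= 0 or random.random() < math.exp(-delta / temp):
            x = cand
        trace.append(x)
    return statistics.stdev(trace[steps // 2:])

for lr in (0.01, 0.1, 0.5):
    print(f"SGD  lr={lr}: stationary spread ~ {sgd_spread(lr):.3f}")
for t in (0.01, 0.1, 0.5):
    print(f"MCMC T={t}: stationary spread ~ {annealing_spread(t):.3f}")
```

For this quadratic, the SGD spread works out to roughly sqrt(lr / (2 - lr)) while the sampler's is sqrt(T), so both knobs widen the region the process wanders through rather than letting it settle into the minimum.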