Oh, this reminded me of the temperature component of SLT, which I believe modulates how sharply one should sample from the Bayesian posterior, or perhaps how heavily to update on new evidence. I forget. In any case, it does this to try to capture the stochasticity of SGD. It's still an open problem how successfully it does that, though, I believe.
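For what it's worth, here's the tempered posterior roughly as it's usually written in the SLT literature (Watanabe's convention). The notation is mine: $\varphi$ for the prior, $\beta$ for the inverse temperature, and $L_n$ for the average negative log-likelihood over $n$ samples. Written this way, the two readings above look like the same knob: raising the likelihood to the power $\beta$ both sharpens the distribution you sample from and scales how much each data point counts.

```latex
p_\beta(w \mid D_n)
  = \frac{\varphi(w) \prod_{i=1}^{n} p(x_i \mid w)^{\beta}}{Z_n(\beta)}
  = \frac{\varphi(w)\, \exp\!\big(-n\beta\, L_n(w)\big)}{Z_n(\beta)},
\qquad
L_n(w) = -\frac{1}{n} \sum_{i=1}^{n} \log p(x_i \mid w)
```

At $\beta = 1$ this is the ordinary Bayesian posterior; as $\beta \to \infty$ it concentrates on the minimizers of $L_n$, and as $\beta \to 0$ it falls back to the prior.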