Doesn't this depend on whether the model can achieve perfect predictive power? What I had in mind was something like autoregressive text prediction, where there will always be some prediction error. I'd have assumed those prediction errors constantly inject noise into the gradients?
Ah, yeah, you're right. Thanks; I'd misunderstood why SGD converges to a local minimum. (Convergence depends on a steadily decreasing η; that decrease is doing more work than I realized.)
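To make that concrete, here's a minimal toy sketch (my own setup, not anything from the thread): SGD on a 1-D quadratic, with artificial Gaussian gradient noise standing in for the irreducible prediction errors. With a constant η the iterate just rattles around in a noise ball near the minimum; with a Robbins–Monro decay (∑η_t = ∞, ∑η_t² < ∞) the noise gets averaged out and the iterate actually settles.

```python
import numpy as np

rng = np.random.default_rng(0)

def run_sgd(step_size, n_steps=20_000, noise_std=1.0, w0=5.0):
    """Minimize f(w) = w^2 / 2 with noisy gradients.

    The true gradient is w; we add Gaussian noise to stand in for the
    prediction-error noise discussed above. step_size is a callable
    t -> eta_t. Returns the final iterate.
    """
    w = w0
    for t in range(n_steps):
        noisy_grad = w + noise_std * rng.standard_normal()
        w -= step_size(t) * noisy_grad
    return w

# Constant eta: the iterate hovers in a noise ball of radius ~ sqrt(eta),
# never converging to the minimum at w = 0.
w_const = run_sgd(lambda t: 0.1)

# Decaying eta satisfying the Robbins-Monro conditions
# (sum eta_t diverges, sum eta_t^2 converges): the noise averages out.
w_decay = run_sgd(lambda t: 0.1 / (1 + 0.01 * t))

print(f"constant eta -> |w| = {abs(w_const):.4f}")  # stays bounded away from 0
print(f"decaying eta -> |w| = {abs(w_decay):.4f}")  # much closer to 0
```

The divergent sum lets the iterate travel arbitrarily far if needed, while the convergent sum of squares caps the total variance injected by the noise, which is why the decrease in η is doing so much of the work.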