Ah, yeah, you’re right. Thanks, I had misunderstood why SGD converges to a local minimum. (Convergence depends on steadily decreasing η; that decrease is doing more work than I realized.)
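
To make the point about η concrete, here is a rough toy sketch rather than anything from the discussion itself. It assumes a one-dimensional objective f(x) = x²/2 with unit-variance Gaussian noise added to each gradient, and the helper name `sgd_tail_mse`, the starting point, and the two schedules are made up for illustration. With a constant η the iterates keep bouncing around the minimum; with η_t = 1/t the noise gets averaged away.

```python
import random

def sgd_tail_mse(step_size, steps=10_000, tail=1_000, seed=0):
    """Run SGD on f(x) = x**2 / 2 with noisy gradients.

    Returns the mean of x**2 over the last `tail` iterates, i.e. how far
    the iterates still sit from the minimum at x = 0 late in the run.
    """
    rng = random.Random(seed)
    x = 5.0  # arbitrary starting point (assumption)
    tail_sq = 0.0
    for t in range(1, steps + 1):
        # True gradient is x; add unit-variance Gaussian noise (assumption).
        noisy_grad = x + rng.gauss(0.0, 1.0)
        x -= step_size(t) * noisy_grad
        if t > steps - tail:
            tail_sq += x * x
    return tail_sq / tail

constant = sgd_tail_mse(lambda t: 0.5)      # fixed eta
decaying = sgd_tail_mse(lambda t: 1.0 / t)  # steadily decreasing eta
print(f"constant eta: {constant:.4f}, decaying eta: {decaying:.6f}")
```

In this toy setup the constant-η run should settle into a noise band whose width scales with η, while the 1/t run keeps shrinking toward the minimum. Real problems are not this clean, so treat it as intuition only.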