Thanks for the kind words, Adam! I’ll follow up over DM about early drafts — I’m interested in getting feedback that’s as broad as possible and really appreciate the kind offer here.
Typo is fixed — thanks for pointing it out!
At first I wondered why you were taking the sum instead of just $C(L) = \lim_{T \to \infty} \frac{L(T) - L(0)}{T}$, but after thinking about it, the latter would probably converge to 0 almost all the time, because even with amazing optimization, the loss will stop being improved by a factor linear in $T$ at some point. That might be interesting to put in the post itself.
Yes, the problem with that definition would indeed be that if your optimizer converges to some limiting loss function value like $\lim_{T \to \infty} L(T) = L_\infty$, then you’d get $\lim_{T \to \infty} \frac{L(T) - L(0)}{T} = \lim_{T \to \infty} \frac{L_\infty - L(0)}{T} = 0$ for any $L_\infty$.
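For a quick numerical sanity check of this, here is a minimal sketch. The loss curve `loss` below is entirely made up for illustration (it is not from the post), and the names `average_rate`, `L0`, and `L_inf` are hypothetical; the only property that matters is that the curve converges to some finite limit.

```python
def loss(T, L0=2.0, L_inf=0.5):
    # Hypothetical loss curve: starts at L0 when T = 0 and decays towards L_inf.
    return L_inf + (L0 - L_inf) / (1.0 + T)


def average_rate(T, L0=2.0, L_inf=0.5):
    # The quantity (L(T) - L(0)) / T from the proposed definition.
    return (loss(T, L0, L_inf) - loss(0, L0, L_inf)) / T


horizons = [10, 1_000, 100_000]

# Try several different limiting values L_inf with the same starting loss.
rates = {
    L_inf: [average_rate(T, L_inf=L_inf) for T in horizons]
    for L_inf in (0.0, 0.5, 1.5)
}

for L_inf, values in rates.items():
    print(f"L_inf = {L_inf}: {values}")
```

Whatever limiting value is chosen, the printed rates shrink towards 0 as the horizon grows, so this quantity can't tell a good optimizer from a mediocre one.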
Thanks again!