A comment I made to someone while explaining why I was excited about the above picture:
The timing of the plateau seems very accurate, and the RLCT seems smooth and consistent. The test loss goes up as the RLCT increases faster than the train loss decreases, as would be predicted by Bg = L + λ/n. (I did the math wrong here: 1. the estimate doesn’t make sense between phases, and 2. if you go through the math, Bg’ is very much not sensitive to small increases or decreases in lambda.) This was done on a model that was clearly not made for this particular situation, rather than on a bespoke model. And we see that once the RLCT starts to decrease from its peak, the train loss doesn’t move very much.
As far as I can tell, lots of SLT’s qualitative predictions about how lambda will interact with generalization and test losses are shown in this picture, and the model is one that someone thought of to do something completely different.
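To make the correction about sensitivity concrete, here is a minimal sketch of the leading-order estimate Bg = L + λ/n mentioned above. The function names and the numbers are hypothetical, chosen only for illustration, and I am assuming L and n are held fixed while lambda changes. Under that assumption a shift in lambda moves the estimate by only (change in lambda)/n, which is small whenever n is large.

```python
def generalization_estimate(loss: float, rlct: float, n: int) -> float:
    """Leading-order estimate Bg = L + lambda / n (higher-order terms ignored)."""
    if n <= 0:
        raise ValueError("n must be positive")
    return loss + rlct / n


def change_from_rlct_shift(delta_rlct: float, n: int) -> float:
    """Change in Bg when lambda shifts by delta_rlct, with L and n held fixed."""
    if n <= 0:
        raise ValueError("n must be positive")
    return delta_rlct / n


if __name__ == "__main__":
    # Hypothetical values, not taken from the picture being discussed.
    n = 10_000
    before = generalization_estimate(loss=0.5, rlct=20.0, n=n)
    after = generalization_estimate(loss=0.5, rlct=25.0, n=n)
    print("Bg before:", before)
    print("Bg after: ", after)
    print("difference:", after - before)
```

With these made-up numbers, raising lambda by 5 changes the estimate by 5/n, so any visible movement in test loss would have to come mostly from L rather than from the λ/n term.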