RogerDearnaley comments on Dialogue introduction to Singular Learning Theory

RogerDearnaley Jul 11, 2024, 5:32 AM
3 points
The thing that excites me most about SLT is the extent to which it takes things that had previously been observed and had become useful rules of thumb/folk wisdom (e.g. SGD+momentum on neural nets doesn’t seem to overfit due to large parameter counts anything like as much as other, smaller classes of machine learning models did), things that in many cases people were previously rather puzzled by, and puts them on a solid theoretical foundation that can be explained compactly. It also suggests where the assumptions underlying this might fail under certain circumstances (e.g. if your SGD+momentum for some reason wasn’t approximating Bayesian inference well).
We would really like our Alignment engineering to be as solid and trustworthy as possible. I’m not personally hopeful that we can get all the way to machine-verified mathematical proofs of model safety (lovely as that would be), but having a mathematical understanding of some of the assumptions on which we base our reasoning about model safety is a lot better than just having folk wisdom.