Karpathy’s law: “neural nets want to work”. This is another source of capability jumps: the capability ‘existed’ all along, but a bug was crippling it.
I’ve experienced this firsthand: I spent days trying to track down disappointing classification accuracy, assuming some bug in my model or math, only to find out later it was actually a bug in a new custom matrix-multiplication routine that my (insufficient) unit tests didn’t cover. It had simply never occurred to me that GD could optimize around that.
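To make the failure mode concrete, here is a minimal PyTorch sketch (hypothetical bug, synthetic data, not the original code): the “custom kernel” silently drops half the input features, yet SGD still trains the net to an accuracy that is merely disappointing rather than obviously broken, which is exactly the regime where you suspect the model or the math before the matmul.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

def buggy_matmul(x, w):
    # Hypothetical stand-in for a hand-written matmul kernel with an
    # indexing bug: it silently ignores the second half of the inputs.
    k = x.shape[1] // 2
    return x[:, :k] @ w[:k]  # the correct version would be: x @ w

# Synthetic 4-class task whose labels depend on *all* 20 features.
d_in, n_classes = 20, 4
true_w = torch.randn(d_in, n_classes)
x = torch.randn(4000, d_in)
y = (x @ true_w).argmax(dim=1)

def train(matmul):
    # One-hidden-layer net; the chosen matmul is used for the first layer.
    w1 = (0.1 * torch.randn(d_in, 64)).requires_grad_()
    w2 = (0.1 * torch.randn(64, n_classes)).requires_grad_()
    opt = torch.optim.SGD([w1, w2], lr=0.1)
    for _ in range(2000):
        loss = F.cross_entropy(F.relu(matmul(x, w1)) @ w2, y)
        opt.zero_grad(); loss.backward(); opt.step()
    preds = (F.relu(matmul(x, w1)) @ w2).argmax(dim=1)
    return (preds == y).float().mean()

print(f"correct matmul: {train(torch.matmul):.1%}")
print(f"buggy matmul:   {train(buggy_matmul):.1%}")
# The buggy run typically still lands far above the 25% chance level:
# plausible enough to send you hunting through model and math first.
```

The point of the comparison is that the loss curve for the buggy run looks perfectly healthy; GD just treats the crippled kernel as part of the landscape and optimizes around it.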
And on a related note, some big advances (arguably even transformers) are more a case of just getting out of SGD’s way to let it do its thing rather than some huge new insight.