I wonder if a weaker and more defensible thesis would be that deep learning models are mostly linear (and maybe the few non-linearities could be separated out and identified? Has anyone tried applying ReLUs to only some outputs, leaving the rest untouched?). It would seem really strange to me if they truly were linear, because that would mean:
activation functions are essentially unnecessary
forget SGD, you could just train them with one-shot linear regression (well, ok, they're still so big that you'd probably need gradient descent anyway, but fitting a linear function is a much more deterministic process, since the problem is convex)
You wouldn't even need multiple layers, just one big matrix, since a stack of linear layers collapses into a single linear map. It feels weird that an entire field would have overlooked such a trivial simplification.
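To make the last point concrete, here is a minimal sketch (the shapes and weights are arbitrary, made-up values) showing that two linear layers with no activation in between are exactly equivalent to one pre-multiplied matrix, which is why purely linear networks gain nothing from depth:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two "layers" with no activation function in between.
W1 = rng.standard_normal((8, 4))   # first layer: 4 inputs -> 8 hidden units
W2 = rng.standard_normal((3, 8))   # second layer: 8 hidden units -> 3 outputs
x = rng.standard_normal(4)

# Applying the layers one after another...
deep = W2 @ (W1 @ x)

# ...gives the same result as applying a single collapsed matrix.
W_collapsed = W2 @ W1
shallow = W_collapsed @ x

print(np.allclose(deep, shallow))  # True
```

Inserting any non-linearity between the two matmuls (e.g. `np.maximum(0, W1 @ x)` for a ReLU) breaks this equivalence, which is exactly why activation functions are what make depth matter.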