In addition, there are two pretty obvious bugs in a reasonably popular optimization library (100+ GitHub stars) that reduce performance and have sat unfixed, and unreported in its Issues, for a long time.
Karpathy’s law: “neural nets want to work”. This is another source of capabilities jumps: where the capability ‘existed’, but there was just a bug that crippled it (eg R2D2) with a small, often one-liner, fix.
The more you have a self-improving system that feeds back into itself hyperbolically, the more it functions end-to-end and removes the hardwired (human-engineered) parts that Amdahl’s-law the total output, the more likely you are to go from “pokes around doing nothing much, diverging half the time, beautiful idea, too bad it doesn’t work in the real world” to “FOOM”. (This is also the model of the economy that things like Solow growth models usually lead to: humanity, or Europe, pokes around doing nothing much discernible, nothing that anyone, like the chimpanzees or the Aztec Empire, should worry about, until...)
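To make the contrast concrete, here is a minimal numerical sketch (the constants are made up purely for illustration, not taken from any real system): Amdahl’s law caps total output by the hardwired fraction, while a system whose growth rate feeds back on its own level blows up in finite time rather than merely growing exponentially.

```python
import math

def amdahl_speedup(p, s):
    """Total speedup when a fraction p of the work is sped up by factor s
    and the remaining hardwired (1 - p) never improves (Amdahl's law)."""
    return 1.0 / ((1.0 - p) + p / s)

for p in (0.5, 0.9, 0.99):
    # No matter how large s gets, total output is capped at 1 / (1 - p);
    # only p -> 1 (fully end-to-end, nothing hardwired) removes the cap.
    print(f"improvable fraction {p}: cap {1 / (1 - p):.0f}x, at s=1000: {amdahl_speedup(p, 1000):.1f}x")

# Self-improvement feeding back into itself: dx/dt = k*x^2 ("hyperbolic") reaches
# infinity at the finite time t = 1/(k*x0), while dx/dt = k*x (exponential)
# never blows up and takes much longer just to get large.
k, x0 = 0.05, 1.0
print(f"hyperbolic blow-up at t = {1 / (k * x0):.0f}")
print(f"exponential merely reaches 1e9 at t = {math.log(1e9) / k:.0f}")
```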
I’ve experienced this firsthand: I spent days trying to track down disappointing classification accuracy, assuming some bug in my model or math, only to find out later that it was actually a bug in a newer custom matrix-multiplication routine which my (insufficient) unit tests didn’t cover. It had just never occurred to me that GD could optimize around that.
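The routine and the bug aren’t described beyond that, so here is a purely hypothetical reconstruction of the failure mode in numpy: an off-by-one in a hand-rolled matmul quietly drops one input feature, and gradient descent simply learns to lean on the remaining ones, so the only symptom is mediocre accuracy rather than an obvious failure.

```python
import numpy as np

rng = np.random.default_rng(0)

def matmul_correct(a, b):
    return a @ b

def matmul_buggy(a, b):
    """Hand-rolled matmul with a hypothetical off-by-one bug: it silently drops
    the last column of `a`. (Not the actual bug from the comment above.)"""
    out = np.zeros((a.shape[0], b.shape[1]))
    for k in range(a.shape[1] - 1):          # BUG: should be range(a.shape[1])
        out += np.outer(a[:, k], b[k])
    return out

# Toy linearly-separable task where the feature the buggy matmul drops matters.
n, d, h = 4000, 16, 32
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
w_true[-1] = 2.0                              # give real weight to the dropped feature
y = (X @ w_true > 0).astype(float)
P = rng.normal(size=(d, h)) / np.sqrt(d)      # fixed projection "layer" using the routine under test

def train_accuracy(matmul):
    Z = matmul(X, P)                          # forward pass through the custom routine
    w = np.zeros(h)
    for _ in range(3000):                     # plain full-batch gradient descent on logistic loss
        p = 1.0 / (1.0 + np.exp(-(Z @ w)))
        w -= 0.5 * Z.T @ (p - y) / n
    return ((Z @ w > 0) == (y > 0.5)).mean()

print("correct matmul:", train_accuracy(matmul_correct))   # near-perfect
print("buggy matmul:  ", train_accuracy(matmul_buggy))     # noticeably worse, but far above chance
```

The exact numbers are illustrative, but the qualitative behaviour matches the story: the buggy run trains without complaint and just lands at a disappointing accuracy, which is easy to blame on the model or the data instead of the routine.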
And on a related note, some big advances (arguably even transformers) are more a case of just getting out of SGD’s way to let it do its thing rather than some huge new insight.