There’s a ton of work on meta-learning, including Neural Architecture Search (NAS). Clune’s AI-GAs paper argues a similar point of view to the one I’d describe here, so I’d check that out.
I’ll just say why it would be powerful: the promise of meta-learning is that, just as learned features outperform engineered features, learned learning algorithms will eventually outperform engineered learning algorithms. Taking the analogy seriously suggests the performance gap will be large: a quantitative step-change.
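To make the outer-loop/inner-loop structure concrete, here is a minimal toy sketch, assuming a deliberately simple setup: the "learned" part of the learning algorithm is just the inner learner’s learning rate, found by random search in an outer loop rather than hand-picked. All names and constants are illustrative, not taken from any particular paper.

```python
# Toy sketch of the outer/inner structure of meta-learning (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
true_w = rng.normal(size=5)
y = X @ true_w + 0.1 * rng.normal(size=100)

def inner_train(lr, steps=50):
    """Inner loop: an 'engineered' learner (plain gradient descent) whose
    update rule has a single free parameter, the learning rate lr."""
    w = np.zeros(5)
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return np.mean((X @ w - y) ** 2)  # post-training loss

# Outer loop: "learn the learning algorithm" by searching over the update
# rule's parameter instead of hand-picking it.
candidates = 10 ** rng.uniform(-3, 0, size=20)
losses = [inner_train(lr) for lr in candidates]
best_lr = candidates[int(np.argmin(losses))]

print(f"hand-picked lr=0.01 -> loss {inner_train(0.01):.4f}")
print(f"meta-learned lr={best_lr:.3f} -> loss {min(losses):.4f}")
```

In real meta-learning work the searched-over object is far richer (update rules, architectures, whole training pipelines, as in NAS or AI-GAs), but the outer/inner structure is the same.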
The upper limit we should anchor on is fully automated research. This illustrates how powerful meta-learning could be, since automating research could easily give a speed-up of many orders of magnitude (e.g. just consider the physical bottleneck of humans manually specifying which experiment to run next).
An important underlying question is how much room there is for improvement over current techniques. The idea that current DL techniques are pretty close to optimal, i.e. that we’ve already uncovered the fundamental principles of efficient learning (with the associated view that DNNs may be a good model of the brain), is too often left implicit in discussions around forecasting and scaling. I think it’s a real possibility, but fairly unlikely (~15%, off the top of my head). The main evidence for it is that 99% of published improvements don’t seem to make much difference in practice / at scale.
Assuming that current methods are roughly optimal has two important implications:
- no new fundamental breakthroughs needed for AGI (faster timelines)
- no possible acceleration from fundamental algorithmic breakthroughs (slower timelines)