No shortcuts to knowledge: Why AI needs to ease up on scaling and learn how to code
Will scaling deep learning produce human-level generality, or do we need a new approach? You may have read the exchange between Scott Alexander and Gary Marcus, and felt that there are some good arguments on both sides, some bad ones, but few arguments that go beyond analogy and handwaving—arguments that would take what we know about deep learning and intelligence, and look at what that knowledge implies. If you haven’t read the exchange, here it is: SA, GM, SA, GM.
I will argue for Marcus’ position, but dive a little deeper than he does. I believe that symbolic representations, specifically programs, and learning as program synthesis, can provide data-efficient and flexible generalization in a way that deep learning can’t, no matter how much we scale it. I’ll show how probabilistic programs can represent causal models of the world, something deep learning can’t do, and why causal models are essential to intelligence. But I’ll start by examining the opposing view, that scaling deep learning is sufficient for general intelligence. To that end, I’ll quote from Gwern’s thorough essay on the scaling hypothesis.
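To make the causal-models claim concrete before diving in, here is a minimal sketch of what “representing a causal model as a probabilistic program” can look like, using the classic rain/sprinkler/wet-grass example. This toy is only illustrative: the `sprinkler_model` helper, the probabilities, and the intervention flag are all invented here, not taken from the posts under discussion.

```python
import random

def sprinkler_model(do_sprinkler=None):
    """Toy causal model as a probabilistic program: rain -> sprinkler -> wet grass.

    Passing do_sprinkler overrides the sprinkler's own mechanism, mimicking an
    intervention (Pearl's do-operator) rather than an observation.
    All probabilities are made up for illustration.
    """
    rain = random.random() < 0.2
    if do_sprinkler is None:
        sprinkler = random.random() < (0.01 if rain else 0.4)  # rain discourages running the sprinkler
    else:
        sprinkler = do_sprinkler  # intervention: cut the rain -> sprinkler edge
    wet = random.random() < (0.99 if rain and sprinkler else
                             0.90 if rain else
                             0.80 if sprinkler else 0.0)
    return rain, sprinkler, wet

def prob_rain(samples):
    """Fraction of samples in which it rained."""
    return sum(r for r, _, _ in samples) / len(samples)

n = 100_000
observed_on = [s for s in (sprinkler_model() for _ in range(n)) if s[1]]   # condition on seeing the sprinkler on
intervened_on = [sprinkler_model(do_sprinkler=True) for _ in range(n)]      # force the sprinkler on

print("P(rain | sprinkler on, observed):", prob_rain(observed_on))    # well below the 0.2 prior
print("P(rain | do(sprinkler on)):      ", prob_rain(intervened_on))  # roughly 0.2, the prior
```

The same program answers observational and interventional questions differently: observing that the sprinkler is on makes rain less likely (sprinklers are rarely run in the rain), while intervening to turn it on leaves the probability of rain at its prior. A model that only fits the observed joint distribution has no way to tell these two questions apart.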
Table of Contents:
The scaling hypothesis and the laziness of deep learning
What do neural networks learn?
NNs learn shortcuts because they are “easy to vary”
Representing knowledge as probabilistic programs
Learning as probabilistic program synthesis
Learning libraries of concepts
Probabilistic synthesis and causality as program editing
Implications for AI alignment
(LaTeX and interactive plots don’t display properly here; please see the link for the full post.)