Deep Learning Systems Are Not Less Interpretable Than Logic/Probability/Etc
There’s a common perception that various non-deep-learning ML paradigms—like logic, probability, causality, etc—are very interpretable, whereas neural nets aren’t. I claim this is wrong.
It’s easy to see where the idea comes from. Look at the sort of models in, say, Judea Pearl’s work, like the classic sprinkler diagram: either the sprinkler or the rain can cause a wet sidewalk, season is upstream of both of those (e.g. more rain in spring, more sprinkler use in summer), and sidewalk slipperiness is caused by wetness. The Pearl-style framework lets us do all sorts of probabilistic and causal reasoning on this system, and it all lines up quite neatly with our intuitions. It looks very interpretable.
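To make this concrete, here’s a minimal sketch (mine, not Pearl’s) of that sprinkler network as plain code: a parent structure plus conditional probability tables. Every variable is treated as binary for simplicity (“season” just means “wet season or not”), and all the numbers are invented for illustration.

```python
# Toy sprinkler network: DAG structure plus conditional probability tables (CPTs).
# All variables are binary; the probabilities are made up for illustration.
parents = {
    "season":    [],                      # root node: "wet season or not"
    "sprinkler": ["season"],
    "rain":      ["season"],
    "wet":       ["sprinkler", "rain"],
    "slippery":  ["wet"],
}

# cpt[node][tuple of parent values] = P(node is True | parent values)
cpt = {
    "season":    {(): 0.5},
    "sprinkler": {(False,): 0.6, (True,): 0.1},   # sprinkler runs more in the dry season
    "rain":      {(False,): 0.1, (True,): 0.7},   # rain is more likely in the wet season
    "wet":       {(False, False): 0.01, (False, True): 0.90,
                  (True, False): 0.90,  (True, True): 0.99},
    "slippery":  {(False,): 0.05, (True,): 0.80},
}

def joint_probability(assignment, parents, cpt):
    """P(full assignment) = product over nodes of P(node | its parents)."""
    p = 1.0
    for node, value in assignment.items():
        parent_values = tuple(assignment[q] for q in parents[node])
        p_true = cpt[node][parent_values]
        p *= p_true if value else 1.0 - p_true
    return p
```

Notice that all the machinery lives in the graph structure and the numbers; the strings are only there for the humans reading the code.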
The problem, I claim, is that a whole bunch of work is being done by the labels. “Season”, “sprinkler”, “rain”, etc. The math does not depend on those labels at all. If we code an ML system to use this sort of model, its behavior will also not depend on the labels at all. They’re just suggestively-named LISP tokens. We could use the exact same math/code to model some entirely different system, like my sleep quality being caused by room temperature and exercise, with both of those downstream of season, and my productivity the next day downstream of sleep.
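In fact the sleep example happens to have the exact same graph shape. Here’s the same toy structure with only the labels swapped (the variable names are mine, for illustration):

```python
# Identical shape to the sprinkler graph above; any inference code that works on
# one of these dicts works, unchanged, on the other.
sleep_parents = {
    "season":       [],
    "room_temp":    ["season"],
    "exercise":     ["season"],
    "sleep":        ["room_temp", "exercise"],
    "productivity": ["sleep"],
}
```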
We could just replace all the labels with random strings, and the model would have the same content:
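Continuing the toy code above, a relabeling function makes the point mechanical: swap every name for a random string, and every probability the model assigns stays exactly the same. (The `relabel` helper below is just an illustrative sketch.)

```python
import random
import string

def relabel(parents, cpt, seed=0):
    """Replace every node label with a random string; none of the math changes."""
    rng = random.Random(seed)
    new_name = {node: "".join(rng.choices(string.ascii_lowercase, k=8))
                for node in parents}
    new_parents = {new_name[n]: [new_name[p] for p in ps]
                   for n, ps in parents.items()}
    new_cpt = {new_name[n]: table for n, table in cpt.items()}
    return new_parents, new_cpt, new_name

scrambled_parents, scrambled_cpt, name_map = relabel(parents, cpt)

# The relabeled model assigns exactly the same probability to every outcome.
assignment = {"season": True, "sprinkler": False, "rain": True,
              "wet": True, "slippery": True}
scrambled = {name_map[node]: value for node, value in assignment.items()}
assert joint_probability(assignment, parents, cpt) == \
       joint_probability(scrambled, scrambled_parents, scrambled_cpt)
```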
Now it looks a lot less interpretable.
Perhaps that seems like an unfair criticism? Like, the causal model is doing some nontrivial work, but connecting the labels to real-world objects just isn’t the problem it solves?
… I think that’s true, actually. But connecting the internal symbols/quantities/data structures of a model to external stuff is (I claim) exactly what interpretability is all about.
Think about interpretability for deep learning systems. A prototypical example of what successful interpretability might look like: we find a neuron which robustly lights up specifically in response to trees. It’s a tree-detector! That’s highly interpretable: we know what that neuron “means”, what it corresponds to in the world. (Of course in practice single neurons are probably not the thing to look at, and also the word “robustly” is doing a lot of subtle work, but those points are not really relevant to this post.)
The corresponding problem for a logic/probability/causality-based model would be: take a variable or node, and figure out what thing in the world it corresponds to, ignoring the not-actually-functionally-relevant label. Take the whole system, remove the labels, and try to rederive their meanings.
… which sounds basically-identical to the corresponding problem for deep learning systems.
We are no more able to solve that problem for logic/probability/causality systems than we are for deep learning systems. We can have a node in our model labeled “tree”, but we are no more (or less) able to check that it actually robustly represents trees than we are for a given neuron in a neural network. Similarly, if we find that it does represent trees and we want to understand how/why the tree-representation works, all those labels are a distraction.
One could argue that we’re lucky deep learning is winning the capabilities race. At least this way it’s obvious that our systems are uninterpretable, that we have no idea what’s going on inside the black box, rather than our brains seeing the decorative natural-language name “sprinkler” on a variable/node and then thinking that we know what the variable/node means. Instead, we just have unlabeled nodes—an accurate representation of our actual knowledge of the node’s “meaning”.
I think this post makes a true and important point, a point that I also bring up from time to time.
I do have a complaint though: I think the title (“Deep Learning Systems Are Not Less Interpretable Than Logic/Probability/Etc”) is too strong. (This came up multiple times in the comments.)
In particular, suppose it takes N unlabeled parameters to solve a problem with deep learning, and it takes M unlabeled parameters to solve the same problem with probabilistic programming. And suppose that M<N, or even M<<N, which I think is generally plausible.
If Person X notices that M<<N, and then declares “deep learning is less interpretable than probabilistic programming”, well that’s not a crazy thing for them to say. And if M=5 and N=5000, then I think Person X is obviously correct, whereas the OP title is wrong. On the other hand, if M is a trillion and N is a quadrillion, then presumably the situation is that basically neither is interpretable, and Person X’s statement “deep learning is less interpretable than probabilistic programming” is maybe still literally true on some level, but it kinda gives the wrong impression, and the OP title is perhaps more appropriate.
Anyway, I think a more defensible title would have been “Logic / Probability / Etc. Systems can be giant inscrutable messes too”, or something like that.
Better yet, the text could have explicitly drawn a distinction between what probabilistic programming systems typically look like today (i.e., a handful of human-interpretable parameters), and what they would look like if they were scaled to AGI (i.e. billions of unlabeled nodes and connections inferred from data, or so I would argue).