Adding some thoughts as someone who works on probabilistic programming, and has colleagues who work on neurosymbolic approaches to program synthesis:
I think a lot of Bayes net structure learning / program synthesis approaches (Bayesian or otherwise) have the issue of uninformative variable names, but I do think it’s possible to distinguish between structural interpretability and naming interpretability, as others have noted.
In practice, most neural or Bayesian program synthesis applications I’m aware of exhibit something like structural interpretability, because the hypothesis space they live in is designed by modelers to have human-interpretable semantic structure. Two good examples of this are the prior over programs that generate handwritten characters in Lake et al (2015), and the PCFG prior over Gaussian Process covariance kernels in Saad et al (2019). See e.g. Figure 6 for how you can analyze programs generated by this prior to determine whether a particular time series is likely to be periodic, to have a linear trend, to have a changepoint, etc.
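To make "structural interpretability" a bit more concrete, here is a minimal sketch of the kind of query I have in mind. It is a toy illustration only: the grammar, the kernel names, the weights, and the helper functions (`sample_kernel`, `contains`, `structure_probabilities`, `render`) are all made up for this example and are not taken from Saad et al. It also samples from the prior alone, whereas a real analysis would use posterior samples conditioned on data. The point is just that when hypotheses are expression trees over human-meaningful primitives, a question like "is this time series periodic?" reduces to a syntactic check on the sampled programs.

```python
import random

# Hypothetical toy grammar over covariance-kernel expressions.
# Names and weights are assumptions made for illustration only.
BASE_KERNELS = ["Linear", "Periodic", "SquaredExp", "WhiteNoise"]
BINARY_OPS = ["Sum", "Product", "ChangePoint"]
OP_WEIGHTS = [0.4, 0.4, 0.2]
P_LEAF = 0.5  # probability of stopping at a base kernel


def sample_kernel(rng, depth=0, max_depth=4):
    """Sample an expression tree from the toy PCFG as nested tuples."""
    if depth >= max_depth or rng.random() < P_LEAF:
        return (rng.choice(BASE_KERNELS),)
    op = rng.choices(BINARY_OPS, weights=OP_WEIGHTS)[0]
    left = sample_kernel(rng, depth + 1, max_depth)
    right = sample_kernel(rng, depth + 1, max_depth)
    return (op, left, right)


def contains(expr, name):
    """Return True if a node called `name` appears anywhere in the tree."""
    return expr[0] == name or any(contains(child, name) for child in expr[1:])


def render(expr):
    """Pretty-print a tree, e.g. Sum(Periodic, Linear)."""
    head, children = expr[0], expr[1:]
    if not children:
        return head
    return f"{head}({', '.join(render(c) for c in children)})"


def structure_probabilities(samples):
    """Estimate the fraction of sampled programs with each structural feature."""
    features = {
        "periodic": "Periodic",
        "linear trend": "Linear",
        "changepoint": "ChangePoint",
    }
    return {
        label: sum(contains(s, node) for s in samples) / len(samples)
        for label, node in features.items()
    }


if __name__ == "__main__":
    rng = random.Random(0)
    # Prior samples only; a real analysis would use posterior samples.
    samples = [sample_kernel(rng) for _ in range(2000)]
    print("example program:", render(samples[0]))
    for label, p in structure_probabilities(samples).items():
        print(f"fraction with {label}: {p:.2f}")
```

Nothing here depends on variable names being informative: the query works because the primitives themselves carry meaning, which is the sense of "structural" interpretability I mean.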
Regarding uninformative variable names, there’s ongoing work on using natural language to guide program synthesis, so as to come up with more language-like conceptual abstractions (e.g. Wong et al 2021). I wouldn’t be surprised if these approaches could also be extended to come up with informative variable and function names / comments. A related line of work is that people are starting to use LLMs to deobfuscate code (e.g. Lachaux et al 2021), and I expect the same techniques will work for synthesized code.
For these reasons, I’m more optimistic about the interpretability prospects of learning approaches that generate models or code that look like traditional symbolic programs, relative to end-to-end deep learning approaches. (Note that neural networks are also “symbolic programs”, just written with a more restricted set of [differentiable] primitives, and typically staying within a set of widely used program structures [i.e. neural architectures].)
The more difficult question IMO is whether this interpretability comes at the cost of capabilities. I think this is possibly true in some domains (e.g. learning low-level visual patterns and cues), but not others (e.g. learning the compositional structure of furniture-like objects).
> In practice, most neural or Bayesian program synthesis applications I’m aware of exhibit something like structural interpretability, because the hypothesis space they live in is designed by modelers to have human-interpretable semantic structure. Two good examples of this…
When I squint out towards the horizon, I see future researchers trying to do a Bayesian program synthesis thing that builds a generative model of the whole world: everything from “tires are usually black”, to “it’s gauche to wear white after Labor Day”, to “in this type of math problem, maybe try applying the Cauchy–Schwarz inequality”, etc. etc. etc.
I’m perfectly happy to believe that Lake et al. can program-synthesize a little toy generative model of handwritten characters such that it has structural interpretability. But I’m concerned that we’ll work our way up to the thing in the previous paragraph, which might be a billion times more complicated, and it will no longer have structural interpretability.
(And likewise I’m concerned that solutions to “uninformative variable names” won’t scale—e.g., how are we going to automatically put English-language labels on the various intuitive models / heuristics that are involved when Ed Witten is thinking about math, or when MLK Jr is writing a speech?)
> I’m more optimistic about the interpretability prospects of learning approaches that generate models or code that look like traditional symbolic programs, relative to end-to-end deep learning approaches [emphasis added]
Nominally, I agree with this. But “relative to” is key here.
Your takeaway seems to be “OK, great, let’s do probabilistic generative models, they’re better!”.
By contrast, my perspective is: “If we take the probabilistic generative model approach, we’re in huge trouble with respect to interpretability, oh man this is really really bad, we gotta work on this ASAP!!! (Oh and by the way if we take the deep net approach then it’s even worse.)”.
On the contrary, I think there exist large, complex, symbolic models of the world that are far more interpretable and useful than learned neural models, even if too complex for any single individual to understand, e.g.:
- The Unity game engine (a configurable model of the physical world)
- Pixar’s RenderMan renderer (a model of optics and image formation)
- The GLEAMviz epidemic simulator (a model of socio-biological disease spread at the civilizational scale)
Humans are capable of designing and building these models, and learning how to build/write them as they improve their understanding of the world. The difficult part is how we can recapitulate that ability. Program synthesis is still in its infancy at doing so, but IMO contemporary end-to-end deep learning methods seem unlikely to deliver here if we want both interpretability and usefulness.
I agree that gwern’s proposal “Any model simple enough to be interpretable is too simple to be useful” is an exaggeration. Even the Lake et al. handwritten-character-recognizer is useful.
I would have instead said “Any model simple enough to be interpretable is too simple to be sufficient for AGI”.
I notice that you are again bringing the discussion back to a comparison between program synthesis world-models versus deep learning world-models, whereas I want to talk about the possibility that neither would be human-interpretable by the time we reach AGI level.
Thanks for your reply!
We could probably use a term or a phrase for this concept since it keeps coming up and is a fundamental problem. How about:
Corollary: