I’d crystallize the argument here as something like: suppose we’re analyzing a neural net doing inference, and we find that its internal computation is implementing <algorithm> for Bayesian inference on <big Bayes net>. That would be a huge amount of interpretability progress, even though the “big Bayes net” part is still pretty uninterpretable.
When we use Bayes nets directly, we get that kind of step for free.
… I think that’s a decent argument, and I at least partially buy it.
A neural net lacks this structure, and is thereby basically unconstrained in the type of work it’s allowed to perform.
That said, if we compare a neural net directly to a Bayes net (as opposed to inference-on-a-Bayes-net), they have basically the same structure: both are circuits. Both constrain locality of computation.
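To make the "both are circuits" point concrete, here's a toy sketch (my own illustration, not from the comment; all names are hypothetical): a single DAG-evaluation routine where each node reads only its parents' values. Swapping the node function between a weighted-sum-plus-sigmoid and a conditional probability table turns the same wiring into a neural-net-style node or a Bayes-net-style node, while the locality constraint stays identical.

```python
import math

def run_circuit(parents, node_fns, inputs):
    """Evaluate a DAG: each node sees only its parents' values (locality)."""
    values = dict(inputs)
    for node, fn in node_fns.items():
        values[node] = fn([values[p] for p in parents[node]])
    return values

# Same wiring X -> Z <- Y, two different node semantics.
parents = {"Z": ["X", "Y"]}

# Neural-net-style node: weighted sum + sigmoid (weights chosen arbitrarily).
nn_fns = {"Z": lambda vs: 1 / (1 + math.exp(-(0.8 * vs[0] + 0.3 * vs[1])))}

# Bayes-net-style node: P(Z=1 | X, Y) looked up in a conditional probability table.
cpt = {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.6, (1, 1): 0.9}
bn_fns = {"Z": lambda vs: cpt[(vs[0], vs[1])]}

print(run_circuit(parents, nn_fns, {"X": 1, "Y": 0})["Z"])
print(run_circuit(parents, bn_fns, {"X": 1, "Y": 0})["Z"])
```

The structural constraint (which nodes may read which values) is the same in both cases; only the local node semantics differ, which is the sense in which a neural net and a Bayes net share the same circuit structure while inference-on-a-Bayes-net is a different kind of object.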