I agree with your point that blobs of Bayes net nodes aren’t very legible, but I still think neural nets are relevantly a lot less interpretable than that! I think basically all structure that limits how your AI does its thinking is helpful for alignment, and that neural nets are pessimal on this axis.
In particular, an AI system based on a big Bayes net can generate its outputs in a fairly constrained and structured way, using some sort of inference algorithm that tries to synthesize all the local constraints. A neural net lacks this structure, and is thereby basically unconstrained in the type of work it’s allowed to perform.
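To make “constrained and structured” concrete, here’s a minimal sketch (a toy three-variable net with made-up numbers, not any particular system): inference by brute-force enumeration, where the answer is produced by one fixed, legible procedure that only ever combines the local conditional tables.

```python
# Toy sketch (made-up structure and numbers): a three-variable Bayes net
# Rain -> Sprinkler, (Rain, Sprinkler) -> WetGrass, queried by brute-force
# enumeration. Every output comes from one fixed procedure that only ever
# combines the local conditional tables.

# Local conditional probability tables.
P_rain = {True: 0.2, False: 0.8}
P_sprinkler_given_rain = {True: {True: 0.01, False: 0.99},
                          False: {True: 0.4, False: 0.6}}
P_wet_given = {(True, True): 0.99, (True, False): 0.90,
               (False, True): 0.90, (False, False): 0.05}

def joint(rain, sprinkler, wet):
    """Joint probability as a product of purely local factors."""
    p_wet = P_wet_given[(rain, sprinkler)]
    return (P_rain[rain]
            * P_sprinkler_given_rain[rain][sprinkler]
            * (p_wet if wet else 1.0 - p_wet))

def posterior_rain(wet_observed=True):
    """P(Rain | WetGrass) by summing out the unobserved Sprinkler variable."""
    scores = {}
    for rain in (True, False):
        scores[rain] = sum(joint(rain, s, wet_observed) for s in (True, False))
    z = sum(scores.values())
    return {rain: p / z for rain, p in scores.items()}

print(posterior_rain(wet_observed=True))
```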
All else equal, more structure in your AI should mean less room for dangerous computations and a smaller surface area you need to inspect.
I’d crystallize the argument here as something like: suppose we’re analyzing a neural net doing inference, and we find that its internal computation is implementing <algorithm> for Bayesian inference on <big Bayes net>. That would be a huge amount of interpretability progress, even though the “big Bayes net” part is still pretty uninterpretable.
When we use Bayes nets directly, we get that kind of step for free.
… I think that’s a decent argument, and I at least partially buy it.
> A neural net lacks this structure, and is thereby basically unconstrained in the type of work it’s allowed to perform.
That said, if we compare a neural net directly to a Bayes net (as opposed to inference-on-a-Bayes-net), they have basically the same structure: both are circuits. Both constrain locality of computation.
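To illustrate the “both are circuits” point with a sketch (my toy framing, made-up parameters): a Bayes net node and a neural net unit are each just a function of their parents in the graph, so the graph constrains where information can flow in both cases; the extra structure only shows up once an explicit inference algorithm is layered on top of the Bayes net.

```python
# Toy comparison (made-up parameters, not any particular model): a Bayes net
# node and a neural net unit are both computed purely from their parents,
# so both graphs constrain where information can flow.

import math
import random

def bayes_node_sample(parent_values, cpt):
    """Sample a boolean Bayes net node given its parents' values,
    using a conditional probability table keyed by those values."""
    p_true = cpt[tuple(parent_values)]
    return random.random() < p_true

def neural_unit(parent_values, weights, bias):
    """Compute a neural net unit's activation from its input units."""
    z = sum(w * x for w, x in zip(weights, parent_values)) + bias
    return 1.0 / (1.0 + math.exp(-z))  # sigmoid

# Both are local functions of their parents; the difference discussed above
# is the explicit inference procedure layered on top of the Bayes net.
cpt = {(True, True): 0.9, (True, False): 0.7,
       (False, True): 0.3, (False, False): 0.1}
print(bayes_node_sample((True, False), cpt))
print(neural_unit([1.0, 0.0], weights=[0.5, -0.2], bias=0.1))
```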