Mm, I think there’s some disconnect in what we mean by an “interpretation” of an ML model. The “interpretation” of a neural network is not just some computational graph that’s behaviorally equivalent to the neural network. It’s the actual algorithm found by the SGD and implemented in the network’s weights and biases. Again, see Neel Nanda’s work here. The “interpretation” recovers the actual computations the neural network’s forward pass is doing.
You seem to say that there’s some special class of “connectionist” algorithms that are qualitatively and mechanically different from higher-level algorithms. Interpretability is more or less premised on the idea that this is not so: artificial neurons are just the computational substrate on which the SGD is invited to write programs. And interpretability is hard because we essentially have to recover the high-level structure of SGD-written programs given just (the equivalent of) their machine code, not because we’re trying to find a merely-equivalent algorithm.
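To make the substrate-versus-program point concrete, here is a minimal toy sketch (my own illustration, not taken from the linked work): a 2-2-1 ReLU network whose weights are set by hand so that its forward pass is exactly XOR. The weights are the analogue of machine code; the interpretation we want is the recovered fact “this network computes XOR”, i.e. a description of what the forward pass actually does, rather than some other circuit that merely happens to agree with it on all inputs.

```python
import numpy as np

# Toy 2-2-1 ReLU network with hand-chosen weights. The neurons are just
# the substrate; the "program" written into them is XOR, and that is the
# fact an interpretation of these weights should recover.
W1 = np.array([[1.0, 1.0],     # h1 = relu(x1 + x2)
               [1.0, 1.0]])    # h2 = relu(x1 + x2 - 1)
b1 = np.array([0.0, -1.0])
W2 = np.array([1.0, -2.0])     # out = h1 - 2*h2
b2 = 0.0

def forward(x: np.ndarray) -> float:
    """The network's actual forward-pass computation."""
    h = np.maximum(W1 @ x + b1, 0.0)
    return float(W2 @ h + b2)

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, int(forward(np.array(x, dtype=float))))   # 0, 1, 1, 0, i.e. XOR
```

Here we already know the high-level program because we wrote it in by hand; the interpretability problem is doing the same recovery when the SGD, not a human, chose the weights.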
I think this also addresses your concern that a higher-level design can’t be found in a timely manner. The SGD manages it, so the amount of computation needed is upper-bounded by whatever goes into a given training run. And the SGD is blind, so yes, I imagine deliberative design — given theoretical understanding of the domain — would be much faster than whatever the SGD is doing. (Well, maybe not faster in real time, given that human brains work slower than modern processors. But in fewer computation-steps.)
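As a back-of-envelope illustration of that upper bound (my numbers, not anything from this thread): the compute that went into finding the algorithm is just the training run’s FLOP budget, which can be estimated from parameter count and token count using the standard ~6·N·D approximation from the scaling-laws literature. The model sizes below are hypothetical round numbers.

```python
# Sketch of the "upper bound = training compute" point, using the
# standard ~6 FLOPs per parameter per training token approximation for
# transformer training. Model sizes are hypothetical round numbers.

def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate total training compute via the 6 * N * D rule of thumb."""
    return 6.0 * n_params * n_tokens

print(f"{training_flops(125e6, 300e9):.2e}")   # ~2e20 FLOPs for a small model
print(f"{training_flops(175e9, 300e9):.2e}")   # ~3e23 FLOPs for a GPT-3-scale run
```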
You say it’s coherent to wish for this tech tree, and it is coherent to so wish—but do you think this is a promising research program?
Basically, yes.