In a more complex ad hoc approach, we could instead design a way to extract a theoretically simulatable algorithm that our model is implementing. In other words, given a neural network, we run some type of meta-algorithm that analyzes the neural network and spits out pseudocode describing the procedure the network uses to make its decisions. As I understand it, this is roughly what Daniel Filan writes about in Mechanistic Transparency for Machine Learning.
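To make the idea concrete with a deliberately toy stand-in (not the method Filan proposes), one could imagine the "meta-algorithm" being something like surrogate distillation: query the trained network and fit a small, human-readable model to its behaviour, then print that model as pseudocode-like rules. The dataset, network, and distillation step below are all illustrative assumptions.

```python
# Toy sketch: distill an opaque network into a readable surrogate.
# Everything here (iris data, MLP, decision-tree distillation) is an
# illustrative assumption, not the mechanistic-transparency proposal itself.
from sklearn.datasets import load_iris
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_iris()
X, y = data.data, data.target

# The opaque model whose decision procedure we want to describe.
net = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0).fit(X, y)

# "Meta-algorithm": fit a shallow tree to the network's own predictions,
# then emit it as pseudocode-like if/else rules.
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, net.predict(X))
print(export_text(surrogate, feature_names=list(data.feature_names)))
```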
I endorse this as a description of how I currently think about mechanistic transparency, although I haven’t reread the post (and imagine finding it somewhat painful to), so can’t fully endorse your claim.
That seems like an important feeling to pay attention to, as a signal that you should update the post with new insights. :)
Update: I reread the post (between commenting that and now, as prep for another post currently in draft form). It is better than I remember, and I’m pretty proud of it.