Interesting stuff. Prunability might not be great for interpretability—you might want something more like sparsity, or alignment of the neuron basis with specified human-interpretable classifications of the data.
Thanks so much, Charlie, for reading the post and for your comment! I really appreciate it.
I think both pruning neurons and making the network more sparse are very promising steps toward constructing a model that is simultaneously optimal and interpretable.
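For concreteness, here's a minimal sketch of what I mean mechanically by those two knobs, assuming PyTorch and a hypothetical toy model (none of the names below come from the post itself): an L1 penalty to push weights toward sparsity, and magnitude-based pruning to zero out small weights afterwards.

```python
# Minimal sketch (PyTorch, hypothetical toy model): sparsity via an L1 penalty,
# plus magnitude-based pruning of the smallest weights.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))

# (1) Sparsity: add an L1 penalty on the weights to the task loss.
def l1_penalty(model, lam=1e-4):
    return lam * sum(p.abs().sum() for p in model.parameters())

x, y = torch.randn(8, 32), torch.randint(0, 10, (8,))
loss = nn.functional.cross_entropy(model(x), y) + l1_penalty(model)
loss.backward()

# (2) Pruning: zero out the 50% smallest-magnitude weights in each Linear layer.
for module in model:
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.5)

# Fraction of weights now zeroed in the first layer:
print((model[0].weight == 0).float().mean().item())
```

Whether the surviving weights then line up with anything humans can interpret is, of course, the open question.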
I completely agree that aligning the neuron basis with human-interpretable classifications of the data would really help interpretability. But if only a subset of the neuron basis is aligned with human-interpretable concepts, and the complement comprises a very large set of abstractions (which people would therefore not be able to learn to interpret), then we haven't made the model interpretable.
Suppose 100% is the level of interpretability we need for guaranteed alignment (which I am convinced of, because even 1% uninterpretability can screw you over). Then low dimensionality seems like a necessary, but not sufficient, condition for interpretability. It is possible, but not always true, that each of a small number of abstractions will either already be familiar to people or can be learned by people in a reasonable amount of time.