Huh, interesting. I skimmed the paper and I’m not convinced this specific architecture is promising for tasks that move a lot of information or have hierarchical structure—the lack of a value (only keys and queries) seems like a big downgrade. The graph classification results are pretty good though, and I’d agree with the authors that it’s probably because they’ve improved information routing without having much worse inductive biases than GCNNs. Does this match your impression?
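To make concrete what I mean by "only keys and queries" (this is just a rough sketch in my own notation and may not match the paper's exact formulation; the class names and normalization are mine): with a value projection the layer can re-encode what it transmits, whereas a key/query-only layer can only re-weight the raw input features.

```python
import torch
import torch.nn as nn

class StandardAttention(nn.Module):
    """Scaled dot-product attention with queries, keys, AND a learned value projection."""
    def __init__(self, dim):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)  # the value map: lets the layer choose *what* to send

    def forward(self, x):
        # x: (batch, nodes, dim)
        q, k, v = self.q(x), self.k(x), self.v(x)
        attn = torch.softmax(q @ k.transpose(-2, -1) / x.shape[-1] ** 0.5, dim=-1)
        return attn @ v  # payload is a learned re-encoding of the inputs

class KeyQueryOnlyAttention(nn.Module):
    """Hypothetical key/query-only variant: attention decides *where* information flows,
    but the payload is the untransformed input features."""
    def __init__(self, dim):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        # no value projection: routing without a learned payload

    def forward(self, x):
        q, k = self.q(x), self.k(x)
        attn = torch.softmax(q @ k.transpose(-2, -1) / x.shape[-1] ** 0.5, dim=-1)
        return attn @ x  # can only mix raw inputs, not re-encode what gets moved

x = torch.randn(2, 5, 16)            # (batch, nodes, features)
out = KeyQueryOnlyAttention(16)(x)   # output stays in the span of the input features
```

That restriction is why I'd expect it to struggle on tasks that need to move a lot of information or build hierarchical representations layer over layer.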
I’m also kind of a downer about interpretability. There are different kinds of it. Each neuron having an input-space interpretation that humans can mostly figure out by eyeballing it doesn’t help you much when you have ten billion neurons. The more powerful kinds of interpretability (which it would be exciting to get for free) have more to do with compression and search: they let you form simplified abstract models of an AI’s reasoning and tell you about the domains of validity of those models.