I’m understanding that by “interpretability” you mean “we can attach values meaningful to people to internal nodes of the model”[1].
My guess is that logical/probabilistic models are regarded as more interpretable than DNNs mostly for two reasons:
1. They tend to have a small number of inputs, and the inputs are heavily engineered features (so the inputs themselves are already meaningful to people).
2. Internal nodes combine features in quite simple ways, particularly when the number of inputs is small (so the meaning of the inputs cannot be distorted or diluted too much by the internal nodes, if you will).
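To make that contrast concrete, here is a toy sketch (the feature names, data, and labels are all made up for illustration, not anyone's actual model): the same simple model class trained once over engineered features, where each learned weight attaches to something a person can name, and once over raw pixels, where it attaches to nothing meaningful.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# --- Engineered-feature model: the inputs already mean something to people ---
# Hypothetical features for a "is this a cat photo?" classifier.
feature_names = ["has_pointy_ears", "has_whiskers", "fur_texture_score"]
X_feat = rng.random((200, 3))
y = (0.8 * X_feat[:, 0] + 0.6 * X_feat[:, 1] + 0.2 * X_feat[:, 2] > 0.8).astype(int)

clf_feat = LogisticRegression(max_iter=1000).fit(X_feat, y)
# Each coefficient is tied to a named feature, so the "internal node"
# (the weighted sum) is easy to read off.
for name, w in zip(feature_names, clf_feat.coef_[0]):
    print(f"{name}: weight {w:+.2f}")

# --- Raw-pixel model: same model class, but inputs carry no high-level meaning ---
X_pix = rng.random((200, 32 * 32))        # fake 32x32 grayscale images
y_pix = rng.integers(0, 2, size=200)      # fake labels
clf_pix = LogisticRegression(max_iter=1000).fit(X_pix, y_pix)
# 1024 weights over individual pixels: nothing here names "ear" or "whisker",
# even though the model class is identical.
print("number of pixel weights:", clf_pix.coef_.shape[1])
```

The point of the toy example is only that the interpretability here comes from the inputs and the simple combination rule, not from the model class itself.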
I think what you are saying is: let's assume that the inputs are not engineered and have no high-level meaning to people (e.g. raw pixel data), but the output does, e.g. it detects cats in pictures. The question is: can we find parts of the model which correspond to some human-understandable categories (e.g. an ear detector)?
In that case, I agree this seems equally hard regardless of the model, holding complexity constant. I just wouldn't call the hardness of doing this specific thing "uninterpretability".
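For concreteness, here is a minimal sketch of one common way people attempt this, a linear probe on intermediate activations. Everything here is a placeholder: in practice H would be activations from some internal layer of the trained model collected on labeled images, and the "ear visible" labels would come from human annotation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n_samples, n_hidden = 500, 64

H = rng.normal(size=(n_samples, n_hidden))       # placeholder internal activations
has_ear = rng.integers(0, 2, size=n_samples)     # placeholder "ear visible" labels
# Pretend one hidden direction weakly encodes the concept, so the probe
# has something to find in this toy data.
H[:, 7] += 2.0 * has_ear

H_train, H_test, y_train, y_test = train_test_split(H, has_ear, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(H_train, y_train)

# If a simple probe predicts the human concept well from the activations,
# that is (weak) evidence the layer encodes something like an "ear detector".
print("probe test accuracy:", probe.score(H_test, y_test))
```

Note that this probing work looks essentially the same whether the internal nodes belong to a DNN, a large Bayes net, or a logic program of comparable size, which is the sense in which the difficulty seems independent of model class.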
[1] Is that definition standard? I'm not a fan; I'd go closer to "interpretable model" = "model humans can reason about, other than by running black-box experiments on the model".