I think the key point here is that we’re applying a linear transformation to move from neuron space into feature space. Sometimes neurons and features do coincide and you can actually attribute particular concepts to neurons, but unless the neurons are a privileged basis there’s no reason to expect this in general. We’re taking the definition of feature here as a linear combination of neurons which represents some particular important and meaningful (and hopefully human-comprehensible) concept.
I think the key point here is that we’re applying a linear transformation to move from neuron space into feature space. Sometimes neurons and features do coincide and you can actually attribute particular concepts to neurons, but unless the neurons are a privileged basis there’s no reason to expect this in general. We’re taking the definition of feature here as a linear combination of neurons which represents some particular important and meaningful (and hopefully human-comprehensible) concept.