Hmm. I agree that this weakens the convergent natural abstractions hypothesis, but I also think there is such a thing as slicing reality at the mechanism joints in your model. Even though a learning system only converges on those joints if the system's design is explicitly optimized to make that happen, it still seems appropriate to me to say that the model almost learned a natural abstraction but, because it was not robust, had adversarial-example issues. Approaches to robustifying a neural network seem likely to significantly increase the degree to which it learns the natural abstraction.
I’ve reversed my self-vote on my original comment because I’ve become uncertain.