I just stumbled on this post and wanted to note that very closely related ideas are sometimes discussed in interpretability under the names “universality” or “convergent learning”: https://distill.pub/2020/circuits/zoom-in/#claim-3
In fact, not only do the same features form across different neural networks, but we actually observe the same circuits as well (e.g., https://distill.pub/2020/circuits/frequency-edges/#universality).
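To make the "same features" claim a bit more concrete, here's a minimal sketch (my own illustration, not code from the linked papers) of one common way to quantify representational convergence: train two networks from different random seeds on the same task and compare their hidden activations with linear CKA (Kornblith et al., 2019). The task, architecture, and all function names below are made up for the example.

```python
import torch
import torch.nn as nn

def make_data(n=2048, d=20, seed=0):
    # Toy linearly-separable binary classification task
    g = torch.Generator().manual_seed(seed)
    X = torch.randn(n, d, generator=g)
    w = torch.randn(d, generator=g)
    y = (X @ w > 0).float()
    return X, y

def train_mlp(X, y, seed, hidden=64, steps=500):
    torch.manual_seed(seed)
    net = nn.Sequential(nn.Linear(X.shape[1], hidden), nn.ReLU(),
                        nn.Linear(hidden, 1))
    opt = torch.optim.Adam(net.parameters(), lr=1e-2)
    for _ in range(steps):
        opt.zero_grad()
        loss = nn.functional.binary_cross_entropy_with_logits(
            net(X).squeeze(-1), y)
        loss.backward()
        opt.step()
    return net

def hidden_acts(net, X):
    # Post-ReLU activations of the hidden layer
    return net[1](net[0](X)).detach()

def linear_cka(A, B):
    # Linear CKA: center features, then
    # CKA = ||B^T A||_F^2 / (||A^T A||_F * ||B^T B||_F)
    A = A - A.mean(0)
    B = B - B.mean(0)
    num = (B.T @ A).norm() ** 2
    return (num / ((A.T @ A).norm() * (B.T @ B).norm())).item()

X, y = make_data()
net1 = train_mlp(X, y, seed=1)
net2 = train_mlp(X, y, seed=2)
print("CKA between seeds:", linear_cka(hidden_acts(net1, X), hidden_acts(net2, X)))
```

High CKA between independently trained networks (relative to, say, an untrained baseline) is one operationalization of the weaker, representation-level version of universality; the circuits-level claim in the linked posts is stronger and is checked by hand, by inspecting weights and feature visualizations.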
Some more relevant discussion on abstraction in ML: https://www.deepmind.com/publications/abstraction-for-deep-reinforcement-learning