Another area which I think is promising is the study of random networks and random features. The results in this post suggest that training a neural network is functionally similar to randomly initialising it until you find a function that fits the training data. This suggests that we may be able to draw conclusions about what kinds of features a neural network is likely to learn, based on what kinds of features are likely to be created randomly.
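To make the "randomly initialise until it fits" picture concrete, here is a minimal toy sketch. Everything in it is my own assumption for illustration, not the setup from the post: the 3-bit boolean inputs, the tiny 2-layer ReLU architecture, the Gaussian initialisation, the training set, and the helper names `random_network_function` and `guess_and_check` are all hypothetical.

```python
import itertools
from collections import Counter

import numpy as np


def random_network_function(rng, inputs, hidden=8):
    """Sample one random 2-layer ReLU network; return its 0/1 output on every input."""
    w1 = rng.standard_normal((inputs.shape[1], hidden))
    b1 = rng.standard_normal(hidden)
    w2 = rng.standard_normal(hidden)
    b2 = rng.standard_normal()
    h = np.maximum(inputs @ w1 + b1, 0.0)
    return (h @ w2 + b2 > 0).astype(int)


def guess_and_check(train_idx, train_labels, inputs, rng, max_tries=100_000):
    """Re-initialise at random until the network's function fits the training data."""
    for tries in range(1, max_tries + 1):
        f = random_network_function(rng, inputs)
        if np.array_equal(f[train_idx], train_labels):
            return f, tries
    raise RuntimeError("no fitting network found within max_tries")


# All 8 inputs of a 3-bit boolean function, in lexicographic order (000 ... 111).
inputs = np.array(list(itertools.product([0, 1], repeat=3)), dtype=float)

# Toy training set (assumed): four of the eight inputs, with labels.
train_idx = np.array([0, 3, 5, 6])
train_labels = np.array([0, 1, 1, 1])

rng = np.random.default_rng(0)

# Sample many fitting networks and count which full functions they implement.
counts = Counter()
for _ in range(200):
    f, _ = guess_and_check(train_idx, train_labels, inputs, rng)
    counts[tuple(int(v) for v in f)] += 1

print(counts.most_common(3))
```

Each key in `counts` is a full truth table that agrees with the training labels, so the histogram is a rough estimate of which generalisations random sampling favours on the four held-out inputs. Whether a trained network of the same shape lands on a similar distribution is the empirical question, and this sketch does not answer it.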
You might find statistical mechanics a useful way to analyse this behaviour of NNs. Please check out Vanchurin’s statistical mechanics theory of machine learning.
Will do, thank you for the reference!