Yeah I agree with everything you say; it’s just I was trying to remind myself of enough of SLT to give a a ‘five minute pitch’ for SLT to other people, and I didn’t like the idea that I’m hanging it of the ReLU.
I guess the intuition behind the hierarchical nature of the models leading to singularities is the permutation symmetry between the hidden channels, which is kind of an easy thing to understand.
I get and agree with your point about approximate equivalences, though I have to say that I think we should be careful! One reason I’m interested in SLT is I spent a lot of time during my PhD on Bayesian approximations to NN posteriors. I think SLT is one reasonable explanation of why this. never yielded great results, but I think hand-wavy intuitions about ‘oh well the posterior is probably-sorta-gaussian’ played a big role in it’s longevity as an idea.
yeah it’s not totally clear what this ‘nearly singular’ thing would mean? Intuitively, it might be that there’s a kind of ‘hidden singularity’ in the space of this model that might affect the behaviour, like the singularity in a dynamic model with a phase transition. but im just guessing
Yeah I agree with everything you say; it’s just I was trying to remind myself of enough of SLT to give a a ‘five minute pitch’ for SLT to other people, and I didn’t like the idea that I’m hanging it of the ReLU.
I guess the intuition behind the hierarchical nature of the models leading to singularities is the permutation symmetry between the hidden channels, which is kind of an easy thing to understand.
I get and agree with your point about approximate equivalences, though I have to say that I think we should be careful! One reason I’m interested in SLT is I spent a lot of time during my PhD on Bayesian approximations to NN posteriors. I think SLT is one reasonable explanation of why this. never yielded great results, but I think hand-wavy intuitions about ‘oh well the posterior is probably-sorta-gaussian’ played a big role in it’s longevity as an idea.
yeah it’s not totally clear what this ‘nearly singular’ thing would mean? Intuitively, it might be that there’s a kind of ‘hidden singularity’ in the space of this model that might affect the behaviour, like the singularity in a dynamic model with a phase transition. but im just guessing