Note that in the SLT setting, “brains” or “neural networks” are not the sorts of things that can be singular (or really, have a certain λ) on their own—instead they’re singular for certain distributions of data.
This is a good point I often see neglected. Though there's some sense in which a model p(x|w) can "be singular" independent of data: if the parameter-to-function map w ↦ p(·|w) is not locally injective, then whenever some distribution p(x) minimizes the loss, its preimage in parameter space can have non-trivial geometry.
These are called "degeneracies," and they can be understood for a particular model without reference to data. The actual p(x) that minimizes the loss is still determined by the data, so the "menu" of degeneracies is data-independent, and the data "selects one off the menu." Degeneracies imply singularities, but not necessarily vice versa, so they aren't the whole story. Still, we do think degeneracies will be fairly important in practice.
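To make the "menu" picture concrete, here is a standard toy example (my own illustration, not from the comment above): a two-parameter Gaussian regression model whose mean depends only on the product of the parameters,

$$
p(y \mid x, a, b) = \frac{1}{\sqrt{2\pi}} \exp\!\left(-\tfrac{1}{2}\,(y - abx)^2\right).
$$

Since only $ab$ affects the distribution, the map $(a,b) \mapsto p(\cdot \mid a, b)$ is not locally injective anywhere; that non-injectivity is a property of the model alone. The data then selects a level set: if the true distribution has $ab = c$, the minimizers form the curve $\{(a,b) : ab = c\}$. For $c \neq 0$ this is a smooth hyperbola, but for $c = 0$ it is the union of the two coordinate axes, which cross at the origin; there the Fisher information degenerates and the model is singular, with $\lambda = 1/2$ rather than the regular value $d/2 = 1$. So the degeneracy is data-independent, while which level set (and whether it contains a genuine singularity) is selected by the data.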