Note that in the SLT setting, “brains” or “neural networks” are not the sorts of things that can be singular (or really, have a certain λ) on their own—instead they’re singular for certain distributions of data. So the question is whether brains are singular on real-world data. This matters: neural networks are more singular on some data (for example, data generated by a thinner neural network) than on others. [EDIT: I’m right about the RLCT but wrong about what ‘being singular’ means, my apologies.]
Anyway, here’s roughly how you could tell: if your brain were “optimal” on the data it saw, how many different ways would there be of continuously perturbing your brain while keeping it optimal? The more ways, the more singular you are.
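As a concrete toy version of that counting exercise (my illustration, using a standard SLT example rather than anything from the dialogue): take the two-parameter model f(x) = abx fit by squared error to data generated by y = 0. The population loss is K(a, b) = (ab)²·E[x²], so every parameter pair with ab = 0 is optimal, and the number of flat directions differs between optima:

```python
import numpy as np

# Toy SLT example (illustrative): model f(x) = a*b*x, data generated by y = 0,
# squared-error population loss K(a, b) = (a*b)^2 * E[x^2].
# The optimal set is {(a, b) : a*b = 0}, the union of the two coordinate axes.

def hessian_of_K(a, b, ex2=1.0):
    # Analytic Hessian of K(a, b) = ex2 * a^2 * b^2.
    return ex2 * np.array([[2.0 * b**2, 4.0 * a * b],
                           [4.0 * a * b, 2.0 * a**2]])

for point in [(1.0, 0.0), (0.0, 0.0)]:
    eigenvalues = np.linalg.eigvalsh(hessian_of_K(*point))
    flat = int(np.sum(np.isclose(eigenvalues, 0.0)))
    print(f"optimum {point}: {flat} flat direction(s) out of 2")
# optimum (1.0, 0.0): 1 flat direction (slide along the a-axis, stay optimal)
# optimum (0.0, 0.0): 2 flat directions (the crossing point of the two axes)
```

At a generic optimum you can perturb in one direction and stay optimal; at the origin, where the two branches of the optimal set cross, every direction is flat to second order. That’s the “more ways to perturb, more singular” intuition in miniature.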
Singularity is actually a property of the parameter-function map, not the data distribution. The RLCT is defined in terms of the loss function/reward and the parameter-function map. See Definition 1.7 of the grey book for the definitions of singular, strictly singular, and regular models.

Edit: To clarify, you do need the loss function and a set of data (or, in the case of RL and the human brain, the reward signals) in order to talk about the singularities of a parameter-function map, and to calculate the RLCT. You just don’t need them to make the statement that the parameter-function map is strictly singular.
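For reference, here is the standard definition being pointed at (a sketch of Watanabe’s setup in my own words, not part of the original comment), which makes explicit where the data comes in, namely through the population loss K:

```latex
% Sketch of the RLCT definition (standard SLT, paraphrasing Watanabe).
% K(w): population loss, e.g. the average KL divergence to the truth;
% \varphi(w): a prior supported near the optimal set {w : K(w) = 0}.
\[
  \zeta(z) = \int K(w)^{z}\,\varphi(w)\,\mathrm{d}w
\]
% \zeta extends meromorphically from Re(z) > 0; its poles are real and
% negative. If the largest pole is z = -\lambda with order m, then \lambda
% is the RLCT and m its multiplicity, and the Bayes free energy expands as
\[
  F_n = n L_n + \lambda \log n - (m - 1) \log\log n + O_p(1),
\]
% where n L_n is the empirical-loss term.
```

Since K bakes in both the parameter-function map and the data distribution (or reward), λ is data-dependent, while the non-injectivity of the map on its own is not; that is exactly the distinction being drawn here.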
Oops, you’re entirely right.

One thing I noticed when reflecting on this dialogue later was that I really wasn’t considering the data distribution’s role in creating the loss landscape. So thanks for bringing this up!
Suppose I had some separation of the features of my brain into “parameters” and “activations”. Would my brain be singular if there were multiple values the parameters could take such that for all possible inputs the activations were the same? Or would it have to be that those parameters were also local minima?
(I suppose it’s not that realistic that the activations would be the same for all inputs, even assuming the separation into parameters and activations, because some inputs vaporise my brain)
> Note that in the SLT setting, “brains” or “neural networks” are not the sorts of things that can be singular (or really, have a certain λ) on their own—instead they’re singular for certain distributions of data.
This is a good point I often see neglected. Though there’s some sense in which a model p(x|w) can “be singular” independent of data: if the parameter-to-function map w↦p(x|w) is not locally injective. Then, if a distribution p(x) minimizes the loss, the preimage of p(x) in parameter space can have non-trivial geometry.
These are called “degeneracies,” and they can be understood for a particular model without talking about data. Though the actual p(x) that minimizes the loss is determined by data, so it’s sort of like the “menu” of degeneracies is data-independent, and the data “selects one off the menu.” Degeneracies imply singularities, but not necessarily vice versa, so they aren’t everything. But we do think that degeneracies will be fairly important in practice.
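A minimal numerical check of one such degeneracy (the familiar ReLU rescaling symmetry; my illustration, not the commenters’): for a one-hidden-neuron ReLU network, oppositely rescaling the incoming and outgoing weights changes the parameters but not the computed function, on every possible input, so the parameter-to-function map fails to be locally injective:

```python
import numpy as np

def f(x, w, v):
    # One-hidden-neuron ReLU network: f(x) = v * relu(w * x).
    return v * np.maximum(w * x, 0.0)

rng = np.random.default_rng(0)
x = rng.normal(size=1000)  # stand-in for "all possible inputs"

w, v, alpha = 1.3, -0.7, 2.5
# For alpha > 0, relu(alpha*w*x) = alpha*relu(w*x), so the rescaled
# parameters (alpha*w, v/alpha) compute exactly the same function:
print(np.allclose(f(x, w, v), f(x, alpha * w, v / alpha)))  # True
```

Whether a given degeneracy on that menu actually shows up as a singularity of the loss landscape then depends on the loss and the data, as above.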