One thing I noticed when reflecting on this dialogue later was that I really wasn’t considering the data distribution’s role in creating the loss landscape. So thanks for bringing this up!
Suppose I had some separation of the features of my brain into “parameters” and “activations”. Would my brain be singular if there were multiple values the parameters could take such that, for all possible inputs, the activations were the same? Or would those parameter values also have to be local minima of the loss?
(I suppose it’s not that realistic that the activations would be the same for all inputs, even assuming the separation into parameters and activations, because some inputs vaporise my brain.)
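To make the premise of the question concrete, here is a minimal sketch of a toy model in which several distinct parameter values give the same output on every input. Everything in it is an assumption made for illustration: the two-parameter model `output(params, x) = a * b * x`, the helper name `same_function`, and the reading of “activations” as just the final output. It only illustrates the degeneracy being asked about; it does not settle whether that degeneracy alone makes a model singular or whether the parameters must also be local minima.

```python
import numpy as np

def output(params, x):
    """Hypothetical toy model: only the product a * b affects the output."""
    a, b = params
    return a * b * x

def same_function(p, q, inputs):
    """Check whether two parameter settings agree on every sampled input."""
    return bool(np.allclose(output(p, inputs), output(q, inputs)))

# A grid of sample inputs standing in for "all possible inputs".
inputs = np.linspace(-5.0, 5.0, 101)

p = (2.0, 3.0)   # a * b = 6
q = (4.0, 1.5)   # different parameters, same product a * b = 6
r = (2.0, 3.5)   # a * b = 7, a genuinely different function

print(same_function(p, q, inputs))  # p and q are indistinguishable from outputs alone
print(same_function(p, r, inputs))  # p and r disagree on almost every input
```

Note that nothing here mentions a loss or a data distribution: the equivalence between `p` and `q` holds for every input, which is the situation the first half of the question describes.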