One argument sketch using SLT that NNs are biased towards low-complexity solutions: suppose reality is generated by a width-3 network, and you’re modelling it with a width-4 network. Then, along with the generic symmetries, optimal solutions also have continuous symmetries where you can switch which neuron is turned off.
Roughly, say neurons 3 and 4 have the same input weight vectors (so their activations are the same), but neuron 4’s output weight vector is all zeros. Then you can continuously scale up the output vector of neuron 4 while simultaneously scaling down the output vector of neuron 3, and the network keeps computing the same function. Also, when neuron 4’s input and output weights are all zero, you can arbitrarily change either its inputs or its outputs (but not both) without changing the computed function.
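To make the continuous symmetry concrete, here is a minimal numerical sketch (my own illustration; the one-hidden-layer ReLU architecture, the shapes, and the absence of biases are arbitrary choices, not part of the argument above):

```python
import numpy as np

rng = np.random.default_rng(0)

n_in, width, n_out = 3, 4, 2
W_in = rng.normal(size=(width, n_in))    # input weights: 3 inputs -> 4 hidden neurons
W_out = rng.normal(size=(n_out, width))  # output weights: 4 hidden neurons -> 2 outputs

W_in[3] = W_in[2]   # neurons 3 and 4 (indices 2 and 3) share input weights ...
W_out[:, 3] = 0.0   # ... and neuron 4's output weights are all zeros

def forward(x, W_in, W_out):
    return W_out @ np.maximum(0.0, W_in @ x)  # one hidden ReLU layer, no biases

x = rng.normal(size=n_in)
y0 = forward(x, W_in, W_out)

# Shift output weight continuously from neuron 3 to neuron 4: since their hidden
# activations are identical, the computed function never changes, so the optimum
# is a whole continuous family of parameters rather than an isolated point.
for t in np.linspace(0.0, 1.0, 5):
    W_shifted = W_out.copy()
    W_shifted[:, 3] = t * W_out[:, 2]
    W_shifted[:, 2] = (1.0 - t) * W_out[:, 2]
    assert np.allclose(forward(x, W_in, W_shifted), y0)
```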
Anyway, this means that when the data is generated by a slim neural net, optimal nets will have a good (low) RLCT, but when it’s generated by a neural net of the same width as the model, optimal nets will have a bad (high) RLCT. So nets can learn simple data, and it’s easier for them to learn simple data than complex data, at least if thin neural nets count as simple.
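For context on why a lower RLCT is “good”: roughly, and glossing over regularity conditions, Watanabe’s free energy asymptotics give

$$F_n \;=\; n L_n(w^*) \;+\; \lambda \log n \;+\; O_p(\log\log n),$$

where $F_n$ is the Bayesian free energy (negative log marginal likelihood) on $n$ samples, $L_n(w^*)$ is the loss at the optimal parameters, and $\lambda$ is the RLCT. A smaller $\lambda$ means a smaller complexity penalty, so the posterior concentrates on the more degenerate, lower-RLCT solutions.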
This is basically a justification of something like your point 1, but AFAICT it’s closer to a proof in the SLT setting than in your setting.
Does this not essentially amount to just assuming that the inductive bias of neural networks in fact matches the prior that we (as humans) have about the world?
This is basically a justification of something like your point 1, but AFAICT it’s closer to a proof in the SLT setting than in your setting.
I think it could probably be turned into a proof in either setting, at least if we are allowed to help ourselves to assumptions like “the ground truth function is generated by a small neural net” and “learning is done in a Bayesian way”, etc.
Does this not essentially amount to just assuming that the inductive bias of neural networks in fact matches the prior that we (as humans) have about the world?
No? It amounts to assuming that smaller neural networks are a better match for the actual data generating process of the world.
The assumption that small neural networks are a good match for the actual data generating process of the world is equivalent to the assumption that neural networks have an inductive bias that gives large weight to the actual data generating process of the world, provided we also accept the claim that neural networks have an inductive bias that gives large weight to functions which can be described by small neural networks (and this latter claim is not too difficult to justify, I think).
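Spelling out the direction of this equivalence that the argument actually uses, in my own notation (writing $P_{\mathrm{NN}}$ for the network’s prior over functions and $f^{*}$ for the true data generating function):

$$\big(P_{\mathrm{NN}}(f)\ \text{large whenever } f \text{ is computed by a small net}\big)\ \wedge\ \big(f^{*}\ \text{is computed by a small net}\big)\ \Longrightarrow\ P_{\mathrm{NN}}(f^{*})\ \text{large}.$$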