On why neural networks generalize, part of the answer is already known: they don’t generalize nearly as much as people think they do, and there are some fairly important limitations to their generalizability.
Faith and Fate is the paper I’d read here, but there are other results along the same lines, like Neural Networks and the Chomsky Hierarchy, or Transformers can’t learn to solve problems recursively. The point is that neural networks are quite a bit overhyped in their ability to generalize from certain data, so part of the answer is simply that they don’t generalize as much as people think:
https://arxiv.org/abs/2305.18654
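To make concrete the kind of generalization failure these papers point at, here’s a minimal sketch of a length-split evaluation: build an in-distribution test set at the training length and an out-of-distribution test set at a longer length, then compare accuracy. The task (multi-digit addition) and the stand-in predictor are placeholders I made up for illustration, not the actual setups from those papers.

```python
# Minimal sketch of a length-split evaluation, in the spirit of
# compositional-generalization tests. Task and "model" are placeholders.
import random

def make_addition_examples(n_digits: int, n_examples: int, seed: int = 0):
    """Generate (prompt, answer) pairs for n_digit + n_digit addition."""
    rng = random.Random(seed + n_digits)
    lo, hi = 10 ** (n_digits - 1), 10 ** n_digits - 1
    examples = []
    for _ in range(n_examples):
        a, b = rng.randint(lo, hi), rng.randint(lo, hi)
        examples.append((f"{a}+{b}=", str(a + b)))
    return examples

def accuracy(predict, examples):
    """Exact-match accuracy of a prompt -> answer predictor."""
    return sum(predict(p) == ans for p, ans in examples) / len(examples)

# In-distribution test: same operand length as the (hypothetical) training data.
# Out-of-distribution test: longer operands than anything seen in training.
iid_test = make_addition_examples(n_digits=3, n_examples=200)
ood_test = make_addition_examples(n_digits=5, n_examples=200)

# Stand-in predictor; in the real experiments this would be a trained
# transformer decoding the answer from the prompt.
def predict(prompt: str) -> str:
    a, b = prompt.rstrip("=").split("+")
    return str(int(a) + int(b))

print("in-distribution accuracy:", accuracy(predict, iid_test))
print("out-of-distribution accuracy:", accuracy(predict, ood_test))
```

The claim in the papers above is roughly that a trained network can do well on the first split while falling off sharply on the second.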
It’s worth noting that Jesse is mostly following the traditional “approximation, generalization, optimization” error decomposition from learning theory here—where “generalization” specifically refers to finite-sample generalization (gap between train/test loss), rather than something like OOD generalization. So e.g. a failure of transformers to solve recursive problems would be a failure of approximation, rather than a failure of generalization. Unless I misunderstood you?
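For reference, one standard way to write that decomposition (my notation, not Jesse’s):

$$\underbrace{R(\tilde{f}) - R(f^\ast)}_{\text{excess risk}} = \underbrace{R(\tilde{f}) - R(\hat{f}_n)}_{\text{optimization}} + \underbrace{R(\hat{f}_n) - R(f_{\mathcal{F}})}_{\text{generalization}} + \underbrace{R(f_{\mathcal{F}}) - R(f^\ast)}_{\text{approximation}}$$

where $R$ is population risk, $f^\ast$ the Bayes-optimal predictor, $f_{\mathcal{F}}$ the best predictor in the model class $\mathcal{F}$, $\hat{f}_n$ the empirical risk minimizer on $n$ samples, and $\tilde{f}$ the model the optimizer actually returns. On this reading, a transformer that can’t represent a recursive solution at all has high approximation error no matter how much data it sees, which is the distinction being drawn above.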
Ok, I understand now. You haven’t misunderstood me. I’m not sure what to do with my comment above now.
Thanks for raising that, it’s a good point. I’d appreciate it if you also cross-posted this to the approximation post here.
I’ll cross-post it soon.
I actually did it: https://www.lesswrong.com/posts/gq9GR6duzcuxyxZtD/?commentId=feuGTuRRAi6r6DRRK