On why neural networks generalize, part of the answer is already known: they don't generalize nearly as much as people think they do, and there are some fairly important limitations on their generalizability.
Faith and Fate (https://arxiv.org/abs/2305.18654) is the paper I'd read, but there are other results in the same vein, like Neural Networks and the Chomsky Hierarchy and Transformers Can't Learn to Solve Problems Recursively. The point is that neural networks' ability to generalize from given data is quite a bit overhyped, so part of the answer is simply that they don't generalize as much as people think.
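To make the flavor of these results concrete, here's a minimal toy sketch (my own illustration, not an experiment from any of the cited papers): a small network fits a function well on its training range but extrapolates poorly just outside it. The cited papers show analogous, and much stronger, failures on compositional and recursive tasks.

```python
# Toy illustration (not from the cited papers): a small network fits sin(x)
# well on its training range but fails to extrapolate outside it.
# Assumes scikit-learn is available.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# Train only on x in [-pi, pi]
x_train = rng.uniform(-np.pi, np.pi, size=(2000, 1))
y_train = np.sin(x_train).ravel()

net = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=5000, random_state=0)
net.fit(x_train, y_train)

# In-distribution test: same range as training
x_in = np.linspace(-np.pi, np.pi, 200).reshape(-1, 1)
# Out-of-distribution test: x in [pi, 3*pi], never seen during training
x_out = np.linspace(np.pi, 3 * np.pi, 200).reshape(-1, 1)

mse_in = np.mean((net.predict(x_in) - np.sin(x_in).ravel()) ** 2)
mse_out = np.mean((net.predict(x_out) - np.sin(x_out).ravel()) ** 2)
print(f"in-distribution MSE:     {mse_in:.4f}")   # typically small
print(f"out-of-distribution MSE: {mse_out:.4f}")  # typically much larger
```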
Crosspost from this post: https://www.lesswrong.com/posts/uG7oJkyLBHEw3MYpT/generalization-from-thermodynamics-to-statistical-physics#