But you wouldn’t study … MNIST-classifier CNNs circa 2010s, and claim that your findings generalize to how LLMs circa 2020s work.
This particular bit seems wrong; CNNs and LLMs are both built on neural networks. If findings about one don’t generalize to the other, that could be called a “failure of theory”, not evidence that theory is impossible. (Then again, maybe humans don’t have good setups for going 20 steps ahead of the data when building theory, so...)
(To clarify, this post is good and needed, so thank you for writing it.)
Yep, there’s nonzero mutual information. But not of the sort that’s centrally relevant.
I’ll link to this reply in lieu of just copying it.