A maximally sparse neural net layer (k=1, only one neuron active) is effectively just a simple input->output key/value map and thus can only memorize. It can at best learn to associate each input pattern with one specific output pattern, no more, no less (and can trivially and perfectly overfit any dataset of N examples by using N neurons and N·I + N·O memory, for I input and O output dimensions—just like a map/table in CS).
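A minimal sketch of this equivalence, assuming a toy random dataset (the sizes and names here are illustrative): the input weights store the keys, the output weights store the values, and a winner-take-all argmax does the lookup.

```python
import numpy as np

rng = np.random.default_rng(0)
N, I, O = 8, 4, 3            # examples, input dim, output dim
X = rng.normal(size=(N, I))  # N input patterns (the keys)
X /= np.linalg.norm(X, axis=1, keepdims=True)  # unit keys so self-match wins
Y = rng.normal(size=(N, O))  # N target patterns (the values)

# "Training" by pure memorization: one neuron per example.
# Input weights hold N*I numbers, output weights N*O numbers.
W_in, W_out = X.copy(), Y.copy()

def forward(x):
    scores = W_in @ x          # match input against every stored key
    winner = np.argmax(scores) # k=1: exactly one neuron fires
    return W_out[winner]       # emit that neuron's stored output

# Perfect, trivial overfit: every training input maps back to its target.
assert all(np.allclose(forward(X[i]), Y[i]) for i in range(N))
```

Nothing here generalizes: a novel input just retrieves whichever stored key happens to score highest.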
We can get some trivial compression when there are redundant input->output mappings, but potentially much larger gains by slowly relaxing that sparsity constraint and allowing more neurons to be simultaneously active, which creates more opportunities to compress the function. With k=2, for example, each input activates a pair of neurons; with N=D/2 neurons for a dataset of D examples, each neuron participates in several examples and must share each input->output mapping with a partner neuron—by specializing on common sub-patterns that recur across examples, for instance.
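A small sketch of the k=2 case, under the assumption (the compressible case) that each target really is the sum of two shared sub-patterns. Each example activates a distinct pair of neurons, so N·O per-neuron values suffice to reproduce D·O target values; the pair assignment and dataset here are illustrative.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(1)
D, N, O = 12, 6, 3                 # examples, neurons (N = D/2), output dim

# k=2 sparsity pattern: each example activates one distinct pair of neurons.
pairs = list(combinations(range(N), 2))[:D]
A = np.zeros((D, N))
for i, (a, b) in enumerate(pairs):
    A[i, a] = A[i, b] = 1.0        # exactly two active neurons per example

# Assume the targets decompose into shared sub-patterns (hypothetical data).
V_true = rng.normal(size=(N, O))
Y = A @ V_true                     # each target = sum of two neuron values

# Recover per-neuron values from the data: N*O parameters fit D*O targets.
V, *_ = np.linalg.lstsq(A, Y, rcond=None)
assert np.allclose(A @ V, Y)       # all 12 outputs reconstructed from 6 neurons
```

The compression is real only when the data has this additive shared structure; for arbitrary targets the least-squares fit would leave a residual, and we would be back to needing one neuron per example.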
At the other extreme of compression we have a neural circuit that computes some algorithm fitting the data well, and which is likely denser. Because circuit space is continuous, there are always interpolations between such algorithmic circuits and pure memorization circuits.