Daniel Kokotajlo comments on Visible loss landscape basins don’t correspond to distinct algorithms

Daniel Kokotajlo 29 Jul 2023 4:17 UTC
11 points
3
Thanks for this post. It failed to dislodge any dogmas in me because I didn’t subscribe to the ones you attacked. So here are my dogmas; maybe they are under-the-surface-similar and you can attack them too?

- Randomly initialized neural networks of size N are basically a big grab bag of random subnetworks of size <N.
- Training tends to modify all the subnetworks at once, in a sort of evolutionary process: subnetworks that contribute to success get strengthened and tweaked, and subnetworks that contribute to failure get weakened.
- Eventually you have a network that performs very well in training, which probably means that it has at least one and possibly several subnetworks that perform well in training. This explains why you can usually prune away neurons without much performance degradation, and also why neural nets are so robust to small amounts of damage.
- Networks that perform well in training tend to also be capable in other nearby environments (“generalization”) because (a) the gods love simplicity, and made our universe such that simple patterns are ubiquitous, and (b) simpler algorithms occupy more phase space in a neural net (there are more possible settings of the parameters that implement simpler algorithms). So (conclusion) a trained neural network tends to do well in training by virtue of subnetworks that implement simple algorithms that match the patterns inherent in the training environment, and often these patterns are also found outside the training environment (in ‘nearby’ or ‘similar’ environments), so often the trained neural networks “generalize.”
- So why grokking? Well, sometimes there is a simple algorithm (e.g. correct modular arithmetic) that requires getting a lot of fiddly details right, and a more complex algorithm (e.g. memorizing a look-up table) that is modular and very easy to build up piece by piece. In these cases, both algorithms get randomly initialized in various subnetworks, but the simple ones are ‘broken’ and need ‘repair’ so to speak, and that takes a while because for the whole to work all the parts need to be just so, whereas the complex but modular ones can very quickly be hammered into shape, since some parts being out of shape reduces performance only partially. Thus, after training long enough, the simple algorithm subnetworks finally get into shape and then come to dominate behavior (because they are simpler & therefore more numerous).
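
Claim (b) above, that simpler algorithms correspond to more parameter settings, can be sanity-checked in a toy case. The sketch below is purely illustrative and not part of the original argument: it assumes a single threshold unit on two boolean inputs, with each of the hypothetical parameters `w1`, `w2`, `b` restricted to {-1, 0, 1}, and it counts how many of the 27 settings implement each boolean function.

```python
from collections import Counter
from itertools import product

# Hypothetical toy model (an assumption for illustration, not from the comment):
# one threshold unit on two boolean inputs, with w1, w2, b each in {-1, 0, 1}.
INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]
VALUES = (-1, 0, 1)


def truth_table(w1, w2, b):
    """Return the unit's outputs on all four inputs as a string such as '0111'."""
    return "".join(
        "1" if w1 * x1 + w2 * x2 + b > 0 else "0" for x1, x2 in INPUTS
    )


def count_functions():
    """Count how many of the 27 parameter settings implement each function."""
    return Counter(
        truth_table(w1, w2, b) for w1, w2, b in product(VALUES, repeat=3)
    )


if __name__ == "__main__":
    # Print each realized truth table with the number of settings that produce it.
    for table, n in count_functions().most_common():
        print(table, n)
```

In this toy, the two constant functions take up 16 of the 27 settings (12 for always-0, 4 for always-1), while every non-constant function that appears is implemented by exactly one setting. That is only suggestive: a single unit with three-valued weights says little about real networks, and "constant" is a crude stand-in for "simple."
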
What links here?
- Daniel Kokotajlo's comment on Daniel Kokotajlo’s Shortform by Daniel Kokotajlo (2 Aug 2023 17:03 UTC; 2 points)