I think calling the project of recovering the “simple algorithm” that Lucius is talking about “interpretability” might be somewhat misleading.
It seems that Lucius is talking about the idea of the simplest ontology/generative model/structure of inference and learning that, if fed actual sensory data, attached to actuators, and given access to some kind of memory and lookup, yields an intelligence at least as general as a human’s.
Many people have conjectured that such an “algorithm” does exist and is on the order of 100,000 bytes in size. I’ve heard at least Carmack, Cristiano, and Leahy say something along these lines, but there are probably many more.

Bayesian Brain theorists further hypothesise that animal brains effectively implement something like these “simple” algorithms (adjusted to the level of generality and sophistication of the world model each animal species needs), due to the strong evolutionary pressure on the energy efficiency of the brain (“The free energy principle induces neuromorphic development”). The speed-accuracy tradeoffs in brain hardware add another kind of pressure that points in the same direction (“Internal feedback in the cortical perception–action loop enables fast and accurate behavior”).
However, there are mostly no such constraints in ANN training (by default), so it doesn’t seem destined to me that LLM behaviour should “compress” very much, i.e., that a “simple algorithm” should be abstractable from it (other than for the speculative reason that it is mostly trained on human language, which might somehow transfer the structure of human cognition into the LLM; more on this below).
To induce ANNs to learn “simple algorithms”, extra regularisations/penalties/pressures/learning priors are applied during training, such as this or this.
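As a concrete illustration of the kind of explicit pressure meant here, the sketch below adds an L1 weight penalty to an otherwise ordinary training loop. This is a minimal, assumption-laden example (a toy PyTorch model and an L1 penalty of my own choosing, not the specific methods linked above):

```python
import torch
import torch.nn as nn

# Toy model and synthetic data; purely illustrative.
model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 1))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
x, y = torch.randn(256, 32), torch.randn(256, 1)

l1_strength = 1e-4  # how hard to push the network towards sparse (simpler) weights

for step in range(1000):
    optimizer.zero_grad()
    task_loss = nn.functional.mse_loss(model(x), y)
    # Explicit simplicity pressure: penalise the L1 norm of all weights,
    # on top of whatever implicit bias the loss landscape already supplies.
    l1_penalty = sum(p.abs().sum() for p in model.parameters())
    loss = task_loss + l1_strength * l1_penalty
    loss.backward()
    optimizer.step()
```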
It seems plausible, though, that LLMs do embody some “simple algorithm” (although unlikely one exactly as simple and as abstract as the algorithm in the human brain itself) by virtue of being trained on human texts that are already regularised in a specific way. Then, reverse-engineering this algorithm from LLMs could be seen as a kind of circuitous way of “solving neuroscience”, thanks to the transparency of ANNs and the ease of conducting experiments on them. I think this agenda has promise, but I’d speculate that the resulting algorithms will be sufficiently rough approximations of LLM behaviour that calling this agenda a kind of “interpretability” would be misleading.
However, there are mostly no such constraints in ANN training (by default), so it doesn’t seem destined to me that LLM behaviour should “compress” very much
The point of the Singular Learning Theory digression was to help make legible why I think this is importantly false. NN training has a strong simplicity bias, basically regardless of the optimizer used for training, and even in the absence of any explicit regularisation. This bias towards compression is a result of the particular degenerate structure of NN loss landscapes, which are in turn a result of the NN architectures themselves. Simpler solutions in these loss landscapes have a lower “learning coefficient”, which you might conceptualize as an “effective” parameter count. A lower learning coefficient means the solution occupies more volume (or a higher-dimensional region, in the idealized case) of the loss landscape than more complicated solutions with higher learning coefficients.
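To make the volume intuition slightly more precise (my gloss on the standard SLT statement, not something spelled out in this thread): if $V(\epsilon)$ denotes the volume of parameter space on which the loss lies within $\epsilon$ of its minimum, then for singular models

$$V(\epsilon) \approx c \, \epsilon^{\lambda} \left(\log \tfrac{1}{\epsilon}\right)^{m-1} \quad \text{as } \epsilon \to 0,$$

where $\lambda$ is the learning coefficient and $m$ its multiplicity. A regular model with $d$ parameters has $\lambda = d/2$, which is why $\lambda$ behaves like an “effective” parameter count; degenerate NN solutions can have $\lambda \ll d/2$, and so retain far more near-optimal volume as $\epsilon \to 0$ than less degenerate solutions do.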
This bias in the loss landscapes isn’t quite about simplicity alone. It might perhaps be thought of as a particular mix of a simplicity prior and a peculiar kind of speed prior.
That is why Deep Learning works in the first place. That is why NN training can readily yield solutions that generalize far past the training data, even when you have substantially more parameters than data points to fit on. That is why, with a bit of fiddling around, training a transformer can get you a language model, whereas training a giant polynomial on predicting internet text will not get you a program that can talk. SGD or no SGD, momentum or no momentum, weight regularisation or no weight regularisation. Because polynomial loss landscapes do not look like NN loss landscapes.
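A toy probe of that contrast, under assumptions of my own (20 noisy samples of a smooth 1-D target, a degree-50 minimum-norm polynomial fit versus a small overparameterized MLP; NumPy and scikit-learn, not anything from the discussion above):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# 20 noisy samples of a smooth target on [-1, 1]; both models below have
# far more parameters than data points.
x_train = np.sort(rng.uniform(-1.0, 1.0, 20))
y_train = np.sin(3.0 * x_train) + 0.1 * rng.normal(size=20)
x_test = np.linspace(-1.0, 1.0, 200)
y_test = np.sin(3.0 * x_test)

# (a) "Giant polynomial": 51 coefficients for 20 points; lstsq returns the
#     minimum-norm solution, which interpolates the training data exactly.
degree = 50
V_train = np.vander(x_train, degree + 1, increasing=True)
coeffs, *_ = np.linalg.lstsq(V_train, y_train, rcond=None)
poly_pred = np.vander(x_test, degree + 1, increasing=True) @ coeffs

# (b) Small MLP with roughly 10k weights, trained on the same 20 points.
mlp = MLPRegressor(hidden_layer_sizes=(100, 100), max_iter=5000, random_state=0)
mlp.fit(x_train.reshape(-1, 1), y_train)
mlp_pred = mlp.predict(x_test.reshape(-1, 1))

print("polynomial test MSE:", np.mean((poly_pred - y_test) ** 2))
print("MLP test MSE:      ", np.mean((mlp_pred - y_test) ** 2))
```

In runs like this I would expect the polynomial to track the noise and misbehave between and beyond the training points, while the MLP’s fit stays comparatively smooth; the exact numbers depend on the seed, degree, and architecture, so treat it as an illustration of the regime rather than evidence for the claim.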
I agree with you, but it’s not clear that, absent explicit regularisation, DNNs, and LLMs in particular, will compress to the degree that they become intelligible (interpretable) to humans. That is, their effective dimensionality might be reduced from 1T to 100M or whatever, but that would still be way too much for humans to comprehend. Explicit regularisation drives this effective dimensionality down.
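To put a rough number on “way too much” (my own back-of-the-envelope, not a figure from the discussion): even if every one of $10^8$ effective components could be understood in a single second, that alone is

$$10^8\ \text{s} \approx 1{,}160\ \text{days} \approx 3\ \text{years}$$

of nonstop attention, before accounting for the interactions between components, which is presumably where most of the difficulty of interpretation lives.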