This paper on scaling laws for training language models seems like it should help us make a rough guess for how training scales. According to the paper, your loss in nats scales as C^(−0.05) if you're only limited by compute cost C, and as N^(−0.08) if you're only limited by the number of parameters N. If we can equate those in the limit, which is not at all obvious to me, then C^(−0.05) ≈ N^(−0.08) implies that cost goes as the number of parameters to roughly the 0.08/0.05 = 1.6 power, and the number of parameters is itself polynomial in the number of neurons. So comprehension can be a modest polynomial in the number of neurons, but it certainly can't be exponential.
Yup, that seems like a pretty reasonable estimate to me.
Note that my default model for “what should be the input to estimate difficulty of mechanistic transparency” would be the number of parameters, not number of neurons. If a neuron works over a much larger input (leading to more parameters), wouldn’t that make it harder to mechanistically understand?
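For concreteness, here is a minimal sketch of the arithmetic behind the 1.6 exponent, assuming only the quoted loss exponents (C^(−0.05) and N^(−0.08)); the quadratic parameters-to-neurons relationship at the end is purely an illustrative assumption, not something stated above.

```python
# Sanity check of the scaling arithmetic discussed above.
# Quoted exponents: loss ~ C^(-0.05) when compute-limited, loss ~ N^(-0.08)
# when parameter-limited. Equating the two losses in the limit is the
# speaker's assumption, not a claim from the paper itself.

compute_exponent = 0.05   # loss ~ C^(-0.05)
param_exponent = 0.08     # loss ~ N^(-0.08)

# Setting C^(-0.05) = N^(-0.08) and solving for C gives C ~ N^(0.08/0.05).
cost_vs_params = param_exponent / compute_exponent
print(f"cost ~ params^{cost_vs_params:.1f}")   # cost ~ params^1.6

# Illustrative assumption: parameters roughly quadratic in neuron count
# (as for dense layers). The resulting cost is still polynomial in the
# number of neurons, not exponential.
params_vs_neurons = 2.0
print(f"cost ~ neurons^{cost_vs_params * params_vs_neurons:.1f}")  # ~ neurons^3.2
```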