What intuitions lead you to feel that the default case is the other way around?
You can always factorize the problem into smaller pieces. If the interlocutor doesn’t understand “A then B” but can understand “A”, “B”, “or”, and “not” individually, you can introduce them to “not(A)”, let them get used to it until they can think of not(A) as a simple assertion C, then introduce them to or(C, B) (which implements the implication “if A then B”). It can be exhausting, but it works.
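For concreteness, here is a minimal sketch of that decomposition in Python; the function name is mine, purely illustrative:

```python
# "A then B" rebuilt from the primitives "not" and "or", following the
# two-step introduction above. Names are illustrative, not from the comment.

def implies(a: bool, b: bool) -> bool:
    c = not a      # step 1: introduce C = not(A) and treat it as its own assertion
    return c or b  # step 2: or(C, B) behaves exactly like "if A then B"

# Sanity check against the standard truth table for material implication.
for a in (False, True):
    for b in (False, True):
        assert implies(a, b) == (not a or b)
```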
And in the case of larger AI models, it seems like this sort of factorization would happen automatically. Their sophistication grows with the number of parameters, which means the complexity of interactions within any individual fixed-size group of parameters can stay constant, or even decrease, as the model grows.
Sure, the functions that, e.g., late-layer parameters implement may be more complex in an absolute sense, but they are not more complex relative to the lower-layer functions they build on.
Toy example: if every neuron at the n-th layer implements an elementary operation over two lower-layer neurons, the function at the 32nd layer is “more complex” than any function at the 6th layer when considered from scratch, but not more complex if, by the time you reach the n-th layer, you already understand everything at every preceding layer.
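A quick sketch of that accounting, with made-up wiring: it counts how large each neuron’s function is when fully expanded from scratch, versus the constant amount of new structure a single layer adds.

```python
# Toy model of the example above: every neuron at layer n applies one
# elementary operation to two neurons from layer n-1 (the wiring is invented
# for illustration). We track how big each neuron's function is when written
# out "from scratch".

WIDTH = 4  # assumed layer width, purely illustrative

def from_scratch_sizes(num_layers):
    """Node count of each neuron's fully expanded expression tree."""
    sizes = [1] * WIDTH  # layer 0: each input is a single symbol
    for _ in range(num_layers):
        # One elementary op node plus the two expanded sub-expressions it combines.
        sizes = [1 + sizes[i] + sizes[(i + 1) % WIDTH] for i in range(WIDTH)]
    return sizes

for layer in (6, 32):
    print(layer, max(from_scratch_sizes(layer)))
# The layer-32 expression dwarfs the layer-6 one when expanded from scratch,
# yet relative to layer 31 it is just WIDTH elementary operations.
```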