Specifically, suppose you vary between two loss functions, L1 and L2, in some training environment. That variation is called “modular” if, somewhere in design space (that is, the space formed by all possible combinations of parameter values your network can take), you can find a network N1 that “does well”(1) on L1 and a network N2 that “does well” on L2, such that the two networks have the same values for all their parameters except those in a single(2) submodule(3).
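A minimal sketch of the parameter-sharing part of this definition, assuming the network is represented as a flat parameter vector and a “submodule” is just a named index set into that vector; those representational choices and the tolerance are illustrative assumptions of mine, and the “does well”(1) conditions on L1 and L2 would still have to be checked separately:

```python
import numpy as np

# Sketch: check only the parameter-sharing condition, i.e. that two flat
# parameter vectors agree everywhere except (possibly) inside a single named
# submodule. Whether each network "does well" on its loss is not checked here.

def differ_only_in_one_submodule(theta1, theta2, submodules, atol=1e-8):
    """submodules: dict mapping a submodule name to an index array into the
    flat parameter vector (assumed to cover all parameters)."""
    differs = ~np.isclose(theta1, theta2, atol=atol)   # which parameters differ
    touched = [name for name, idx in submodules.items() if differs[idx].any()]
    return len(touched) <= 1

# Toy usage: 10 parameters split into two submodules of 5 each.
submodules = {"A": np.arange(0, 5), "B": np.arange(5, 10)}
theta1 = np.zeros(10)
theta2 = theta1.copy()
theta2[7] = 1.0                                        # change only submodule "B"
print(differ_only_in_one_submodule(theta1, theta2, submodules))  # True
```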
It’s often the case that you can implement the desired function with, say, 10% of the parameters you actually have. So any pair of losses L1 and L2 would count as “modular”: just change the 10% of parameters that actually do anything and leave the other 90% the same. Possible fixes:
1. You could imagine that it’s more modular the fewer parameters are needed, so that if you can do it with 1% of the parameters, that’s more modular than doing it with 10%. Problem: this is probably mostly measuring min(difficulty(L1), difficulty(L2)), where difficulty(L) is the minimum number of parameters needed to “solve” L, for whatever definition of “solve” you are using.
2. You could have a definition that first throws away all the parameters that are irrelevant, and then applies the definition above. (I expect this to have problems with Goodharting on the definition of “irrelevant”, but it’s not quite so obvious what they will be.)
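To make fix 2 slightly more concrete, here is a hedged sketch in which “irrelevant” is operationalized, purely as an illustrative assumption, as “zeroing this parameter alone changes the loss by at most eps”; the check then mirrors the single-submodule test sketched above, restricted to parameters that are relevant to at least one of the two losses.

```python
import numpy as np

def relevant_mask(theta, loss_fn, eps=1e-3):
    """Mark a parameter as relevant if zeroing it alone shifts the loss by > eps.
    (Illustrative criterion only; loss_fn maps a parameter vector to a scalar.)"""
    base = loss_fn(theta)
    mask = np.zeros(theta.shape, dtype=bool)
    for i in range(theta.size):
        pruned = theta.copy()
        pruned[i] = 0.0
        mask[i] = abs(loss_fn(pruned) - base) > eps
    return mask

def modular_after_pruning(theta1, theta2, loss1, loss2, submodules, eps=1e-3):
    """Apply the single-submodule check, but only over parameters that are
    relevant to at least one of the two losses; the rest are ignored."""
    keep = relevant_mask(theta1, loss1, eps) | relevant_mask(theta2, loss2, eps)
    differs = (theta1 != theta2) & keep
    touched = [name for name, idx in submodules.items() if differs[idx].any()]
    return len(touched) <= 1
```

The one-at-a-time zeroing criterion is the crudest possible notion of relevance; it misses interaction effects between parameters entirely, which is one concrete way the Goodharting worry could show up.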
A very good point!
I agree that fix 1 seems bad and doesn’t capture what we care about.
At first glance, fix 2 seems more promising to me, but I’ll need to think about it.
Thank you very much for pointing this out.
Yep, thanks! I would imagine that if progress goes well on describing modularity in an information-theoretic sense, it might help with fix 2, because information entanglement between a single module and the output would be a good measure of “relevance” in some sense.
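To gesture at what that could look like, here is a rough sketch, assuming we can sample inputs and record both a scalar summary of a module’s activation and the network’s scalar output; the binned mutual-information estimator and the toy data are illustrative stand-ins, not a worked-out proposal.

```python
import numpy as np

def binned_mutual_information(x, y, bins=20):
    """Crude estimate of I(X; Y) in nats from samples, via a 2D histogram."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nonzero = pxy > 0
    return float((pxy[nonzero] * np.log(pxy[nonzero] / (px @ py)[nonzero])).sum())

# Toy check: a module whose activation drives the output should score high,
# while an unrelated module should score near zero.
rng = np.random.default_rng(0)
module_act = rng.normal(size=10_000)
output = np.tanh(module_act) + 0.1 * rng.normal(size=10_000)
unrelated = rng.normal(size=10_000)
print(binned_mutual_information(module_act, output))   # relatively large
print(binned_mutual_information(unrelated, output))    # close to zero
```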