Idea: Network modularity and interpretability by sexual reproduction
Here’s the idea: what if we treated each layer of GPT as a chromosome, trained multiple (randomly initialized) networks in parallel, and from time to time randomly exchanged the n-th layer between two of them?
The goal is to end up with a network that is easier to interpret.
The problem this tries to solve is that huge optimization pressure can incentivize the network to use hacky, non-modular, spaghetti-like solutions, with concepts that are superimposed rather than clear-cut, and to opportunistically use whatever trick gets the job done.
It tries to solve this by creating a training environment in which non-local dependence on the peculiar implementation details of other parts of the system can’t be exploited, because those parts might be replaced by a different implementation at any moment.
The hope is that some form of interface/contract for the n-th layer would emerge as something neighboring layers can depend on, which in turn might be easier to interpret.
On a more abstract level: the idea is to create a training environment which makes the architectural boundaries relevant to designers (here: boundaries of layers) coincide with the boundaries at which the environment cuts/rearranges the system being optimized (here: boundaries of chromosomes exchanged between individuals), so that in response the system is incentivized to treat these boundaries as special, as an interface which needs to have a “well defined”/“fixed” language/semantics/protocol/signature. By “more abstract” I mean that even if “randomly exchange layers of two GPTs” turns out to be a stupid idea, maybe there is some other boundary which would make sense (individual attention heads? the neuron level? this might be similar to dropout?) or maybe some other mechanism of cutting/rearranging (cross-over: take a prefix of n layers from one network and the rest from the other? deletion: completely remove a layer?).
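To make the basic variant concrete, here is a minimal PyTorch sketch of the layer-swap scheme. Everything specific in it is an assumption made for illustration: a toy stack of identical blocks stands in for GPT, and the population size, swap interval, and dummy data are arbitrary.

```python
# Sketch: train several randomly initialized copies of a layered model in
# parallel; every `swap_every` steps, exchange the parameters of one randomly
# chosen layer between two randomly chosen copies.
import copy
import random
import torch
import torch.nn as nn

def make_model(n_layers=4, d=64, n_classes=10):
    # Stand-in for "GPT": a plain stack of identically shaped blocks, so that
    # layer i of one copy is interchangeable with layer i of another copy.
    blocks = [nn.Sequential(nn.Linear(d, d), nn.ReLU()) for _ in range(n_layers)]
    return nn.Sequential(*blocks, nn.Linear(d, n_classes))

def swap_layer(model_a, model_b, layer_idx):
    # Exchange the parameters of block `layer_idx` between the two copies.
    saved_a = copy.deepcopy(model_a[layer_idx].state_dict())
    model_a[layer_idx].load_state_dict(model_b[layer_idx].state_dict())
    model_b[layer_idx].load_state_dict(saved_a)

def train(population_size=4, n_layers=4, steps=1000, swap_every=50, d=64):
    models = [make_model(n_layers, d) for _ in range(population_size)]
    opts = [torch.optim.Adam(m.parameters(), lr=1e-3) for m in models]
    loss_fn = nn.CrossEntropyLoss()
    for step in range(1, steps + 1):
        # Dummy batch; in practice every copy would train on the same task.
        x = torch.randn(32, d)
        y = torch.randint(0, 10, (32,))
        for m, opt in zip(models, opts):
            opt.zero_grad()
            loss_fn(m(x), y).backward()
            opt.step()
        if step % swap_every == 0:
            a, b = random.sample(range(population_size), 2)
            swap_layer(models[a], models[b], random.randrange(n_layers))
    return models
```

One detail the sketch glosses over: swapping raw parameters leaves each copy’s Adam state (momentum/variance estimates) untouched and therefore mismatched after a swap; whether to swap, reset, or simply ignore that state is something a real version would need to decide.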
Geoff Hinton invented dropout, which randomly drops units from the network in each forward pass during training, based on a similar analogy to sexual recombination. It worked reasonably well in its time but has since fallen out of favor. I’m not sure if anyone has looked into the relative interpretability of nets trained with dropout vs. those without.
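For reference, here is a minimal sketch of dropout as it is typically used (again just an illustration, with arbitrary layer sizes and drop rate): during training, each forward pass randomly zeroes a fraction of activations, so no unit can rely on any particular other unit being present.

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(64, 128),
    nn.ReLU(),
    nn.Dropout(p=0.5),  # zero each activation with probability 0.5 during training
    nn.Linear(128, 10),
)

model.train()                        # dropout active
y_train = model(torch.randn(8, 64))
model.eval()                         # dropout becomes a no-op at evaluation time
y_eval = model(torch.randn(8, 64))
```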
Fwiw, dropout hasn’t fallen out of favor very much.
I think dropout makes nets less interpretable (w.r.t. naive interpretability strategies). This is based on my recollection; I forget which exact experiments we have and haven’t run.
OK, good to know. I had a cached belief that it had declined in popularity, which probably exaggerated the extent of the decline.