so that each neuron in the original MLP is simulated by two or three very similar neurons.
If you were using any form of weight decay, this is to be expected.
If you were using any form of weight decay, this is to be expected.