So we should expect the first layer to have a larger norm than preceding layers.
you mean the final layer?
Yes.