You should also set model.cfg.normalization_type = None afterwards. It's mostly a formality, since you're changing it after initialization, but ActivationCache.apply_ln_to_stack() is the only function I found that behaves incorrectly if you don't change it.
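Here is a deliberately simplified toy that shows why a stale flag can matter. Cfg, apply_ln_to_stack_sketch and the cache key "ln_final.hook_scale" are hypothetical illustrative names, not TransformerLens's actual implementation. The sketch assumes the method skips normalization when normalization_type is None, and otherwise reads a LayerNorm scale recorded during the forward pass, which an identity layer never records. In this toy the stale flag raises a KeyError. The real method's failure mode may look different.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cfg:
    # Hypothetical stand-in for model.cfg; only the field that matters here.
    normalization_type: Optional[str] = "LN"


def apply_ln_to_stack_sketch(cfg, residual_stack, cache):
    """Hypothetical, simplified version of applying a cached LayerNorm scale
    to a stack of residual-stream vectors (plain lists of floats)."""
    if cfg.normalization_type is None:
        # The model has no normalization, so leave the stack untouched.
        return residual_stack
    # Otherwise the cached LayerNorm scale is required. If the LayerNorm
    # was swapped for an identity, nothing was recorded and this lookup fails.
    scale = cache["ln_final.hook_scale"]
    out = []
    for vec in residual_stack:
        mean = sum(vec) / len(vec)
        out.append([(x - mean) / scale for x in vec])
    return out


cfg = Cfg()           # flag still says "LN" after the LayerNorms were removed
empty_cache = {}      # identity layers record no scale
cfg.normalization_type = None
print(apply_ln_to_stack_sketch(cfg, [[1.0, 3.0]], empty_cache))  # [[1.0, 3.0]]
```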
And here's the code to do it by replacing the LayerNorms with identities entirely:
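As a minimal PyTorch sketch of the replacement idea, assuming the LayerNorms are nn.LayerNorm instances: the helper walks the module tree and swaps every matching child for nn.Identity. The name replace_layernorms_with_identity and its ln_types parameter are hypothetical, not part of any library.

```python
import torch.nn as nn


def replace_layernorms_with_identity(module: nn.Module, ln_types=(nn.LayerNorm,)) -> int:
    """Recursively replace every submodule whose type is in ln_types with
    nn.Identity. Returns how many modules were replaced."""
    replaced = 0
    # Materialize the children first so we can mutate the module while looping.
    for name, child in list(module.named_children()):
        if isinstance(child, ln_types):
            setattr(module, name, nn.Identity())
            replaced += 1
        else:
            replaced += replace_layernorms_with_identity(child, ln_types)
    return replaced


# Toy usage: one top-level LayerNorm and one nested inside a sub-Sequential.
toy = nn.Sequential(
    nn.Linear(4, 4),
    nn.LayerNorm(4),
    nn.Sequential(nn.ReLU(), nn.LayerNorm(4)),
)
print(replace_layernorms_with_identity(toy))  # 2
```

For a HookedTransformer, the LayerNorms are TransformerLens's own classes rather than nn.LayerNorm. You would pass those instead (for example LayerNorm and LayerNormPre from transformer_lens.components; check which ones your version and model actually use), then set model.cfg.normalization_type = None. The removed LayerNorms take their hook points (such as hook_scale) with them, so those activations should no longer appear in a cache.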
Thanks! I'll edit it.
And here's the code to convert it to NNsight (thanks to Caden for writing this a while ago!)