Logan Riggs comments on A List of 45+ Mech Interp Project Ideas from Apollo Research’s Interpretability Team

Logan Riggs 5 Sep 2024 17:23 UTC
LW: 2 AF: 1
"Some MLPs or attention layers may implement a simple linear transformation in addition to actual computation."
@Lucius Bushnaq, why would MLPs compute linear transformations?
Since two linear transformations compose into a single linear transformation, why wouldn't the downstream MLPs/Attns that rely on this linearly transformed vector just learn the combined function themselves?
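To make the composition point concrete, here is a minimal sketch in plain Python. The names `W_up`, `W_down`, and `d_model` are hypothetical stand-ins I made up for illustration (an upstream layer's linear map and a downstream layer's linear read-in), not anything taken from a real model. It only checks the purely linear case: applying the two maps in sequence gives the same result as applying their product once.

```python
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]

def matvec(m, v):
    """Apply matrix m (list of rows) to vector v."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]

def random_matrix(rows, cols, rng):
    """Random matrix with entries in [-1, 1]."""
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
d_model = 4  # hypothetical residual stream width

# Hypothetical: linear map implemented by an upstream MLP/attention layer.
W_up = random_matrix(d_model, d_model, rng)
# Hypothetical: linear read-in of a downstream layer that consumes the result.
W_down = random_matrix(d_model, d_model, rng)

x = [rng.uniform(-1, 1) for _ in range(d_model)]

# Two-step version: upstream transforms, downstream reads the transformed vector.
two_step = matvec(W_down, matvec(W_up, x))

# One-step version: downstream layer uses the combined matrix directly.
W_combined = matmul(W_down, W_up)
one_step = matvec(W_combined, x)

# The two agree up to floating-point error.
max_abs_diff = max(abs(a - b) for a, b in zip(two_step, one_step))
print(max_abs_diff)
```

This says nothing about whether training actually finds the combined weights, or about cases where several downstream layers read the same transformed vector; it only illustrates the algebraic fact the question rests on.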