Well now I feel kind of dumb (for misremembering how LayerNorm works). I’ve actually spent the past day since making the video wondering why information leakage of the form you describe doesn’t occur in most transformers, so it’s honestly kind of a relief to realize this.
It seems to me that ReLU is a reasonable approximation of GELU, even for networks that actually use GELU. So one can think of GELU(x) = xΦ(x) as just having a slightly messy mask function (Φ(x)) that is sort-of-well-approximated by the ReLU binary mask function.
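As a minimal sketch of that framing (the helper names phi/gelu/relu are just illustrative, and I'm using the exact Gaussian CDF rather than the tanh approximation some libraries use), both activations can be written as "input times mask" — GELU just swaps the hard 0/1 step for the smooth mask Φ(x):

```python
import math

def phi(x):
    """Standard normal CDF: the 'soft mask' in GELU(x) = x * Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu(x):
    # Smooth mask times input.
    return x * phi(x)

def relu(x):
    # Hard binary mask times input.
    return x * (1.0 if x > 0 else 0.0)

for x in [-3.0, -1.0, -0.1, 0.1, 1.0, 3.0]:
    print(f"x={x:+.1f}  Phi(x)={phi(x):.3f}  gelu={gelu(x):+.3f}  relu={relu(x):+.3f}")
```

Away from zero the mask Φ(x) saturates to 0 or 1, so the two activations nearly coincide; they only really differ in a band around x = 0.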
Thanks! Enjoy your holidays!