In prior work I’ve done, I’ve found that activations have tails between e^(-x^2) and e^(-x) (typically closer to e^(-x)). As such, they’re probably better modeled as logistic distributions.
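To make that concrete, here is a minimal sketch (my own illustration, not from the work above, and using a synthetic stand-in for real activations) that fits log p(x) ≈ a − c·x^alpha to the empirical tail of a sample; alpha near 2 looks Gaussian, alpha near 1 looks exponential/logistic:

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import norm, logistic

# Placeholder: swap in real activations along some residual-stream direction.
rng = np.random.default_rng(0)
activations = rng.logistic(size=100_000)  # synthetic stand-in with exponential-ish tails

# Empirical log-density of |x| beyond the 90th percentile.
x = np.abs(activations)
hist, edges = np.histogram(x, bins=200, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
tail = (centers > np.quantile(x, 0.90)) & (hist > 0)

# Fit log p(x) ~ a - c * x**alpha; alpha ~ 2 is Gaussian-like, alpha ~ 1 is exponential-like.
def log_tail(x, a, c, alpha):
    return a - c * x**alpha

(a, c, alpha), _ = curve_fit(log_tail, centers[tail], np.log(hist[tail]), p0=(0.0, 1.0, 1.5))
print(f"fitted tail exponent alpha = {alpha:.2f}")

# Sanity check: compare Gaussian vs logistic fits by total log-likelihood.
for name, dist in [("normal", norm), ("logistic", logistic)]:
    params = dist.fit(activations)
    print(name, "log-likelihood:", dist.logpdf(activations, *params).sum())
```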
That said, different directions in the residual stream have quite different distributions. This depends considerably on how you select directions—I imagine random directions are more Gaussian due to the CLT. (Note that averaging together heavier-tailed distributions takes a very long time to become Gaussian; a toy check of this is sketched below.)
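Here is the toy check (synthetic Laplace samples, nothing to do with real activations): average n exponential-tailed variables and compare the excess kurtosis and an extreme quantile against a Gaussian of matching variance. The bulk looks Gaussian fairly quickly, but the far tail stays heavier for a long time.

```python
import numpy as np
from scipy.stats import norm, kurtosis

rng = np.random.default_rng(0)
n_samples = 1_000_000

for n in [1, 4, 16, 64]:
    # Mean of n iid Laplace variables (exponential tails), standardized to unit variance.
    means = rng.laplace(size=(n_samples, n)).mean(axis=1)
    means /= means.std()

    # Excess kurtosis decays like 3/n, but the extreme quantiles stay heavier than Gaussian.
    q = np.quantile(means, 0.9999)
    print(f"n={n:3d}  excess kurtosis={kurtosis(means):5.2f}  "
          f"99.99% quantile={q:.2f}  (Gaussian: {norm.ppf(0.9999):.2f})")
```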
But if you look at (e.g.) the directions selected by neurons optimized for sparsity, I’ve commonly observed bimodal distributions, heavy skew, etc. My low-confidence guess is that this is primarily because various facts about language have these properties, and exhibiting this structure in the model is an efficient way to capture them.
This is broadly similar to the point made by @Fabien Roger.
See also the Curve Detectors paper for a very narrow example of this (https://distill.pub/2020/circuits/curve-detectors/#dataset-analysis): a straight line on a log-prob plot indicates exponential tails.
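If it’s useful, here is a rough sketch of that kind of plot with synthetic data (I don’t have the paper’s dataset handy): the exponential-tailed sample comes out as a straight line on the log-density axis, while the Gaussian curves downward.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
samples = {
    "gaussian": rng.normal(size=200_000),
    "exponential-tailed (Laplace)": rng.laplace(size=200_000),
}

for label, xs in samples.items():
    hist, edges = np.histogram(xs, bins=200, density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    keep = hist > 0
    # On a log-density plot, exponential tails are straight lines; Gaussian tails bend down.
    plt.plot(centers[keep], np.log(hist[keep]), label=label)

plt.xlabel("activation value")
plt.ylabel("log density")
plt.legend()
plt.show()
```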
I believe the phenomenon of neurons often having activation distributions with exponential tails was first informally observed by Brice Menard.
Do you have a reference for the work you’re talking about? I’m currently doing some work that involves fitting curves to activation tails.
Unpublished and not written up. Sorry.