Nice work, thanks for sharing! I really like the fact that the neurons seem to upweight different versions of the same token (_an, _An, an, An, etc.). It’s curious because the semantics of these tokens can be quite different (compared to the though, tho, however neuron).
Have you looked at all into what parts of the model feed into (some of) the cleanly associated neurons? It was probably out of scope for this but just curious.
One reason the neuron is congruent with multiple of the same tokens may be because those token embeddings are similar (you can test this by checking their cosine similarities).
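A rough sketch of that check with TransformerLens, in case it’s useful (my own code, not from the post; I’m assuming GPT-2 Large and reading the leading-underscore tokens as the leading-space variants):

```python
# Hedged sketch: pairwise cosine similarities between the embeddings of the
# "an" token variants (assumes GPT-2 Large and the standard GPT-2 BPE tokenizer).
import torch
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2-large")

variants = [" an", " An", "an", "An"]              # each is a single BPE token
ids = [model.to_single_token(v) for v in variants]

embs = model.W_E[ids]                              # [4, d_model] token embeddings
embs = embs / embs.norm(dim=-1, keepdim=True)      # normalise for cosine similarity
cos = embs @ embs.T

for i in range(len(variants)):
    for j in range(i + 1, len(variants)):
        print(f"cos({variants[i]!r}, {variants[j]!r}) = {cos[i, j].item():.3f}")
```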
Have you looked at all into what parts of the model feed into (some of) the cleanly associated neurons? It was probably out of scope for this but just curious.
We did look very briefly at this for the " an" neuron. We plotted the congruence of the residual stream with the neuron’s input weights throughout the model. The second figure shows the per-layer difference, i.e. how much each layer changes that congruence.
Unfortunately I can’t seem to include an image in a comment. See it here.
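Roughly, the computation behind the plot looks like this (a simplified sketch rather than our exact code; substitute the actual neuron index for the placeholder NEURON_IDX):

```python
# Sketch: dot product of the residual stream (after each layer) with the
# " an" neuron's input-weight direction, at the final position of a prompt.
# Assumes GPT-2 Large via TransformerLens; NEURON_IDX is a placeholder.
import torch
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2-large")
LAYER = 31       # the MLP layer containing the " an" neuron
NEURON_IDX = 0   # placeholder -- substitute the actual neuron index

prompt = "I went to the orchard and picked"   # " an" is a plausible next token here
tokens = model.to_tokens(prompt)
_, cache = model.run_with_cache(tokens)

w_in = model.W_in[LAYER, :, NEURON_IDX]       # [d_model] input weights of the neuron
w_in = w_in / w_in.norm()                     # use the normalised direction

for layer in range(model.cfg.n_layers):
    resid = cache["resid_post", layer][0, -1]  # residual stream after this layer, final position
    print(f"after layer {layer:2d}: congruence = {(resid @ w_in).item():+.2f}")
```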
We can’t tell that much from this but I think there are three takeaways:
The model doesn’t start ‘preparing’ to activate the " an" neuron until layer 16.
No single layer stands out as being particularly responsible for the " an" neuron’s activation (which is part of why we didn’t investigate this further).
The congruence increases a lot after MLP 31. This means the output of layer 31 is very congruent with the input weights of the " an" neuron (which is itself in MLP 31). I think this is almost entirely the effect of the " an" neuron itself, partly because the input weights of the " an" neuron are very congruent with the " an" token (although not as much as the neuron’s output weights are). This makes me think that this neuron is at least partly a ‘signal boosting’ neuron.
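For reference, the kind of check behind that last point looks something like this (again a rough sketch rather than our exact code, with a placeholder neuron index):

```python
# Sketch: cosine similarity of the " an" neuron's input/output weights with the
# " an" token's embedding/unembedding directions (GPT-2 Large, placeholder index).
import torch
import torch.nn.functional as F
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2-large")
LAYER = 31
NEURON_IDX = 0   # placeholder -- substitute the actual neuron index

an_id = model.to_single_token(" an")
w_in = model.W_in[LAYER, :, NEURON_IDX]     # [d_model] what the neuron reads from the residual stream
w_out = model.W_out[LAYER, NEURON_IDX, :]   # [d_model] what the neuron writes back

emb = model.W_E[an_id]                      # embedding direction of " an"
unemb = model.W_U[:, an_id]                 # unembedding direction of " an" (tied to W_E in raw GPT-2)

def cos(a, b):
    return F.cosine_similarity(a, b, dim=0).item()

print("cos(W_in,  emb[' an']):  ", round(cos(w_in, emb), 3))
print("cos(W_out, unemb[' an']):", round(cos(w_out, unemb), 3))
print("cos(W_in,  W_out):       ", round(cos(w_in, w_out), 3))  # high value would fit a 'signal boosting' story
```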
One reason the neuron is congruent with multiple of the same tokens may be because those token embeddings are similar (you can test this by checking their cosine similarities).
Yup! I think that’d be quite interesting. Is there any work on characterizing the embedding space of GPT-2?
Adam Scherlis did some preliminary exploration here:
https://www.lesswrong.com/posts/BMghmAxYxeSdAteDc/an-exploration-of-gpt-2-s-embedding-weights
Here’s a more thorough investigation of the overall shape of said embeddings with interactive figures:
https://bert-vs-gpt2.dbvis.de/
There’s also a lot of academic work on the geometry of LM embeddings, e.g.:
https://openreview.net/forum?id=xYGNO86OWDH (BERT, ERNIE)
https://arxiv.org/abs/2209.02535 (GPT-2-medium)
(Plus a mountain more on earlier text/token embeddings like Word2Vec.)
The SolidGoldMagikarp post is also related to the embedding space: https://www.lesswrong.com/posts/aPeJE8bSo6rAFoLqg/solidgoldmagikarp-plus-prompt-generation