Lennart Buerger comments on JumpReLU SAEs + Early Access to Gemma 2 SAEs

Lennart Buerger 25 Jul 2024 14:21 UTC
LW: 1 AF: 1
Nice work! I was wondering what context length you used when extracting the LLM activations to train the SAEs. I could not find it in the paper, but I might have missed it. I know that OpenAI used a context length of 64 tokens in all their experiments, which is probably not sufficient to elicit many interesting features. Do you use a variable context length or a fixed value as well?
- Tom Lieberum 25 Jul 2024 15:56 UTC
  1 point
  We use a context length of 1024 tokens. Article snippets are often shorter than that, so within a context they are separated by BOS tokens.
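
To illustrate what separating short snippets by BOS within a fixed 1024-token context could look like, here is a minimal sketch. It is not the authors' pipeline: the function name `pack_snippets`, the placeholder `BOS_ID` value, and the choice to drop a trailing partial context are all assumptions made for illustration.

```python
from typing import Iterable, List

BOS_ID = 2          # assumed placeholder; use the tokenizer's real BOS id
CONTEXT_LEN = 1024  # context length stated in the reply


def pack_snippets(
    snippets: Iterable[List[int]],
    context_len: int = CONTEXT_LEN,
    bos_id: int = BOS_ID,
) -> List[List[int]]:
    """Concatenate tokenized snippets, each preceded by BOS, and cut the
    resulting stream into fixed-length contexts (hypothetical helper)."""
    stream: List[int] = []
    for tokens in snippets:
        stream.append(bos_id)   # BOS marks the start of every snippet
        stream.extend(tokens)
    # Keep only full contexts; the trailing remainder is dropped (assumption).
    n_full = len(stream) // context_len
    return [
        stream[i * context_len:(i + 1) * context_len]
        for i in range(n_full)
    ]


# Tiny demonstration with a context length of 4 instead of 1024.
demo = pack_snippets([[10, 11, 12], [20, 21, 22, 23]], context_len=4)
```

With this kind of packing a single context can contain several unrelated snippets, so a BOS token may appear at any position in the sequence, not only at position 0.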