Joseph Miller comments on Fact Finding: Simplifying the Circuit (Post 2)

Joseph Miller 17 Jan 2024 0:38 UTC
LW: 1 AF: 1
What’s up with the <pad> token (<pad>==<bos>==<eos> in Pythia) in the attention diagram? I assume that doesn’t need to be there?
- Neel Nanda 17 Jan 2024 23:29 UTC
  LW: 2 AF: 2
  I’m not sure! My guess is that it’s because some athlete names were two tokens and others were three tokens (or longer), and we left-padded so that all prompts were the same length (and masked the attention so it couldn’t attend to the padding tokens). We definitely didn’t need to do this and could have just filtered for two-token names; it’s not an important detail.
  - Joseph Miller 18 Jan 2024 14:25 UTC
    1 point
    Ok thanks!
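
For readers unfamiliar with the left-padding and attention masking described in the reply above, here is a minimal sketch in plain Python. It is not the code used in the post: the `left_pad` helper, the `"<pad>"` placeholder string, and the example token lists are all hypothetical and only illustrate the general idea of padding shorter prompts on the left and marking the padding positions so attention can skip them.

```python
PAD = "<pad>"  # assumption: placeholder string standing in for the real pad token


def left_pad(prompts, pad=PAD):
    """Left-pad token lists to a common length.

    Returns (padded, masks), where each mask has 0 at padding
    positions and 1 at real-token positions.
    """
    width = max(len(tokens) for tokens in prompts)
    padded, masks = [], []
    for tokens in prompts:
        n_pad = width - len(tokens)
        # Padding goes on the left, so the real tokens stay right-aligned.
        padded.append([pad] * n_pad + list(tokens))
        # A model would be told to ignore positions where the mask is 0.
        masks.append([0] * n_pad + [1] * len(tokens))
    return padded, masks


# Hypothetical tokenizations: a two-token name and a three-token name.
prompts = [
    ["Fact:", "Na", "me", "plays"],
    ["Fact:", "Lon", "ger", "Name", "plays"],
]

padded, masks = left_pad(prompts)
for row, mask in zip(padded, masks):
    print(row, mask)
```

With left padding, the last token of every prompt lands at the same position, which is presumably convenient when comparing prompts position by position. The alternative mentioned in the reply, keeping only two-token names, would avoid the padding token altogether.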