The wrapper modules simply wrap existing submodules of the model: they call whatever they are wrapping (in this case self.attn) with the same arguments, then save some state and/or manipulate the output. It's just the syntax I chose so I could both save state from submodules and modify the values of some intermediate state. If you want to see exactly how that submodule is being called, you can look at the Llama Hugging Face source code. In the code you gave, I am adding some vector to the hidden_states returned by that attention submodule.
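For reference, here is a minimal sketch of that wrapper pattern, assuming a Llama-style decoder layer whose attention submodule returns a tuple with hidden_states as its first element (the exact return signature varies by transformers version). The names AttnWrapper and steering_vector are illustrative, not the exact code from the post:

```python
import torch
import torch.nn as nn

class AttnWrapper(nn.Module):
    """Wraps an existing attention submodule, forwards all arguments to it,
    saves its output, and optionally adds a vector to the hidden states."""

    def __init__(self, attn: nn.Module):
        super().__init__()
        self.attn = attn              # the original submodule being wrapped
        self.saved_output = None      # state captured on each forward pass
        self.add_vector = None        # optional vector added to hidden_states

    def forward(self, *args, **kwargs):
        # Call the wrapped submodule with exactly the same arguments.
        output = self.attn(*args, **kwargs)
        # Assumption: HF attention modules return a tuple whose first element
        # is the attention output (hidden_states).
        hidden_states = output[0]
        if self.add_vector is not None:
            hidden_states = hidden_states + self.add_vector.to(hidden_states.device)
        self.saved_output = hidden_states.detach()
        return (hidden_states,) + output[1:]

# Hypothetical usage: swap the wrapper in for one decoder layer's attention.
# layer = model.model.layers[6]
# layer.self_attn = AttnWrapper(layer.self_attn)
# layer.self_attn.add_vector = steering_vector  # some precomputed vector
```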
Thanks, Nina, for sharing the Hugging Face forward pass. I now realize I was skipping the input layer norm calculations. Now I can reproduce your numbers :)