I think it is the same. When training next-token predictors we model the ground-truth probability distribution as having probability 1 for the actual next token and 0 for every other token in the vocab. Because that target is one-hot, the sum over the vocabulary in the cross-entropy collapses to a single term, which is how the cross-entropy loss simplifies to the negative log likelihood of the actual next token. You can see that the TransformerLens implementation doesn’t match the textbook equation for cross-entropy loss because it uses this simplification.
So the missing factor of p would just be 1 I think.
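For concreteness, here's a minimal sketch of that simplification (PyTorch assumed; the shapes and the next-token shift are illustrative, not the exact TransformerLens code). Writing out the full cross-entropy against a one-hot target gives exactly the negative log probability of the actual next token:

```python
import torch
import torch.nn.functional as F

# Toy shapes (illustrative): batch of 2 sequences, 6 positions, vocab of 10.
batch, seq, vocab = 2, 6, 10
logits = torch.randn(batch, seq, vocab)
tokens = torch.randint(0, vocab, (batch, seq))

# Next-token prediction: logits at position t are scored against token t+1.
log_probs = F.log_softmax(logits[:, :-1, :], dim=-1)
targets = tokens[:, 1:]

# Full cross-entropy, -sum_i p_i * log q_i, with p the one-hot ground truth.
p_one_hot = F.one_hot(targets, num_classes=vocab).float()
full_ce = -(p_one_hot * log_probs).sum(dim=-1)

# Simplified form: every p_i except the actual next token is 0, so the sum
# collapses to a single term -- the negative log prob of that token.
nll = -log_probs.gather(dim=-1, index=targets[..., None]).squeeze(-1)

assert torch.allclose(full_ce, nll)
print(full_ce.mean(), nll.mean())  # identical mean loss either way
```

The gather version is the form you typically see in next-token loss code, which is why no explicit factor of p ever appears.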
Oh! You’re right, thanks for walking me through that, I hadn’t appreciated that subtlety. Then in response to the first question: yep! ΔCE = KL Divergence.
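The identity underneath this: CE(p, q) = H(p) + KL(p‖q), and for a one-hot p the entropy term vanishes, so the cross-entropy against the one-hot target is itself the KL divergence from it. A quick numerical check (PyTorch again, toy numbers chosen just for illustration):

```python
import torch
import torch.nn.functional as F

vocab = 10
log_q = F.log_softmax(torch.randn(vocab), dim=-1)          # model distribution q (log space)
p = F.one_hot(torch.tensor(3), num_classes=vocab).float()  # one-hot ground truth p

ce = -(p * log_q).sum()                     # cross-entropy H(p, q)
entropy_p = -torch.xlogy(p, p).sum()        # H(p) = 0 for a one-hot p (xlogy(0, 0) = 0)
kl = (torch.xlogy(p, p) - p * log_q).sum()  # KL(p || q) = sum_i p_i * (log p_i - log q_i)

assert torch.isclose(entropy_p, torch.tensor(0.0))
assert torch.isclose(ce, kl)                # CE = H(p) + KL = KL when H(p) = 0
```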