Copying: how much the head output increases the logit of [A] compared to the other logits.
Please correct me if I’m wrong, but I believe you mean [B] here instead of [A]?
You’re right, thanks for spotting it! It’s fixed now.