This is very interesting, thanks for this work!
A clarification I may have missed from your previous posts: what exactly does “attention QKV weight matrix” mean? Is that the concatenation of the Q, K, and V projection matrices, their sum, or something else?
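For concreteness, here is a minimal sketch of the two readings I have in mind (shapes and variable names are illustrative, not taken from your posts):

```python
import numpy as np

d_model = 8  # hypothetical embedding size, just for illustration
rng = np.random.default_rng(0)
W_q, W_k, W_v = (rng.standard_normal((d_model, d_model)) for _ in range(3))

# Reading 1: concatenation of the three projections into one fused
# matrix (as in, e.g., GPT-2's fused attention weight)
W_qkv_concat = np.concatenate([W_q, W_k, W_v], axis=1)  # (d_model, 3*d_model)

# Reading 2: element-wise sum of the three projections
W_qkv_sum = W_q + W_k + W_v  # (d_model, d_model)

print(W_qkv_concat.shape)  # (8, 24)
print(W_qkv_sum.shape)     # (8, 8)
```

Or did you mean something else entirely, e.g. treating each projection separately and averaging the results?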