Thanks for your hard work. Why are the positions of the query and value both 1 in the layer 0 attention head?