Thanks for your hard work. In the layer 0 attention head, why are the query and the value both at position 1?
Hi, sorry for the late response! The layer 0 attention head should have query at position 1, and value at position 0 (same as key). Which diagram are you referring to?
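In case it helps with the indexing, here is a minimal sketch in plain Python of a single causal attention head over two positions. It is not taken from the actual model or diagram: the function name `attend` and all the numbers are made up for illustration. It only shows that the query sits at the later position (1), while the key it matches and the value that gets copied both sit at the earlier position (0).

```python
import math

def attend(queries, keys, values):
    """Toy causal single-head attention with scalar queries, keys, and values.

    Returns (outputs, weights), where weights[q][k] is how strongly the
    query at position q attends to the key at position k (k <= q).
    """
    outputs, weights = [], []
    for q_pos, q in enumerate(queries):
        # Causal mask: a query only sees keys at positions <= its own.
        scores = [q * keys[k_pos] for k_pos in range(q_pos + 1)]
        # Numerically stable softmax over the visible positions.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        row = [e / total for e in exps]
        weights.append(row)
        # The output at q_pos is a weighted sum of values at the attended positions.
        outputs.append(sum(w * values[k_pos] for k_pos, w in enumerate(row)))
    return outputs, weights

# Hypothetical toy numbers, chosen so the query at position 1
# matches the key at position 0 much better than its own key.
queries = [0.0, 2.0]
keys = [3.0, -3.0]
values = [10.0, -10.0]

outputs, weights = attend(queries, keys, values)
print(weights[1])  # attention paid by the query at position 1 to positions 0 and 1
print(outputs[1])  # close to values[0], i.e. the value read from position 0
```

With these numbers almost all of the attention from position 1 goes to position 0, so the head's output at position 1 is essentially the value from position 0, which is the sense in which the key and the value share a position and the query does not.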