I’m working on reproducing these results on Llama-2-70b. Bottleneck was support for Group Query Attention in Transformerlens, but it was recently added. Expecting to be done by January 31st.
Done (as of around 2 weeks ago)
I’m working on reproducing these results on Llama-2-70b. Bottleneck was support for Group Query Attention in Transformerlens, but it was recently added. Expecting to be done by January 31st.
Done (as of around 2 weeks ago)