Why do you think that GPT-3 has untied embeddings?
Personal correspondence with someone who worked on it.