I was also thinking about Translate. Another example from them is that in some languages, our first shot at using Transformers would sometimes translate queries as “wikipedia wikipedia wikipedia”, just that one word repeated some number of times, I guess because it’s a super common word that shows up in web text. It would get stuck in a state where “wikipedia” was always the most likely next token.
I also haven’t heard a good theory about what exactly is going wrong there.