I feel like this answer glosses over the fact that the encoding changes. Surely you can find some encodings of instructions such that LLMs cannot follow instructions in that encoding. So the question lies in why learning the English encoding also allows the model to learn (say) German encodings.
So the question lies in why learning the English encoding also allows the model to learn (say) German encodings.
No? We already know that the model can competently respond in German. Once you condition on the model competently responding in other languages (e.g. for translation tasks) there is no special question about why it follows instructions in other languages as well.
Like “why are LLMs capable of translation might be an interesting question”, but if you’re not asking that question, then I don’t understand why you’re asking this.
My position is that this isn’t a special capability that warrants any explanation that isn’t covered in an explanation of why/how LLMs are competent translators.
Ah, I misunderstood the content of original tweet—I didn’t register that the model indeed had access to lots of data in other languages as well. In retrospect I should have been way more shocked if this wasn’t the case. Thanks.
I then agree that it’s not too surprising that the instruction-following behavior is not dependent on language, though it’s certainly interesting. (I agree with Habryka’s response below.)
I feel like this answer glosses over the fact that the encoding changes. Surely you can find some encodings of instructions such that LLMs cannot follow instructions in that encoding. So the question lies in why learning the English encoding also allows the model to learn (say) German encodings.
No? We already know that the model can competently respond in German. Once you condition on the model competently responding in other languages (e.g. for translation tasks) there is no special question about why it follows instructions in other languages as well.
Like “why are LLMs capable of translation might be an interesting question”, but if you’re not asking that question, then I don’t understand why you’re asking this.
My position is that this isn’t a special capability that warrants any explanation that isn’t covered in an explanation of why/how LLMs are competent translators.
Ah, I misunderstood the content of original tweet—I didn’t register that the model indeed had access to lots of data in other languages as well. In retrospect I should have been way more shocked if this wasn’t the case. Thanks.
I then agree that it’s not too surprising that the instruction-following behavior is not dependent on language, though it’s certainly interesting. (I agree with Habryka’s response below.)