I think I remember seeing somewhere that LLMs learn more slowly on languages with “more complex” grammar (in the sense that their loss decreases more slowly for the same number of training tokens), but I can’t find the source right now.