If 4 is not simply a bad default, maybe they considered more data with a high inferential distance (foreign, non-natural/formal languages), which may require more epochs?