Yes, the ULMFiT paper is one of the first papers to use the notion of “pretraining” (it might even be the one that actually introduced this terminology).
Then it appears in other famous 2018 papers:
Improving Language Understanding by Generative Pre-Training (Radford et al., June 2018)
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (Devlin et al., October 2018)