The datasets it was trained on include Wikipedia (English), Common Web Crawl (basically a subset of the Internet), Github, among others.
A team of researchers from OpenAI recently published a paper describing GPT-3, a deep-learning model for natural-language with 175 billion parameters, 100x more than the previous version, GPT-2. The model is pre-trained on nearly half a trillion words and achieves state-of-the-art performance on several NLP benchmarks without fine-tuning.
In paper published on arXiv, a team of over 30 co-authors described the model and several experiments. The researchers’ goal was to produce an NLP system that performs well on a variety of tasks with little or no fine-tuning, and previous work had indicated that larger models might be the solution. To test that hypothesis, the team increased the size of their previous model, GPT-2, from 1.5 billion parameters to 175 billion. For training, the team collected several datasets, including the Common Crawl dataset and the English-language Wikipedia. The model was evaluated against several NLP benchmarks, matching state-of-the-art performance on “closed-book” question-answering tasks and setting a new record for the LAMBADA language modeling task.
The datasets it was trained on include Wikipedia (English), Common Web Crawl (basically a subset of the Internet), Github, among others.
A team of researchers from OpenAI recently published a paper describing GPT-3, a deep-learning model for natural-language with 175 billion parameters, 100x more than the previous version, GPT-2. The model is pre-trained on nearly half a trillion words and achieves state-of-the-art performance on several NLP benchmarks without fine-tuning.
In paper published on arXiv, a team of over 30 co-authors described the model and several experiments. The researchers’ goal was to produce an NLP system that performs well on a variety of tasks with little or no fine-tuning, and previous work had indicated that larger models might be the solution. To test that hypothesis, the team increased the size of their previous model, GPT-2, from 1.5 billion parameters to 175 billion. For training, the team collected several datasets, including the Common Crawl dataset and the English-language Wikipedia. The model was evaluated against several NLP benchmarks, matching state-of-the-art performance on “closed-book” question-answering tasks and setting a new record for the LAMBADA language modeling task.