I’m saying “transformers” every time I’m tempted to write “LLMs” because many modern LLMs also process images, so the term “LLM” isn’t quite right.
“Transformer” isn’t quite right either, because you can train a transformer on a narrow task. How about “foundation model”: “models (e.g., BERT, DALL-E, GPT-3) that are trained on broad data at scale and are adaptable to a wide range of downstream tasks”.