Nitpick: a language model is basically just an algorithm to predict text. It doesn’t necessarily need to be a fixed architecture like ChatGPT. So for example: “get ChatGPT to write a program that outputs the next token and then run that program” is technically also a language model, and has no computational complexity limit (other than the underlying hardware).
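To make the nitpick concrete, here's a minimal sketch (Python; all names are mine, just for illustration) of the interface this definition implies: anything that maps a text prefix to a probability distribution over next tokens counts as a language model, regardless of the computation behind it.

```python
from typing import Callable, Dict

# Under this definition, a language model is just any function from a
# text prefix to a probability distribution over possible next tokens.
LanguageModel = Callable[[str], Dict[str, float]]

def toy_lm(prefix: str) -> Dict[str, float]:
    """A deliberately tiny language model: a fixed next-word lookup."""
    table = {
        "the": {"cat": 0.6, "dog": 0.4},
        "cat": {"sat": 0.7, "ran": 0.3},
    }
    words = prefix.split()
    last = words[-1] if words else ""
    return table.get(last, {"the": 1.0})

# A transformer, an n-gram table, or "have ChatGPT write a predictor
# program and then execute it" all satisfy this same signature; the
# definition says nothing about what computation sits behind the function.

print(toy_lm("the"))  # {'cat': 0.6, 'dog': 0.4}
```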
Hmm, I’ve not seen people refer to (ChatGPT + Code execution plugin) as an LLM. IMO, an LLM is supposed to be a language model consisting of just a neural network with a large number of parameters.
I think your definition of LLM is the common one. For example, https://www.lesswrong.com/posts/KJRBb43nDxk6mwLcR/ai-doom-from-an-llm-plateau-ist-perspective is on the front page right now, and it uses LLM to refer to a big neural net with a transformer architecture, trained on a lot of data. This is how I was intending to use it as well. Note the difference between “language model” as Christopher King used it, and “large language model” as I am using it here. I plan to keep using LLM for now, especially as GPT refers to OpenAI’s product and not the general class of things.
Thanks, this is exactly the kind of feedback I was hoping for.
Nomenclature-wise: I was using LLM to mean “deep neural nets in the style of GPT-3”, but I should be more precise. Do you know of a good term for what I meant?
More generally, I should learn about other styles of LLM. I’ve gotten some good leads from these comments and some DMs.
Hmm, how about GPT (generative pre-trained transformer)?