At the beginning of the year I thought a decent model of how LLMs work was 10 years or so out. I’m now thinking it may be five years or less. What do I mean?
In the days of classical symbolic AI, researchers would use a programming language, often some variety of LISP (though not always), to implement a model of some set of linguistic structures and processes, such as those involved in story understanding and generation, or in question answering. I see a similar division of conceptual labor in figuring out what's going on inside LLMs. In this analogy, mechanistic understanding produces the equivalent of the programming languages of classical AI: the structures and mechanisms of the virtual machine that runs the domain model, where the domain is language in the broadest sense. I've been working on figuring out such a domain model, and I've made unexpected progress in the last month. I'm beginning to see how such models can be constructed. Call these domain models meta-models for LLMs.
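To make the analogy concrete, here is a minimal, purely illustrative sketch of the kind of hand-crafted domain-model fragment classical symbolic AI worked with, in the spirit of old story-grammar systems. Every name in it is hypothetical; nothing here is drawn from any actual system, mine or anyone else's:

```python
# A toy illustration of a hand-built domain model for story generation.
# All names (StoryFrame, realize, the example story) are hypothetical.

from dataclasses import dataclass


@dataclass
class StoryFrame:
    """An abstract story schema: who wants what, what blocks them, how it ends."""
    protagonist: str
    goal: str
    obstacle: str
    resolution: str


def realize(frame: StoryFrame) -> str:
    """Render the abstract frame as a bare-bones narrative."""
    return (
        f"{frame.protagonist} wanted {frame.goal}. "
        f"But {frame.obstacle}. "
        f"In the end, {frame.resolution}."
    )


# Usage: one frame yields one prototypical story.
story = realize(StoryFrame(
    protagonist="Princess Aurora",
    goal="to see the world beyond the castle",
    obstacle="the king forbade her to leave",
    resolution="she slipped away at dawn and returned with tales to tell",
))
print(story)
```

A real meta-model would of course be far richer, but the division of labor is the point: the schema plays the role of the domain model, while the programming language underneath plays the role that mechanistic understanding would assign to the LLM's virtual machine.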
It’s those meta-models that I’m thinking are five years out. What would the scope of such a meta-model be? I don’t know. But I’m not thinking in terms of one meta-model that accounts for everything a given LLM can do. I’m thinking of more limited meta-models. I figure that various communities will begin creating models in areas that interest them.
I figure we start with some hand-crafting to work out some standards. Then we’ll go to work on automating the process of creating such models. How will that work? I don’t know. No one’s ever done it.
My confidence in this project has just gone up: it seems I now have a collaborator. He’s familiar with my work in general and my investigations of ChatGPT in particular; we’ve exchanged some email and had a couple of Zoom conversations. During today’s conversation we decided to collaborate on a paper on the theme of ‘demystifying LLMs.’
A word of caution. We haven’t written the paper yet, so who knows? But all the signs are good. He’s an expert on computer vision systems on the faculty of Goethe University in Frankfurt: Visvanathan Ramesh.
These are my most important papers on ChatGPT:
ChatGPT tells stories, and a note about reverse engineering: A Working Paper
Discursive Competence in ChatGPT, Part 2: Memory for Texts
ChatGPT tells 20 versions of its prototypical story, with a short note on method
ChatGPT’s Ontological Landscape: A Working Paper
To clarify: do you think that in about five years we will be able to do this for the then state-of-the-art big models?
Yes. It’s more about the structure of language and cognition than about the mechanics of the models. The number of parameters, the number of layers, and the functions assigned to those layers shouldn’t change things; neither should going multi-modal. Whatever the mechanics of the models, they have to deal with language as it is, and that’s not changing in any appreciable way.