I don’t really have a coherent answer to that, but here goes (before reading the spoiler): I don’t think the model understands anything about the real world, because it has never experienced the real world. It doesn’t understand why “a pink flying sheep” is a language construct and not something that was observed in the real world.
Reading my answer back, maybe we also don’t have any understanding of the real world; we have just come up with some patterns based on the qualia (tokens) that we have experienced (been trained on). Who is to say whether those patterns match some deeper truth or not? Maybe there is a vantage point from which our “understanding” will look like hallucinations.
I have a vague feeling that I understand the second part of your answer. Not sure though. In that model of yours, are the hallucinations of ChatGPT just the result of an imperfectly trained model? And can a model ever be trained to perfectly predict text?
Thanks for the answer; it gave me some serious food for thought!
You’re asking good questions! Let me see if I can help explain what other people are thinking.
It doesn’t understand why “a pink flying sheep” is a language construct and not something that was observed in the real world.
When talking about cutting-edge models, you might want to be careful when making up examples like this. It’s very easy to say “LLMs can’t do X”, when in fact a state-of-the-art model like GPT 4 can actually do it quite well.
For example, here’s what happens if you ask ChatGPT about “pink flying sheep”. It realizes that sheep are not supposed to be pink or to fly, so it proposes several hypotheses, including:
Something really weird is happening. But this is unlikely, given what we know about sheep.
The observer might be on drugs.
The observer might be talking about a work of art.
Maybe someone has disguised a drone as a pink flying sheep.
...and so on.
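(If you want to try this yourself, here is a minimal sketch using the OpenAI Python SDK. The exact prompt wording is my own guess, not the one used above, so treat it as an assumption.)

```python
# Minimal sketch of the "pink flying sheep" probe, assuming the OpenAI
# Python SDK (pip install openai) and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {
            "role": "user",
            "content": "I just saw a pink flying sheep. What are some possible explanations?",
        },
    ],
)

# The reply typically lists hypotheses like the ones above
# (art, drones, altered perception, and so on).
print(response.choices[0].message.content)
```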
Now, ChatGPT 4 does not have any persistent memories, and it’s pretty bad at planning. But for this kind of simple reasoning about how the world works, it’s surprisingly reliable.
For an even more interesting demonstration of what ChatGPT can do, I was recently designing a really weird programming language. It didn’t work like any popular language. It was based on a mathematical notation for tensors with implicit summations, it had a Rust-like surface syntax, and it ran on the GPU, not the CPU. This particular combination of features is weird enough that ChatGPT can’t just parrot back what it learned from the web.
But when I gave ChatGPT a half-dozen example programs in this hypothetical language, it was perfectly capable of writing brand-new programs. It could even recognize the kinds of problems that this language might be good for solving, and then make a list of useful programs that couldn’t be expressed in my language. It then implemented common algorithms in the new language, in a more or less correct fashion. (It’s hard to judge “correct” in a language that doesn’t actually exist.)
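(A side note on “implicit summations”, since it’s the most exotic-sounding ingredient above: the idea is Einstein-style index notation, where a repeated index is summed automatically. The language itself isn’t public, so this is my own rough illustration of that one idea in ordinary NumPy terms, not a sample of the actual language.)

```python
# Einstein-style implicit summation, illustrated with NumPy's einsum.
# "ik,kj->ij" sums over the repeated index k, which is exactly matrix
# multiplication; the hypothetical language makes that summation implicit
# in its surface syntax (and targets the GPU rather than the CPU).
import numpy as np

A = np.random.rand(3, 4)
B = np.random.rand(4, 5)

C = np.einsum("ik,kj->ij", A, B)  # the sum over k is implied
assert np.allclose(C, A @ B)
```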
I have had faculty advisors on research projects who never engaged at this level. This was probably because they couldn’t be bothered.
However, please note that I am not claiming that ChatGPT is “conscious” or anything like that. If I had to guess, I would say that it very likely isn’t. But that doesn’t mean that it can’t talk about the world in reasonably insightful ways, or offer mostly-cogent feedback on the design of a weird programming language. When I say “understanding”, I don’t mean it in some philosophical sense. I mean it in the sense of drawing practical conclusions about unfamiliar scenarios. Or to use a science fiction example, I wouldn’t actually care whether SkyNet experiences subjective consciousness. I would care whether it could manufacture armed robots and send them to kill me, and whether it could outsmart me at military strategy.
However, keep in mind that despite GPT 4’s strengths, it also has some very glaring weaknesses relative to ordinary humans. I think that the average squirrel has better practical problem-solving skills than GPT 4, for example. And I’m quite happy about this, because I suspect that building actual smarter-than-human AI would be about as safe as smashing lumps of plutonium together.
Does this help answer your question?
I think it does, thank you! In your model, does a squirrel perform better than ChatGPT at practical problem solving simply because it was “trained” on practical problem-solving examples, and ChatGPT performs better on language tasks because it was trained on language? Or is there something fundamentally different between them?
I suspect ChatGPT 4’s weaknesses come from several sources, including:
It’s effectively amnesiac, in human terms. (See the sketch after this list for what that looks like at the API level.)
If you look at the depth of the neural networks and the speed with which they respond, they have more in common with human reflexes than with deliberate thought. It’s basically an actor doing a real-time improvisation exercise, not a writer mulling over each word. The fact that it’s as good as it is, well, it’s honestly terrifying to me on some level.
It has never actually lived in the physical world, or had to solve practical problems. Everything it knows comes from text or images.
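(To make the “amnesiac” point concrete: the comment above is about ChatGPT the product, but the same property is easy to see at the API level. Here is a minimal sketch, again assuming the OpenAI Python SDK; each request is independent, and any “memory” is just conversation history the caller resends.)

```python
# Statelessness sketch: two independent API calls share no memory.
from openai import OpenAI

client = OpenAI()

# Call 1: tell the model a fact.
client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "My favorite color is teal."}],
)

# Call 2: a fresh request. Nothing from call 1 carries over unless we
# resend it ourselves as part of the messages list.
reply = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "What is my favorite color?"}],
)
print(reply.choices[0].message.content)  # it can only guess
```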
Most people’s first reaction to ChatGPT is to overestimate it. Then they encounter various problems, and they switch to underestimating it. This is because we’re used to interacting with humans. But ChatGPT is very unlike a human brain. I think it’s actually better than us at some things, but much worse at other key things.
Thank you for your answers!