Falling prey to illusions of understanding on LLMs
I worry I am getting to the point where I am telling myself that I get how ChatGPT works, just because I have heard similar explanations repeatedly, to the degree that I can echo and predict them, and because I can see some connection between them and what it does and the errors it makes, when I still absolutely bloody don’t understand how this works.
I’m certain I don’t, because I still gotta say: if someone had explained this tech to me the way it has been explained to me now, but prior to seeing it deployed, I would have declared, confidently, angrily, that it would absolutely not be able to do the things it can now demonstrably do. And I feel that if I were to travel back in time to talk to that confident past me, there is no technical explanation I could provide at this point for why that assessment should change. I believe that it works and that it can do these things just because I have seen it do so, not because it makes sense to me or I genuinely understand how. I’m worried I am just getting to a point where repeating similar explanations is making me get used to them and accept them, despite my understanding not warranting this, the way you can get the basics of quantum mechanics or relativity hammered into your brain through repetition and metaphor without actually having understood the math at all. It is not just that I would not be able to build an LLM myself; it is that if I had not seen them deployed, trying to do so wouldn’t even have seemed to me like a particularly promising idea. That assessment is clearly dead wrong. I really want to get a deeper understanding of this.
At least you have a leg up on the people who are still confidently and angrily denouncing the idea of ChatGPT having any intelligence.
Part of the reason AI safety is so scary is that no one really understands how these models do what they do. (Or when we can expect them to do it.)
That is part of what I am struggling with when listening to explanations: I cannot tell how much of my failure to see how the explanations I have been given actually account for what I am watching these models do is me being stupid, uneducated and inexperienced on the topic, and how much of it is the people explaining it to me bullshitting about their own understanding. Like, I am, genuinely, uneducated on the topic. I should expect to be confused. But the type of confusion… I feel like there is something deeper and more problematic to it.
Like, it is like people confidently proposing models of consciousness, and you are like… seriously, if I had shown you a brain, with no hint of subjective experience, are you telling me that based on what you saw in the brain you would have predicted subjective experience? Because you see how it necessarily follows from what is going on there? No? Then don’t tell me you properly understand why the heck we actually got it. Like, I respect people who think we are onto something with recurrent feedback; I am one of them and have been for a long time, I do see a lot of supporting evidence, and it does tickle my intuition. But I resent it when people just go “loops! [hand gesture] It makes sense, you see?!? This explains everything!” without acknowledging all the leaps we are making in our understanding, where we would go off the rails if we didn’t already know what the result we want to explain looked like, and how completely uncertain we become when aspects change.
Like, if the current understanding of LLMs really explains what they do right now, does that mean we can make accurate predictions of what these models will be able to do in two years, given some specified range of changes? Because if not… we aren’t explaining, we are just cobbling something together retroactively.
FWIW, I think the people who made GPT were surprised by its capabilities. I’ve been making smaller language models professionally for five years, and I know far more about them than the average person, and I still don’t really understand how ChatGPT does some of the stuff it does. Ultimately I think it has to be a fact about language being systematic rather than anything special about ChatGPT itself. I.e., the problem of fluently using language is just easier than we (like to) think; it’s not that ChatGPT is magic.
There are scaling-laws papers, but they only predict how low the loss will go. No one has a very good idea of which capabilities emerge at a given loss level, but we do know from past experience that fundamentally new stuff does emerge as the loss goes down.
See here for scaling laws stuff: https://www.lesswrong.com/tag/scaling-laws
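To make “predict how low the loss will go” concrete, here is a minimal sketch of the kind of power-law fit those papers report, with made-up numbers rather than figures from any actual paper: the fit extrapolates the loss of a larger model, but nothing in it says which capabilities show up at that loss.

```python
import numpy as np

# Hypothetical (model size, validation loss) points standing in for measured training runs.
N_obs = np.array([1e7, 1e8, 1e9, 1e10])
L_obs = np.array([4.2, 3.5, 3.0, 2.6])

# Assume (for this sketch) an irreducible loss term, then fit the power law
# L(N) ~ L_inf + (N_c / N)**alpha as a straight line in log-log space.
L_inf = 1.7  # assumed, not measured
slope, intercept = np.polyfit(np.log(N_obs), np.log(L_obs - L_inf), 1)
alpha = -slope
N_c = np.exp(intercept / alpha)

# The fit lets you extrapolate the *loss* of a much bigger model...
N_big = 1e12
L_big = L_inf + (N_c / N_big) ** alpha
print(f"alpha = {alpha:.3f}, predicted loss at N = 1e12: {L_big:.2f}")
# ...but it says nothing about which capabilities appear at that loss level.
```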
Thank you for sharing that.
If researchers who knew their shit were surprised when these capabilities emerged, and no new discoveries about the models’ functioning happened afterwards, and no fundamental theoretical shift in understanding happened afterwards, I would indeed suspect that even those who are competent do not actually understand even now. Surprises of this caliber are an indication that new scientific theories are needed, not something that should just be explained away as having somehow been overlooked within existing theories, despite many bright minds having had a strong incentive to look for it.