Glad I could help. If you want to learn more about LLMs and have enough interest in the topic, I recommend getting hands-on with one that is “raw.” You almost certainly can’t run anything nearly as big as ChatGPT on your home computer, but there are models available on huggingface which will run on home computers.
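If you want a concrete starting point, something like the sketch below is roughly all it takes with the Hugging Face transformers library. The model id (openlm-research/open_llama_3b) and the memory settings are just illustrative assumptions on my part; swap in whatever small model your GPU can actually hold.

```python
# Minimal sketch: load a small "raw" causal language model from Hugging Face
# and generate a completion. Model id and settings are illustrative assumptions;
# a ~3B model in float16 needs very roughly 7 GB of GPU memory.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "openlm-research/open_llama_3b"  # any small causal LM works here

# OpenLLaMA's docs suggest the slow tokenizer; harmless for other models too.
tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=False)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # halves memory use on a consumer GPU
    device_map="auto",          # place layers on the GPU, spill to CPU if needed
)

prompt = "The three main causes of the French Revolution were"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=40, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Prompting a raw base model like this (no chat tuning, no system prompt) is the quickest way to see both the surprising competence and the fragility I mention below.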
I found that playing around with LLMs, especially of the size that is runnable on my PC, really helped illuminate their capabilities and deficits for me. When they’re ~7B parameters in size they’re somewhat reliable if you prompt them correctly, but also extremely fragile depending on the prompt, and I think this sort of relates to your point here:
You also pointed out that I was underestimating the quality of ChatGPT’s knowledge, its ability to separate a sentence into components, and its ability to recognize sentiment, and that all these underestimations caused me to underestimate the network’s confidence (or deduced probability) that the toddler would be the predator. That also makes sense. At first, it seems like ChatGPT’s ability to “recognize sentiment” based on sentence structure (as you explained, “I can’t X-action, because otherwise Y-consequence will happen”) would be cognition in its own right, since no programmer wrote direct code for ChatGPT to recognize sentiment in that way (as far as I know).
However, after a moment’s reflection, I think you would probably answer that any time you say ChatGPT “recognizes” or even “thinks” something, you’re just using shorthand: ChatGPT’s probabilistic calculations produce sentences that look like what a human would produce after actually recognizing and thinking something.
I don’t think I would put it quite like that. Rather, there is some genuine thinking/calculating going on (those two terms are essentially indistinguishable if you are a functionalist, which I lean towards most of the time) that cannot be dismissed as simple probability fiddling, even if on the most granular level that is what it is.
The thing is that intelligence, or thinking, or cognition, or whatever you want to use to describe the thing LLMs might be doing, is very hard to spot up close. If you talk to a human and observe their behavior, they seem intelligent enough, but when you actually peer inside their skull the intelligence evaporates, replaced by a glob of mechanistic pieces that turn on and off and move around. And the closer you look the worse it gets, until you’re looking at a single neuron firing. It is hard to see how a person’s essence and interior experience is made out of just a big tangle of those simple little pieces.
I think when first examining any impressive neural network it’s natural to have the same sort of reaction: you feel bamboozled because once you get up close, what it does does not look like intelligence or cognition, it looks like math. And fairly un-magical math, at that. How can calculus have an inner world? It doesn’t seem to make sense.
And I stand by the statement that nothing ChatGPT does truly requires any understanding or internal modeling, because in principle I see no reason why it wouldn’t be possible to create a network that is capable of what ChatGPT does without invoking any thinking beyond the probabilistic arrangement of tokens. But I do not think that is a good enough reason to discredit the idea, especially after things like this have proven that models can and will create world models of a sort to solve training problems. And I should have mentioned that in my previous reply.
Personally, I suspect that ChatGPT has many such mini-world models within it, but I do not believe there is a strong connection between those models that creates a general understanding of all domains. And I also suspect that this is the main difference between bigger and smaller models: both big and small models have a syntactical understanding of English and relatively good adherence to the rules you set out for them. This is what I imagine as an LLM’s “lizard brain.” Absent any other overriding principles, it will default to “what words fit best here based on the other words.” But large networks have the benefit of higher-order models of specific text domains and topics, which I imagine as the “monkey brain” of the network.
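To make the “what words fit best here based on the other words” default concrete, here is a small sketch that prints the raw next-token distribution the network produces at the end of a prompt. It reuses the `model` and `tokenizer` objects from the loading example above, and the prompt is just a made-up sentence loosely echoing the toddler example, not anything from our earlier discussion. Everything else the model does is built on top of this distribution.

```python
# Sketch: inspect the next-token probability distribution directly.
# Assumes `model` and `tokenizer` from the earlier loading example;
# the prompt is an arbitrary illustrative sentence.
import torch

prompt = "I can't leave the toddler alone with the dog, because"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    logits = model(**inputs).logits        # shape: (1, seq_len, vocab_size)

next_token_logits = logits[0, -1]          # scores for the position after the prompt
probs = torch.softmax(next_token_logits.float(), dim=-1)

# Print the ten tokens the model thinks "fit best" as the next word.
top = torch.topk(probs, k=10)
for p, idx in zip(top.values, top.indices):
    print(f"{tokenizer.decode([int(idx)])!r}  {p.item():.3f}")
```

On a small model you can watch this distribution shift dramatically with tiny prompt changes, which is the fragility I was describing earlier.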
For example, ChatGPT-4 can play chess kinda okay, whereas ChatGPT-3.5 is total crap at it. I believe this is because 4 has a robust model for how chess works in a general sense, whereas 3.5 is relying purely on what it’s seen in chess notation before.
For an even broader example, ChatGPT is fairly excellent at logical reasoning. OpenLLaMa-3b is really, stupendously, extremely bad at it. I believe, but cannot confirm, that the reason for this is that OpenLLaMa did not form a general model for logical reasoning during training, but ChatGPT did. What that model looks like, how it works, how much space it takes up in the network, I have no idea. But I believe there is a high probability it is actually “thinking” about reasoning problems when confronted with them.
That’s an awesome idea about running a huggingface model on my home machine. I actually took some unusual measures to obtain my current GPU, so I really want to make full use of it. I can’t wait to try this.
You also made an interesting point about the difficulty in seeing how a person’s interior experience arises out of just neurons/anatomy. It’s fascinating to think about.
The Othello article is absolutely mind-blowing! It obviously pushes my confidence in ChatGPT’s potential cognition up higher again, but the question of what type of cognition, and what type of potential consciousness ChatGPT might possess only grows deeper with discussion.
I haven’t used OpenLLaMa, but the fact that it’s so bad at reasoning is indeed useful: it shows what an LLM that probably didn’t form a general model looks like, and highlights how ChatGPT may have formed one.
All of this discussion paints a really complex picture, but I’m enjoying the complexity, so many thanks!