If it did do that, it would be mostly by luck, not as the consequence of a reliable knowledge-generating process. LLMs are stochastic; that’s not a baseless smear.
As ever, “power” is a cluster of different things.
It would absolutely be the consequence of a knowledge-generating process. You are stochastic too, I am stochastic; there is noise and quantum randomness, and we can’t ever prove that given the exact same input we’d produce the exact same output every time, without fail. And you can make an LLM deterministic, just set its temperature to zero. We don’t do that, because nonzero temperature makes the output more varied, fun and interesting, and because that randomness doesn’t destroy the coherence of the output altogether.
Basically, even thinking that “stochastic” is a kind of insult is missing the point, but that’s what people who unironically use the term “stochastic parrot” mostly do. They’re trying to say that LLMs are blind random imitators who are thus incapable of true understanding and always will be, but that’s not implied by a more rigorous definition of what they do at all. Heck, for what it’s worth, actual parrots probably understand a bit of what they say. I’ve seen plenty of videos and accounts of parrots using certain words in certain non-random contexts.
It would absolutely be the consequence of a knowledge-generating process.
I said “reliable”. A stochastic model is only incidentally a truth generator. Do you think it’s impossible to improve on LLMs by making the underlying engine more tuned in to truth per se?
You are stochastic too, I am stochastic,
If it’s more random than us, it’s not more powerful than us.
And you can make an LLM deterministic, just set its temperature to zero.
Which obviously isn’t going to give you novel solutions to maths problems. That’s trading off one kind of power against another.
Basically, even thinking that “stochastic” is a kind of insult is missing the point, but that’s what people who unironically use the term “stochastic parrot” mostly do. They’re trying to say that LLMs are blind random imitators who are thus incapable of true understanding and always will be, but that’s not implied by a more rigorous definition of what they do at all.
But the objection can be steelmanned, eg: “If it’s more random than us, it’s not more powerful than us.”
Is it more random than us? I think you’re being too simplistic. Probabilistic computation can be compounded to reduce the uncertainty to an arbitrarily small amount, and in some cases, I think, it can be more powerful than a purely deterministic one.
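The compounding point is a standard one in randomized computation (it is how probabilistic algorithms are amplified). A minimal, hypothetical sketch: a made-up yes/no oracle that is right only 70% of the time, made arbitrarily reliable by majority vote over independent runs.

```python
import random

def noisy_oracle(rng, p_correct=0.7):
    """A probabilistic computation: returns the correct answer (True)
    only with probability p_correct, otherwise errs."""
    return rng.random() < p_correct

def majority_vote(n_samples, p_correct=0.7, seed=0):
    """Compound n independent noisy runs and take the majority answer.
    By a Chernoff-style bound, the majority is wrong with probability
    that shrinks exponentially in n_samples."""
    rng = random.Random(seed)
    votes = sum(noisy_oracle(rng, p_correct) for _ in range(n_samples))
    return votes > n_samples / 2

print(majority_vote(1))    # a single 70%-reliable run is often wrong
print(majority_vote(101))  # 101 compounded runs are almost never wrong
```

With 101 samples the majority answer is wrong with probability already well under one in a thousand, and doubling the sample count roughly squares that error, so being stochastic at the level of a single step puts no ceiling on the reliability of the whole.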
At its core the LLM is deterministic anyway. It produces logits: graded beliefs about what the next word should be. We, too, have uncertain beliefs. Then the system is set up in a certain way to turn those beliefs into text. Again, if you want to always choose the most likely answer, just set the temperature to zero!
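To make “logits plus temperature” concrete, here is a minimal, hypothetical sketch of the decoding step; the logit values are made up, and real samplers are more elaborate (top-k, nucleus sampling, and so on).

```python
import math
import random

def sample_next_token(logits, temperature, rng=None):
    """Turn raw logits (graded 'beliefs' about the next token) into a
    choice. temperature == 0 means greedy decoding: deterministically
    pick the highest-logit token every time."""
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    # Softmax with temperature: dividing by a larger T flattens the
    # distribution, making low-probability tokens more likely.
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scaled]
    total = sum(weights)
    probs = [w / total for w in weights]
    # Draw one token index from the resulting distribution.
    rng = rng or random.Random()
    r = rng.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1

toy_logits = [2.0, 1.0, 0.1]  # hypothetical scores for a 3-token vocabulary
print(sample_next_token(toy_logits, temperature=0.0))  # always index 0
print(sample_next_token(toy_logits, temperature=1.0, rng=random.Random(42)))
```

The temperature-zero branch is what “just set its temperature to zero” means mechanically: sampling collapses to argmax, so the same prompt always yields the same continuation.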
It has “beliefs” regarding which word should follow another, and any other belief, opinion or knowledge is an incidental outcome of that. Do you think it’s impossible to improve on LLMs by making the underlying engine more tuned in to truth per se?
No, I think it’s absolutely possible, at least theoretically; not sure what it would take to actually do it, of course. But that’s my point: somewhere in the space of possible LLMs there exists an “always gives you the wisest, most truthful response” model that does exactly the same thing, predicting the next token. As long as the prediction is always that of the next token that would appear in the wisest, most truthful response!
Which is different to predicting a token on the basis of the statistical regularities in the training data. An LLM that works that way is relatively poor at reliably outputting truth, so a version of the stochastic parrot argument goes through.
I think in the limit of infinite, truthful training data, with sufficient abstraction, it would not necessarily be different. We too form our beliefs from “training data”, after all; we’re just highly multimodal and smart enough to know the distinction between a science textbook and a fantasy novel. An LLM maybe doesn’t have that distinction perfectly clear, though it does grasp it to some extent.

There’s no evidence that we do so based solely on token prediction, so that’s irrelevant.

I just don’t really understand in what way “token prediction” is anything less than “literally any possible function from a domain of all possible observations to a domain of all possible actions”. At least if your “tokens” cover extensively enough the space of possible things you might want to do or say.
I think a significant part of the problem is not the LLM’s trouble distinguishing truth from fiction; it’s rather the difficulty of convincing it, through your prompt, that the output you want is the former and not the latter.