Thus, models exhibit a basic failure of logical deduction and do not generalize a prevalent pattern in their training set (i.e., if “A is B” occurs, “B is A” is more likely to occur).
How is this “a basic failure of logical deduction”? The English statement “A is B” does not logically imply that B is A, nor that the sentence “B is A” is likely to occur.
“the apple is red” =!> “red is the apple”
“Ben is swimming” =!> “swimming is Ben”
Equivalence is one of several relationships that can be conveyed by the English word “is”, and I’d estimate it’s not even the most common one.
One could argue that if you’re not sure which meaning of “is” is being used, then the sentence “A is B” is at least Bayesian evidence that the sentence “B is A” is valid, and therefore perhaps should update us towards thinking “B is A” even if it’s not proof. But the absence of the sentence “B is A” in the training data is also Bayesian evidence—in the opposite direction. What makes you think that this conflicting evidence balances out in favor of “B is A”? And even if it does, shouldn’t that be considered a subtle and complex calculation, rather than “a basic logical deduction”?
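To make that concrete, here is a minimal sketch of the calculation in odds form, with entirely made-up likelihood ratios (the numbers are assumptions for illustration, not estimates from any corpus); the direction of the net update depends wholly on which ratio dominates:

```python
# Illustrative only: combining two conflicting pieces of evidence in odds form.
# Every number below is assumed for the sake of the example.
prior_odds = 1.0          # hypothetical prior odds that "B is A" is a valid sentence
lr_seen_a_is_b = 3.0      # assumed likelihood ratio: "A is B" appears in the training data
lr_missing_b_is_a = 0.25  # assumed likelihood ratio: "B is A" never appears

posterior_odds = prior_odds * lr_seen_a_is_b * lr_missing_b_is_a
posterior_prob = posterior_odds / (1 + posterior_odds)
print(f"posterior odds {posterior_odds:.2f}, probability {posterior_prob:.2f}")
# With these particular made-up ratios the net update goes *against* "B is A";
# swap the magnitudes and it goes the other way.
```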
ETA: I’m reminded of a story I once heard about some researchers who asked a computer to parse the syntax of the phrase “time flies like an arrow”. They thought this example had a unique correct answer, but the computer proved them wrong by returning several syntactically-valid parsings, showing that the meaning of the phrase only seems obvious to humans because of their priors, and not because the statement actually lacks ambiguity.
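(For anyone who wants to see the ambiguity directly: here is a toy context-free grammar I invented for this one sentence; with it, NLTK's chart parser returns three distinct parse trees, roughly the "time-flies enjoy an arrow", "time passes the way an arrow flies", and imperative "time the flies" readings.)

```python
import nltk

# Toy grammar invented for this single sentence, purely to exhibit the ambiguity.
grammar = nltk.CFG.fromstring("""
S -> NP VP | VP
NP -> N | N N | Det N
VP -> V NP | V PP | V NP PP
PP -> P NP
Det -> 'an'
N -> 'time' | 'flies' | 'arrow'
V -> 'time' | 'flies' | 'like'
P -> 'like'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse("time flies like an arrow".split()):
    print(tree)  # prints three syntactically valid parse trees
```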
Did you look at the design for our Experiment 1 in the paper? Do you think your objections apply to that design?
At the time of my original comment, I had not looked at it.
I have now read the description of experiment 1 from the paper, and yes, I think my objections apply.
My best guess at the point you were trying to make by pointing me to this experiment is that you included some bidirectional examples in your test set, and therefore maybe the LLM should be able to figure out that your test set (in particular) is describing a symmetric relation, even if similar words in the LLM’s original training data were used to describe asymmetric relations. Is that your implied argument?
Perhaps it would be helpful to explain my model a bit more.
(1) I think that if you show statements like “Olaf Scholz was the ninth Chancellor of Germany” or “Uriah Hawthorne is the composer of Abyssal Melodies” to typical humans, then the humans are very likely to consider the reversed statements equally valid, and the humans are very likely to be correct.
(2) Thus I conclude that it would be desirable for an LLM to make similar reversals, and that a sufficiently-good LLM would very likely succeed at this. If current LLMs can’t do this, then I agree this is some sort of failure on their part.
(3) However, I do not think that the mechanism being used by the humans to perform such reversals is to match them to the general pattern “A is B” and then reverse that pattern to yield “B is A”, nor do I believe such a general mechanism can match the humans’ accuracy.
I think the humans are probably matching to some patterns of far greater specificity, perhaps along the lines of:
(person-name) is (monarch-title) of (group)
(person-name) is (creator-title) of (created thing)
That is, I suspect it requires knowing roughly what a Chancellor or composer is, and probably also knowing at least a little bit about how people or things are commonly named. (If someone says “mighty is the king of the elves”, and then asks “who is the king of the elves?”, you probably shouldn’t answer “mighty.”)
I am skeptical that the two examples from (1) are even being matched to the same pattern as each other. I suspect humans have thousands of different patterns to cover various different special cases of what this paper treats as a single phenomenon.
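As a way of gesturing at what I mean by “far greater specificity”, here is a deliberately crude sketch (the role whitelist and the name heuristic are both invented for illustration): a reversal rule that only fires when it recognizes the role word and when the subject looks like a name, and so declines to answer “mighty” in the elves example.

```python
import re

# Crude illustration of specificity-dependent reversal.
# Both the role whitelist and the "looks like a name" check are invented here.
KNOWN_ROLES = {"chancellor", "composer", "king", "author"}

def reverse_statement(sentence):
    m = re.match(r"(?P<x>.+?) (?:is|was) the (?P<role>\w+) of (?P<y>.+)", sentence)
    if not m:
        return None
    x, role, y = m.group("x"), m.group("role"), m.group("y")
    if role.lower() not in KNOWN_ROLES:   # requires knowing what the role word means
        return None
    if not x[0].isupper():                # requires knowing roughly how people are named
        return None
    return (f"Who is the {role} of {y}?", x)

print(reverse_statement("Uriah Hawthorne is the composer of Abyssal Melodies"))
# ('Who is the composer of Abyssal Melodies?', 'Uriah Hawthorne')
print(reverse_statement("mighty is the king of the elves"))
# None -- "mighty" fails the crude name check, so no reversal is attempted
```

A human (or a good LLM) is presumably doing something vastly richer than this, but the point stands: the useful rule is loaded with domain knowledge, not a bare “A is B, therefore B is A”.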
(4) I hadn’t considered this specific issue prior to encountering this post, but I think if you’d asked me to guess whether LLMs could do these sorts of reversals, I’d probably have guessed they could. So in that sense I am surprised.
(5) But I predict that if LLMs could do this, it would only be by learning a lot of specific information about things like chancellors and composers. If LLMs fail at this, I don’t expect that failure has anything to do with basic logic, but rather with detailed domain knowledge.
It’s nice to think about this paper as a capability request. It would be nice to have language models seamlessly ingest semantic triples from Wikidata, seen only once, and learn the relations bidirectionally.
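A minimal sketch of what that capability would amount to, using the paper’s fictitious composer example as a made-up Wikidata-style triple (not real Wikidata data): one fact, seen once, answerable in either direction.

```python
# Made-up triple in (subject, property, value) form; not real Wikidata content.
triple = ("Abyssal Melodies", "composer", "Uriah Hawthorne")
subject, prop, value = triple

forward_q = f"Who is the {prop} of {subject}?"  # queries the value given the subject
reverse_q = f"What is {value} the {prop} of?"   # queries the subject given the value

print(forward_q, "->", value)    # Who is the composer of Abyssal Melodies? -> Uriah Hawthorne
print(reverse_q, "->", subject)  # What is Uriah Hawthorne the composer of? -> Abyssal Melodies
```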