Sentences 1 and 4 should have higher probability than sentences 2 and 3. What they find is that GPT-2 does worse than chance on these kinds of problems. If a sentence is likely, a variation on the sentence with opposite meaning tends to have similar likelihood.
Despite all this, when generating text, GPT-2 is more likely to generate a true sentence than the opposite of a true sentence. “Polar bears are found in the Arctic” is far more likely to be generated than “Polar bears are found in the tropics,” and it is also more likely to be generated than “Polar bears are not found in the Arctic” because “not found” is a less likely construction to be used in real writing than “found.”
Hm. These sound contradictory to me?
My understanding is that a sentence’s probability of being generated is closely related to its likelihood; closely enough that if a sentence has similar likelihood to its negation, it should have similar probability of generation, and vice versa. But then the first quote says “true sentences have similar, but lower likelihood than their negations” and the second says “true sentences have higher likelihood than their negations”.
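To make that understanding concrete, here is a minimal sketch of what I have in mind, assuming plain ancestral sampling (no top-k, no temperature tricks). The bigram table `TOY_MODEL` and its numbers are made up for illustration and have nothing to do with GPT-2's actual probabilities; `sequence_log_prob` and `sample_sentence` are hypothetical helper names.

```python
import math
import random

# Hypothetical toy next-token model: maps the previous token to a distribution
# over the next token. The numbers are invented for illustration, not from GPT-2.
TOY_MODEL = {
    "<s>": {"bears": 1.0},
    "bears": {"are": 1.0},
    "are": {"found": 0.8, "not": 0.2},
    "not": {"found": 1.0},
    "found": {"</s>": 1.0},
}


def sequence_log_prob(tokens, model=TOY_MODEL):
    """Chain rule: log P(sentence) = sum of log P(token | previous token)."""
    log_p = 0.0
    prev = "<s>"
    for tok in list(tokens) + ["</s>"]:
        log_p += math.log(model[prev][tok])
        prev = tok
    return log_p


def sample_sentence(rng, model=TOY_MODEL):
    """Ancestral sampling: draw each token from the model's conditional."""
    prev, out = "<s>", []
    while True:
        dist = model[prev]
        tok = rng.choices(list(dist), weights=list(dist.values()))[0]
        if tok == "</s>":
            return out
        out.append(tok)
        prev = tok


affirmative = ["bears", "are", "found"]
negated = ["bears", "are", "not", "found"]
p_aff = math.exp(sequence_log_prob(affirmative))
p_neg = math.exp(sequence_log_prob(negated))

# Under pure sampling, how often a sentence is generated should track
# its likelihood under the model.
rng = random.Random(0)
samples = [sample_sentence(rng) for _ in range(10_000)]
freq_aff = sum(s == affirmative for s in samples) / len(samples)
```

In this toy setup the frequency with which a sentence gets sampled is just its likelihood under the model, which is why I would expect the two quantities to move together. If the generation procedure differs from pure sampling, that equivalence need not hold.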
Assuming I’ve got that right, what gives?
Related question: what’s the precise ranking of sentences 1-4? The quote suggests that some aggregation of 2 and 3 is ranked higher than the same aggregation of 1 and 4; but is it 2>3>1>4, or 2>1>3>4, or what?