I don’t know why you would think that would be such a barrier. You don’t need Transformers at all to do analogical reasoning, and both the CoQA and SQuAD results suggest at least some ‘modest logic-related stuff’ is going on. If you put your exact sample into the public/small GPT-2 model, it’ll even generate syntactically correct list completions and additional lists which are somewhat more sorted than not.
We might be interpreting “modest logic-related stuff” differently—I am thinking about simple formal problems like sorting a short list of integers.
I wouldn’t be surprised if GPT-2 (or its smaller version) is very capable of completing strings like “[1,2,” in a way that is merely syntactically correct. Publicly available text on the internet probably contains a lot of comma-separated number lists in brackets. The challenge is for the model to actually sort the numbers (when it was trained only to predict the next word in internet text).
However, after thinking about it more I am now less confident that GPT-2 would fail to complete my above string with a correctly sorted list, because for any two small integers like 2 and 3 it is plausible that the training data contains more “2,3” strings than “3,2” strings.
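(For what it’s worth, this is easy to try directly. A rough sketch using today’s Hugging Face transformers library and the small public checkpoint, which it names "gpt2"; the prompt and sampling settings are arbitrary choices of mine:)

```python
# Sketch: sample completions of a list prefix from the small public
# GPT-2 checkpoint ("gpt2", ~124M parameters) via Hugging Face transformers.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

prompt = "[1,2,"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=20,
        do_sample=True,              # sample from the model's distribution
        top_k=50,
        num_return_sequences=5,      # draw several completions to eyeball
        pad_token_id=tokenizer.eos_token_id,
    )
for seq in outputs:
    print(tokenizer.decode(seq, skip_special_tokens=True))
```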
Consider instead the following problem:
“The median number in the list [9,2,1,6,8] is ”
I’m pretty sure that GPT-2 would fail at least 1/5 of the time to complete such a sentence correctly (i.e., if we query it multiple times, with small random integers in the sentence each time).
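(A sketch of that experiment, again assuming the Hugging Face transformers API; the trial count and integer range are my own arbitrary choices:)

```python
# Sketch: query GPT-2 with random small-integer median prompts and
# measure how often the greedy one-token completion is correct.
import random
import statistics
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

correct = 0
trials = 100
for _ in range(trials):
    nums = random.sample(range(1, 10), 5)  # five distinct small integers
    # No trailing space: GPT-2's BPE prefers to emit " 6" as one token.
    prompt = f"The median number in the list [{','.join(map(str, nums))}] is"
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model.generate(
            **inputs,
            max_new_tokens=1,
            do_sample=False,               # greedy: take the argmax token
            pad_token_id=tokenizer.eos_token_id,
        )
    answer = tokenizer.decode(out[0][inputs["input_ids"].shape[1]:]).strip()
    correct += answer == str(statistics.median(nums))

print(f"accuracy: {correct / trials:.0%}")
```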
GPT-2 works by deterministically computing a probability distribution over the next token, then sampling from it. It is plausible that the probability it assigns to “6” is no larger than 80%, but it’s simple enough to postprocess the output so that every probability larger than 50% is rounded up to 100% (effectively greedy decoding whenever some token holds a majority). (This isn’t always done because, when completing a list prefix of size 4, it would always produce an infinite list: after each element, the probability of another “,” is more than 50%.)
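(A minimal sketch of that postprocessing step, assuming the Hugging Face API; the prompt is the median example from above:)

```python
# Sketch: inspect GPT-2's next-token distribution and apply the
# "round any probability above 50% up to 100%" rule, i.e. pick the
# majority token deterministically instead of sampling.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

prompt = "The median number in the list [9,2,1,6,8] is"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits[0, -1]  # scores for the next token
probs = torch.softmax(logits, dim=-1)

top_prob, top_id = probs.max(dim=-1)
token = tokenizer.decode(top_id.item())
if top_prob > 0.5:
    # A majority token exists: the postprocessing always picks it.
    print(f"deterministic pick: {token!r} (p={top_prob:.2f})")
else:
    print(f"no majority token; argmax is {token!r} (p={top_prob:.2f})")
```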