We might be interpreting “modest logic-related stuff” differently—I am thinking about simple formal problems like sorting a short list of integers.
I wouldn’t be surprised if GPT-2 (or its smaller version) is very capable at completing strings like “[1,2,” in a way that is merely syntactically correct. Publicly available texts on the internet probably contain a lot of comma-separated number lists in brackets. The challenge is for the model to have the ability to sort numbers (when trained only to predict the next word in internet texts).
However, after thinking about it more, I am now less confident that GPT-2 would fail to complete my above sentence with a correctly sorted list, because for any two small integers like 2 and 3 it is plausible that the training data contains more “2,3” strings than “3,2” strings.
Consider instead the following problem:
“The median number in the list [9,2,1,6,8] is ”
I’m pretty sure that GPT-2 would fail at least 1/5 of the time to complete such a sentence (i.e. if we query it multiple times and each time the sentence contains small random integers).
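To make that experiment concrete, here is a minimal sketch of how one could estimate the failure rate, assuming the Hugging Face transformers library and the publicly released “gpt2” checkpoint. The function name, the number of trials, and the exact prompt format are illustrative choices of mine, not anything prescribed by the model.

```python
# Hypothetical sketch: estimate how often GPT-2 completes a "median of a list"
# prompt incorrectly. Assumes the Hugging Face "transformers" library.
import random
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def estimate_median_failure_rate(n_trials: int = 100) -> float:
    failures = 0
    for _ in range(n_trials):
        nums = random.sample(range(1, 10), 5)      # five distinct small integers
        true_median = sorted(nums)[2]
        prompt = f"The median number in the list [{','.join(map(str, nums))}] is "
        inputs = tokenizer(prompt, return_tensors="pt")
        with torch.no_grad():
            out = model.generate(**inputs, max_new_tokens=2, do_sample=True,
                                 pad_token_id=tokenizer.eos_token_id)
        # Decode only the newly generated tokens and check the leading digit.
        completion = tokenizer.decode(out[0][inputs["input_ids"].shape[1]:])
        if not completion.strip().startswith(str(true_median)):
            failures += 1
    return failures / n_trials

print(estimate_median_failure_rate())
```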
GPT-2 works by deterministically computing a probability distribution over the next token and then sampling from it. It is plausible that the probability it assigns to the correct answer (6 in the example above) is no larger than 80%, but it’s simple enough to postprocess the distribution so that every probability larger than 50% is rounded up to 100%. (This isn’t always done in practice because, e.g., when completing a list prefix of size 4 it would always produce an infinite list, since the probability of yet another “,” is more than 50%.)
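Here is a minimal sketch of that “round probabilities above 50% up to 100%” rule, assuming we already have the model’s next-token distribution as a torch tensor; the function name and the explicit 0.5 threshold are my own illustrative choices, not a standard decoding API.

```python
# Hypothetical sketch: greedy pick whenever one token already has majority
# probability, otherwise fall back to ordinary sampling.
import torch

def pick_next_token(probs: torch.Tensor, threshold: float = 0.5) -> int:
    """Return the argmax token if its probability exceeds the threshold, else sample."""
    if probs.max().item() > threshold:
        return int(probs.argmax())
    return int(torch.multinomial(probs, num_samples=1))

# Toy distribution over a 4-token vocabulary: token 2 has 80% probability, so the
# thresholded rule always picks it, while plain sampling would miss it about 20% of the time.
probs = torch.tensor([0.05, 0.10, 0.80, 0.05])
print(pick_next_token(probs))
```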