One stab might be some kind of “semantic sensitivity”:
Some inputs are close in terms of edit distance but very different semantically. One clue that a system can reason is whether it responds correctly to these small variations and can explain the difference.
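To make this concrete, here's a rough sketch of the kind of minimal-pair probe I have in mind. The pairs are just illustrative, and `query_model` is a hypothetical stand-in for whatever LLM interface you'd actually use:

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

# Each pair differs by only a few characters, but the expected answer flips.
MINIMAL_PAIRS = [
    ("The bullet was fired at the glass. Did the glass break?",
     "The bullet was thrown at the glass. Did the glass break?"),
    ("I put the ice in the oven. Is it still frozen an hour later?",
     "I put the ice in the freezer. Is it still frozen an hour later?"),
]

for a, b in MINIMAL_PAIRS:
    print(f"edit distance {levenshtein(a, b)}:")
    print(f"  A: {a}")
    print(f"  B: {b}")
    # In a real probe you would compare the model's two answers here, e.g.:
    #   answer_a, answer_b = query_model(a), query_model(b)  # hypothetical
    # and check that they differ in the right direction, with a sensible
    # explanation of why.
```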
This is part of why I tested similar situations with the bullet: I wanted to see whether small changes to the wording would provoke a substantively different response.
I think another part of this is "sequential processing steps required": you couldn't just look up a fact or a definition somewhere to get the correct response.
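Here's a rough sketch of how one might generate probes like that, where each question composes two facts so no single lookup yields the answer. The fact tables and the two-hop template are invented for illustration:

```python
# Tiny illustrative fact tables; a real probe set would be much larger.
BIRTHPLACE = {"Marie Curie": "Poland", "Alan Turing": "England"}
CAPITAL = {"Poland": "Warsaw", "England": "London"}

def two_hop_question(person: str) -> tuple[str, str]:
    """Build a question whose answer requires chaining two lookups."""
    country = BIRTHPLACE[person]   # hop 1: person -> country
    answer = CAPITAL[country]      # hop 2: country -> capital
    question = f"What is the capital of the country where {person} was born?"
    return question, answer

for person in BIRTHPLACE:
    q, a = two_hop_question(person)
    print(f"Q: {q}\nexpected: {a}")
    # A system that can only retrieve single facts should fail here more
    # often than on either hop asked separately.
```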
This is still woefully incomplete, but hopefully this helps a bit.
I like the second suggestion a lot more than the first. To me, the first is getting more at "Does GPT convert to a semantic representation, or just go off the syntax?" I already strongly suspect it does something more meaningful than "just syntax", but whether it then reasons about it is another matter.