If I wanted to explain these results, I think I would say something like:
GPT-3 has been trained to predict what the next token would be if the prompt appeared in its dataset (text from the internet). So, if GPT-3 has learned well, it will “talk as if symbols are grounded”* when it predicts that the internet text would “talk as if symbols are grounded” following the given prompt, and not otherwise.
It’s hard to use this explanation to predict what GPT-3 will do on edge cases, but this would lead me to expect that GPT-3 will more often “talk as if symbols are grounded” when the prompt is a common prose format (e.g. stories, articles, forum posts), and less often when the prompt is most similar to non-symbol-groundy things in the dataset (e.g. poetry) or not that similar to anything in the dataset.
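To make “predicts that the internet text would talk as if symbols are grounded” a bit more concrete: you can ask a language model directly how likely it thinks a given continuation is after a given prompt. Here’s a rough sketch using GPT-2 from Hugging Face as a stand-in (I can’t poke at GPT-3 directly); continuation_logprob is just a helper name I made up, and the tokenizer boundary handling is approximate.

```python
# Rough sketch: how likely does the model think a continuation is, given a prompt?
# Uses GPT-2 as a stand-in for GPT-3; ignores possible tokenizer boundary effects.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def continuation_logprob(prompt: str, continuation: str) -> float:
    """Total log-probability the model assigns to `continuation` after `prompt`."""
    prompt_len = tokenizer(prompt, return_tensors="pt")["input_ids"].shape[1]
    full_ids = tokenizer(prompt + continuation, return_tensors="pt")["input_ids"]
    with torch.no_grad():
        logits = model(full_ids).logits  # shape: (1, seq_len, vocab)
    log_probs = torch.log_softmax(logits, dim=-1)
    total = 0.0
    # Each continuation token is scored by the distribution at the previous position.
    for pos in range(prompt_len, full_ids.shape[1]):
        total += log_probs[0, pos - 1, full_ids[0, pos]].item()
    return total
```

Comparing that number for a “symbol-groundy” continuation vs. a non-groundy one, given the same prompt, is roughly the kind of check I have in mind further down.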
I think your examples here broadly fit that explanation, though it feels like a shaky just-so story:
If I saw the first “undead worker” prompt on a webpage, I would think “hm, normal stories don’t have this kind of weird repetition; is this a poem or a metaphor or something? A joke?” I wouldn’t put 97% on her continuing to go to work, but I wouldn’t be surprised if she did; maybe 30%-50%?
The second undead worker prompt looks a lot more like a normal kind of story, so I’m not that surprised that GPT was more likely to continue it like a normal story, i.e. in a more symbol-groundy way. If I saw that text on the internet, I would still think there was a reasonable chance that it’s some kind of joke, but not as high a chance as for the first prompt.
IDK about the map thing—this looks like a case where GPT just hadn’t seen enough training text in the general vicinity of the prompt to do very well? It’s definitely interesting that it figured out the command format, but didn’t seem to figure out the layout of the situation.
I don’t see how to test this theory, but it seems like it has to be kind of tautologically correct—predicting next token is what GPT-3 was trained to do, right?
Maybe, to find out how adept GPT-3 is at continuing prompts that depend on common-sense knowledge about everyday objects, object permanence, or logical reasoning, you could construct prompts that are as close as possible to what appears in the dataset, then see whether it fails on those more often than average? I don’t think there’s a lot we can conclude from unusual-looking prompts.
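Concretely, I’m imagining a harness like the one below, reusing continuation_logprob from the earlier sketch. Everything in it is a made-up placeholder, not an example from your post; the idea is just to compare how often the model prefers the common-sense continuation for ordinary-looking prompts vs. unusual-looking ones.

```python
# Hypothetical harness for the test I'm proposing; reuses continuation_logprob
# from the sketch above. All prompts/continuations here are invented placeholders.
cases = [
    # (prompt, common-sense continuation, distractor continuation, prompt style)
    ("Anna put her phone in her bag and walked to the bus stop. When the bus arrived, she",
     " reached into her bag for her phone.",
     " realized she had never owned a phone.",
     "ordinary"),
    ("Anna put her phone in her bag. Anna put her phone in her bag. "
     "Anna put her phone in her bag. When the bus arrived, she",
     " reached into her bag for her phone.",
     " realized she had never owned a phone.",
     "unusual"),
]

results = {"ordinary": [], "unusual": []}
for prompt, sensible, distractor, style in cases:
    # Record whether the model prefers the common-sense continuation.
    results[style].append(
        continuation_logprob(prompt, sensible) > continuation_logprob(prompt, distractor)
    )

for style, outcomes in results.items():
    print(f"{style}: {sum(outcomes)}/{len(outcomes)} prefer the common-sense continuation")
```

If the model did noticeably worse than expected on the ordinary-looking prompts, that would seem like better evidence than failures on prompts that are far from anything in the dataset.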
I’m curious what you think of this—maybe it misses the point of your post?
*(I’m not sure exactly what you mean when you say “symbol grounding”, but I’m taking it to mean something like “the words describe objects that have common-sense properties, and future words will continue this pattern”.)