Yeah, I think there’s a reasonable case to be made that fooling GPT by including one off-topic sentence in an otherwise common kind of text is actually “not fooling it” in a sense: on the training distribution, when a text of that kind (reviews, recipe intros, corporate boilerplate, news stories, code, etc.) contains one off-topic sentence, that sentence may really not mean anything important about the rest of the text.
We may interpret it differently because we’re humans who know that the deployment distribution is “text people input into GPT”, not “an actual random sample from the internet”, and in that distribution single sentences seem more important.
But I suspect that this is a reasonable heuristic that could be pushed to produce unreasonable results.
Going a little further, I’m actually not sure that “fooling” GPT-3 is quite the best framing. GPT-3 isn’t playing a game where it’s trying to guess the scenario based on trustworthy textual cues and then describing the rest of it. That’s a goal we’re imposing upon it.
We might instead say that we were attempting to get GPT-3 to generate “Yelp complaints about bees in a restaurant” from a minimal cue, and did not succeed in doing so.