Thinking about this a bit (not a huge amount), I think the specific example “are bugs real” ends up looking interesting in part because the word “bugs” in the prompt has incredibly low likelihood, as does the following word, “real”.
So the model is conditioning on very low-likelihood inputs, which seems like part of the reason for the behavior.
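One way to sanity-check this is to read back the per-token log-probabilities of the prompt itself. A minimal sketch, assuming the legacy (<1.0) OpenAI Python client and its Completions endpoint, where the `echo=True, max_tokens=0` trick returns logprobs for the prompt tokens rather than for a completion:

```python
import openai  # legacy (<1.0) client, assumed here

def prompt_token_logprobs(prompt: str, model: str = "text-davinci-002"):
    """Return (token, logprob) pairs for the prompt's own tokens."""
    resp = openai.Completion.create(
        model=model,
        prompt=prompt,
        max_tokens=0,  # generate nothing...
        echo=True,     # ...but echo the prompt back with logprobs attached
        logprobs=0,
    )
    lp = resp["choices"][0]["logprobs"]
    # Note: the very first token's logprob comes back as None in this API.
    return list(zip(lp["tokens"], lp["token_logprobs"]))

for p in ["Are bugs real?", "Are birds real?"]:
    print(p, prompt_token_logprobs(p))
```

If the claim is right, the tokens for “bugs” and “real” should come back with noticeably lower logprobs than the corresponding tokens in the “birds” prompt.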
The prompt “Are birds real?” is somewhat more likely, given the “Birds aren’t real” conspiracy theory, but it can still yield an answer formatted similarly to the one for “Are bugs real?”
The answer makes a lot more sense when you ask a question like “Are monsters real?” or “Are ghosts real?” It seems that with FeedME, text-davinci-002 has been trained to respond with a template answer along the lines of “There is no one answer to this question”, and it has misgeneralized this behavior to questions about real phenomena, such as “Are bugs real?”
Yeah, that seems correct, especially when you look at how likely similar answers are for “Are people real?” (It does much better, with a ~20% chance of starting with “Yes”, but there’s still a lot of weight on stupid nuance and hedging.)
Interestingly, however, “bananas,” “mammals,” and “cows” are unambiguously real.
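That ~20% figure is the kind of thing you can estimate directly from the top logprobs of the first completion token. A rough sketch, again assuming the legacy Completions API; the prompt formatting and the list of subjects are illustrative, not the exact setup behind the numbers above:

```python
import math
import openai  # legacy (<1.0) client, assumed here

def p_first_token(prompt: str, token: str = "Yes",
                  model: str = "text-davinci-002") -> float:
    """Probability that the first completion token is `token`,
    if it shows up among the top alternatives."""
    resp = openai.Completion.create(
        model=model,
        prompt=prompt,
        max_tokens=1,
        logprobs=5,  # top-5 alternatives for the first token
    )
    top = resp["choices"][0]["logprobs"]["top_logprobs"][0]
    # Exact tokenization may differ (e.g. a leading-space variant of "Yes"),
    # so a more careful version would check those spellings too.
    return math.exp(top[token]) if token in top else 0.0

for subject in ["people", "bananas", "mammals", "cows", "bugs"]:
    print(subject, p_first_token(f"Are {subject} real?\n\n"))
```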