There is an infinite number of wrong answers to “What is six plus eight?”, and only one correct one. If GPT-3 answers it correctly within 3 or 10 tries, that means it *has* some understanding/knowledge. Though that’s moderated by the numbers being very small: if it also replies with small numbers, it has a non-negligible chance of being correct purely by chance.
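To make that “correct by chance” caveat concrete, here is a minimal sketch (my own made-up assumption, not from the comment: wrong answers drawn uniformly from the integers 0–20) of how likely a random small-number guesser would be to hit 14 at least once:

```python
# Assumption for illustration only: the model's answers are drawn
# uniformly from the 21 integers 0..20, so a single guess at
# "six plus eight" is right with probability 1/21.
p_single = 1 / 21

def p_at_least_one(p: float, k: int) -> float:
    """Probability of at least one correct answer in k independent tries."""
    return 1 - (1 - p) ** k

print(round(p_at_least_one(p_single, 3), 3))   # chance within 3 tries
print(round(p_at_least_one(p_single, 10), 3))  # chance within 10 tries
```

So even a pure small-number guesser gets lucky within 10 tries fairly often, which is why small-number arithmetic alone is weak evidence either way.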
But it’s better than that.
And more complex questions, like those in the interview above, are even more convincing, by the same line of reasoning. Out of all sensible-English completions (so nothing like “weoi123@!#*”), there might be (exact numbers pulled out of thin air, purely for illustration) 0.01% correct ones, 0.09% partially correct ones, and ~99.9% complete nonsense, off-topic answers, etc.
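A back-of-the-envelope sketch of that argument, using the made-up base rates above: if completions were sampled at random with those rates, seeing even one fully correct answer in a handful of tries would be very unlikely.

```python
# Illustrative base rates from the comment (explicitly made up there):
# among sensible-English completions, 0.01% fully correct, 0.09% partial.
rates = {"correct": 0.0001, "partial": 0.0009}

# If the model were just sampling sensible completions at random,
# the chance of at least one fully correct answer in 10 samples:
p_correct_in_10 = 1 - (1 - rates["correct"]) ** 10
print(round(p_correct_in_10, 4))
```

Getting correct answers within a few tries is therefore strong evidence that the model is doing something better than sampling sensible-sounding text at random.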
Returning to arithmetic itself: to me, GPT seems intent on providing off-by-one answers for some reason. Or even less wrong [heh]. When I was playing with Gwern’s prefix-confidence-rating prompt, I got this:
Q: What is half the result of the number 102?
A: [remote] 50.5
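For what it’s worth, the quoted answer is exactly half of 101, so the “off-by-one” here is in the input rather than the output:

```python
# Half of 102 is 51, but the model answered 50.5 -- which is half of 101,
# i.e. off by one in the number being halved, not in the division itself.
correct = 102 / 2
model_answer = 50.5
print(correct)                     # 51.0
print(model_answer == 101 / 2)     # True
```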
About confidence-rating prefixes: a neat thing might be to experiment with “requesting” a high- (or low-) confidence answer by making these tags part of the prompt. It worked when I tried it (for example, when it kept answering that it didn’t know the answer, I eventually wrote the question + “A: [highly likely] ”—and it answered sensibly!). But I didn’t play with it all that much, so it might’ve been a fluke.
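The trick described above can be sketched as plain prompt construction. This is a hypothetical helper for illustration only (the function name is mine, and it just builds the text, it doesn’t call any API); the tag names follow Gwern’s prefix-confidence format quoted earlier:

```python
# Build a Q/A prompt, optionally pre-filling the answer with a confidence
# tag (e.g. "highly likely", "remote") so the model continues the answer
# "as if" it had already committed to that confidence level.
def build_prompt(question, confidence_tag=""):
    prefix = f"[{confidence_tag}] " if confidence_tag else ""
    return f"Q: {question}\nA: {prefix}"

print(build_prompt("What is half the result of the number 102?", "highly likely"))
```

The model would then be asked to complete the text after the pre-filled `A: [highly likely] ` prefix.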
Yeah. The way I’m thinking about it is: to discuss these questions we have to get clear on what we mean by “knowledge” in the context of GPT. In some sense Gwern is right; in a different sense, you’re right. But no one has offered a clearer definition of “knowledge” to attempt to arbitrate these questions yet (afaik, that is).