I don’t regard bag-of-words as sufficient to show it understood. I mean, would you say that if GPT-3 responded “61” to the question “10+6=”, it understands arithmetic correctly? It mentions both the right digits, after all!
I might be a little more lenient if it had occasionally gotten some of the others right (perhaps, despite my care, the sampling settings were still bad ones: 'sampling can show the presence of knowledge but not the absence'), or at least come close, as it does on very hard arithmetic problems when you format them correctly. But given how badly it performs on all of the other puns, in both generating and explaining them, it's clear in which direction I should regress my estimate of the quality of that one explanation toward the mean...