If a further scaled-up model, say around the size of the human brain, became able to solve these sorts of questions without fine-tuning, would that show it has developed a proper understanding?
I think that’s exactly the problem here:
The answer consists of two parts: you have to name a food and give an explanation.
For the first part I agree with Anisha: a banana is probably a frequent answer to food-related questions.
The explanation then only requires describing some properties of a banana, which again could be simple pattern matching without any real understanding of the problem.
The fundamental problem is that, for this question, a model that understands and one that mostly guesses could give the same answer, so a correct answer alone doesn’t let us distinguish whether the model actually understands in the way we want.