Suppose you were talking to a young child, who produced an incorrect answer to some question. Afterwards they’d talk to their parent who explained the right answer. The next day when you asked them the same question, they’d answer correctly.
I think it depends on the generalization. If you ask the exact same question, you cannot test if the children was able to acquire prior knowledge that would make sure they understood why the answer. Also, reformulating the answer might not suffice because we know that an algorithm like GPT-3 is already very good at this task. I think you would need to create a different question that requires a similar prior knowledge to be answered.
I think it depends on the generalization. If you ask the exact same question, you cannot test if the children was able to acquire prior knowledge that would make sure they understood why the answer. Also, reformulating the answer might not suffice because we know that an algorithm like GPT-3 is already very good at this task. I think you would need to create a different question that requires a similar prior knowledge to be answered.