Interesting tests, and thanks for sharing. One question: using the model to answer questions without context looks to me like simply checking whether the learned knowledge needed to answer them is present, which is a kind of knowledge-machine approach, and none of these models are that. It therefore comes down to a scaling question, IMHO: more training data means more knowledge, which means more questions answered. I'd be interested to know: did you try other approaches, such as few-shot prompting, to probe the learned conceptual and contextual understanding?
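For what it's worth, here is a minimal sketch of what I mean by few-shot prompting: prepend a couple of worked Q/A pairs so the model can infer the task pattern instead of relying purely on recalled facts. The example pairs and the helper name are hypothetical, just to illustrate the shape of the prompt.

```python
# Hypothetical few-shot prompt construction; the example pairs below
# are placeholders, not taken from the original tests.
FEW_SHOT_EXAMPLES = [
    ("A queue is FIFO. Which element is removed first?", "The oldest element."),
    ("A stack is LIFO. Which element is removed first?", "The newest element."),
]

def build_few_shot_prompt(question: str) -> str:
    """Prepend worked Q/A pairs so the model can infer the answer pattern."""
    parts = [f"Q: {q}\nA: {a}" for q, a in FEW_SHOT_EXAMPLES]
    parts.append(f"Q: {question}\nA:")  # leave the final answer for the model
    return "\n\n".join(parts)

prompt = build_few_shot_prompt(
    "A priority queue orders by priority. Which element is removed first?"
)
print(prompt)
```

The resulting string would then be sent to the model; a gap between zero-shot and few-shot accuracy on the same questions would hint at contextual understanding beyond memorized knowledge.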