I am sharing this because I think it should change how much you update on this paper (I should have shared it initially). Here's what the paper author said on X:
Clarifying two things:
1. Model is simple transformer for science, not a language model (or large by standards today)
2. The model can learn new tasks (via in-context learning), but can't generalize to new task families
I would be thrilled if this work was important for understanding AI safety and fairness, but it is the start of a scientific direction, not ready for policy conclusions. Understanding what task families a true LLM is capable of would be fascinating and more relevant to policy!
So, with that, I said:
I hastily thought the paper was using language models, so I think it’s important to share this. A follow-up paper using a couple of ‘true’ LLMs at different model scales would be great. Is it just interpolation? How far can the models extrapolate?
In retrospect, I probably should have updated much less than I did. I had thought it was actually testing a real LLM, and learning otherwise makes me less confident in the paper.
I should have responded long ago, but I'm responding now.