These are four-option multiple-choice questions, and performance for model sizes ≤7.1B is ~zero above chance, so sure, that's pretty natural: any variation between 0.4B and 7B is dominated by noise. You can just graph all of the curves to check this.
The noise is this high for two reasons: there is a minimum bar of language understanding a model must clear before it can start solving complex multi-step problems posed in language, and the baseline probability of guessing correctly (25% with four options) is significant.
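To get a feel for how much a pure guesser's score can wobble, here is a rough sketch. It assumes uniform random guessing over four options, independent questions, and a hypothetical benchmark size of 500 items (the real benchmark sizes aren't given here). It computes the binomial standard deviation of chance-level accuracy and simulates a few "models" that only guess.

```python
import math
import random

CHANCE = 0.25  # assumption: four answer options, uniform guessing


def chance_accuracy_sd(n_questions: int, p: float = CHANCE) -> float:
    """Standard deviation of accuracy for a pure guesser (binomial SD / n)."""
    return math.sqrt(p * (1 - p) / n_questions)


def simulate_guessers(n_models: int, n_questions: int, seed: int = 0) -> list[float]:
    """Accuracies of n_models independent random guessers on n_questions items."""
    rng = random.Random(seed)
    return [
        sum(rng.random() < CHANCE for _ in range(n_questions)) / n_questions
        for _ in range(n_models)
    ]


if __name__ == "__main__":
    n = 500  # hypothetical benchmark size, for illustration only
    print(f"SD of chance accuracy: {chance_accuracy_sd(n):.4f}")
    accs = simulate_guessers(n_models=8, n_questions=n)
    print("Simulated guesser accuracies:", [round(a, 3) for a in accs])
    print(f"Spread (max - min): {max(accs) - min(accs):.3f}")
```

With 500 questions the standard deviation comes out to roughly 1.9 percentage points, so differences of a few points between models that are all near chance are consistent with guessing alone.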
See my reply to Gwern: https://www.lesswrong.com/posts/G993PFTwqqdQv4eTg/is-ai-progress-impossible-to-predict?commentId=MhnGnBvJjgJ5vi5Mb