There are some recent papers (see discussion here) showing that there is a g factor for LLMs, and that it is more predictive than g is in humans/animals.
Utilizing factor analysis on two extensive datasets—Open LLM Leaderboard with 1,232 models and General Language Understanding Evaluation (GLUE) Leaderboard with 88 models—we find compelling evidence for a unidimensional, highly stable g factor that accounts for 85% of the variance in model performance. The study also finds a moderate correlation of .48 between model size and g.
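For concreteness, here is a minimal sketch of the kind of analysis being described, with synthetic scores standing in for the leaderboard data and PCA standing in for the paper's factor analysis: build a models × benchmarks score matrix, take the first component as a stand-in for g, and look at how much variance it explains and how it correlates with (log) model size. All the numbers and names in it are made up for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_models, n_benchmarks = 200, 6

# Synthetic stand-in for leaderboard scores: one latent "ability" per model,
# with each benchmark loading on it plus benchmark-specific noise.
ability = rng.normal(size=n_models)
loadings = rng.uniform(0.7, 0.95, size=n_benchmarks)
scores = ability[:, None] * loadings[None, :] + 0.4 * rng.normal(size=(n_models, n_benchmarks))

# Standardize each benchmark and take the first principal component as "g".
z = StandardScaler().fit_transform(scores)
pca = PCA(n_components=1).fit(z)
g = pca.transform(z).ravel()

print(f"variance explained by the first component: {pca.explained_variance_ratio_[0]:.2f}")

# Correlation of g with a (synthetic) log parameter count that partly drives ability.
log_params = 0.5 * ability + rng.normal(size=n_models)
print(f"corr(g, log params): {np.corrcoef(g, log_params)[0, 1]:.2f}")
```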
+1, and in particular the paper claims that g is about twice as strong in language models as in humans and some animals.
I’m not confident that this is good research, but the original post really seems like it had a conclusion pre-written and was searching for arguments to defend it, rather than paying any attention to what other people might actually believe.
I feel like they should have excluded different finetunings of the same base models, as surely including them pushes up the correlations.
TBH I have only glanced at the abstracts of those papers, and my linking them shouldn’t be considered an endorsement. On priors I would be somewhat surprised if something like ‘g’ didn’t exist for LLMs—it stems naturally from scaling laws, after all—but you have a good point about correlations of finetuned submodels. The degree of correlation, or ‘variance explained by g’, in particular doesn’t seem like a sturdy metric to boast about, as it will just depend heavily on the particular set of models and evaluations used.
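For what it’s worth, the deduplication suggested above is easy to sketch: collapse the leaderboard to one entry per base model before running the factor analysis, so near-duplicate finetunes don’t dominate the sample. The column names, model names, and scores below are all hypothetical.

```python
import pandas as pd

# Hypothetical leaderboard table: one row per submitted model, with the base
# model it was finetuned from and its benchmark scores.
leaderboard = pd.DataFrame({
    "model":      ["llama-7b-chat-a", "llama-7b-chat-b", "mistral-7b", "mistral-7b-instr"],
    "base_model": ["llama-7b",        "llama-7b",        "mistral-7b", "mistral-7b"],
    "arc":  [0.52, 0.54, 0.60, 0.61],
    "mmlu": [0.46, 0.47, 0.62, 0.64],
})
benchmark_cols = ["arc", "mmlu"]

# Keep a single representative per base model (here, the one with the best
# mean benchmark score); the factor analysis would then run on `dedup`.
leaderboard["mean_score"] = leaderboard[benchmark_cols].mean(axis=1)
dedup = (leaderboard.sort_values("mean_score", ascending=False)
                    .drop_duplicates("base_model"))

print(dedup[["model", "base_model"] + benchmark_cols])
```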