I find it amusing that GPT-4 considers meta-analyses to worsen the results they attempt to pool together.
It’s possibly an error, but interestingly, that is also in line with more recent meta-analytic thinking: meta-analyses can be worse than individual RCTs because they simply wind up pooling the systematic errors of all the studies, yielding inflated effect sizes and overly-narrow CIs compared to the best RCTs (which may have larger sampling error than the meta-analysis, but more than make up for it by having less systematic error).
An example of this would be the Many Labs projects, where the well-powered, pre-registered Many Labs replications turned in systematically much smaller effect sizes than not just the original papers but also the meta-analyses of their subsequent literatures. The meta-analyses yielded smaller and better estimates than the p-hacked original papers, true, but were still far from the truth.
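To make the pooling-of-systematic-errors point concrete, here is a minimal simulation sketch of my own (the effect size, bias, and sample sizes are all invented for illustration, not drawn from Many Labs or any real literature): 20 small studies sharing the same systematic bias are pooled with fixed-effect inverse-variance weighting and compared against a single larger, unbiased RCT.

```python
import numpy as np

rng = np.random.default_rng(0)

TRUE_EFFECT = 0.10   # hypothetical true effect size (made up)
BIAS = 0.20          # hypothetical shared systematic error (p-hacking etc.)
N_STUDIES = 20       # small biased studies entering the meta-analysis

# Each small study (n = 50) estimates TRUE_EFFECT + BIAS plus sampling noise.
se_small = 1 / np.sqrt(50)
estimates = rng.normal(TRUE_EFFECT + BIAS, se_small, size=N_STUDIES)

# Fixed-effect inverse-variance pooling of the biased literature.
weights = np.full(N_STUDIES, 1 / se_small**2)
pooled = np.sum(weights * estimates) / np.sum(weights)
pooled_se = np.sqrt(1 / np.sum(weights))

# One large, well-run, unbiased RCT (n = 500): larger sampling error than
# the pooled estimate, but no systematic error.
se_rct = 1 / np.sqrt(500)
rct_estimate = rng.normal(TRUE_EFFECT, se_rct)

print(f"truth:         {TRUE_EFFECT:.3f}")
print(f"meta-analysis: {pooled:.3f} +/- {1.96 * pooled_se:.3f} (narrow CI, biased center)")
print(f"single RCT:    {rct_estimate:.3f} +/- {1.96 * se_rct:.3f} (wider CI, unbiased center)")
```

The pooled estimate comes out with a tight CI around the wrong value, while the lone RCT’s wider CI sits on the truth, which is the trade-off described above.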