This meta-analysis on meditation has an interesting approach: they basically just analyze the effect sizes in the same “class”, averaging effect sizes within a study if there are multiple different outcomes measured in the same class.

That sounds like a completely disgusting approach… I’m going to have to read that and see if it’s a legitimate strategy.

They seem to get pretty strong effect sizes and low heterogeneity, so I’m curious to hear your thoughts on it.
So, their methodology is, as far as I can tell, described by these parts:
The aim of our meta-analysis was to assess the effect [of] a mindfulness meditation intervention on health status measures. We considered the concept of health to include both physical and mental health. All outcome measures were either subsumed under “physical health”, “mental health” or were excluded from the analysis. We only included data from standardized and validated scales with established internal consistency (e.g., the Global Severity Inventory of Symptom Check List-R, Hospital Anxiety and Depression Scale, Beck Depression Inventory, Profile of Mood States, McGill-Melzack Pain-Rating Scale, Short Form 36 Health Survey, and Medical Symptom Checklist; a full list is available upon request). Also a conservative procedure was chosen to exclude relatively ambiguous or unconventional measures, e.g., spiritual experience, empathy, neuropsychological performance, quality of social support, and egocentrism.
“Mental health” constructs comprised scales such as psychological wellbeing and symptomatology, depression, anxiety, sleep, psychological components of quality of life, or affective perception of pain. “Physical health” constructs were medical symptoms, physical pain, physical impairment, and physical component of quality of life questionnaires.
...We first integrated all effect sizes within a single study by the calculation of means into two effect sizes, one for mental and one for physical health. If the sample size varied between scales of one study, we weighted them for N. Effect sizes obtained in this manner were aggregated across studies by the computation of a weighted mean, where the inverse of the estimated standard deviation for each investigation served as a weight [8].
So, they just split the effect sizes into the two sets (mental and physical) and take an average of each. Nothing more.
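If I’m reading that description right, the whole procedure amounts to something like the sketch below (invented numbers and my own guesses at the details; not their actual code or data):

```python
# A minimal sketch of the pooling procedure as I read it (Python, invented numbers).
# Step 1: within a study, take the N-weighted mean of all effect sizes in one
#         "class" (mental or physical).
# Step 2: across studies, take a weighted mean of those per-study values,
#         weighting each study by the inverse of its estimated SD, as the paper describes.

def pool_within_study(effect_sizes, ns):
    """N-weighted mean of the effect sizes from one study's scales."""
    return sum(d * n for d, n in zip(effect_sizes, ns)) / sum(ns)

def pool_across_studies(study_effects, study_sds):
    """Weighted mean across studies; weight = 1 / estimated SD per study."""
    weights = [1.0 / sd for sd in study_sds]
    return sum(w * d for w, d in zip(weights, study_effects)) / sum(weights)

# Hypothetical example: 3 studies, each reporting several "mental health" scales.
scale_ds = [[0.5, 0.3, 0.4], [0.6, 0.2], [0.35]]   # per-scale effect sizes
scale_ns = [[40, 40, 38],    [25, 25],   [60]]     # per-scale sample sizes
sds      = [0.25, 0.32, 0.20]                      # estimated SD per study

per_study = [pool_within_study(d, n) for d, n in zip(scale_ds, scale_ns)]
print(per_study)                            # one pooled "mental health" d per study
print(pool_across_studies(per_study, sds))  # the overall estimate
```

(Incidentally, if they really do weight studies by the inverse of the standard deviation rather than the usual inverse variance, that’s another oddity, though the description is terse enough that I may be misreading it.)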
I dunno. They don’t give any references to papers or textbooks on meta-analysis to justify this procedure. It doesn’t sound very kosher to me.
From a statistical point of view, I wouldn’t expect this to work very well. I would expect a lot of heterogeneity and a very weak signal. However, they report very strong results with low heterogeneity (which I find pretty surprising). I don’t see any obvious way in which this would be “cheating”.

Are you worried about something else specific?
I don’t see any obvious way in which this would be “cheating”.
Oh, that’s easy: publication bias. If the original studies report only the measures which reached a significance cutoff, and the null is always true, then their effect sizes will have to be fairly similar* (since their measures will generally all be on the same subjects, with the same n), and I’d expect the I² to be low even as the results are meaningless.
* Since p is just a function of sample size & effect size, the p threshold is fixed by convention at 0.05, and the sample size n is pretty much the same across all measures (why would you recruit a subject and then not collect as much data as possible on them, or omit lots of subjects?), only measurements with effect sizes big enough to cross the threshold at that fixed n will be reported.
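Here’s a toy simulation of that scenario (all the numbers and the selective-reporting rule are invented for illustration, so treat it as a sketch of the mechanism, not a model of this particular literature):

```python
# Toy simulation of the worry: every true effect is zero, but each study reports
# only the measures that happened to come out positive at p < .05. Pooling the
# per-study averages of the reported measures then yields a sizable "effect"
# with I^2 near zero.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_per_arm, n_measures, n_studies = 30, 10, 200

study_d, study_se = [], []
for _ in range(n_studies):
    reported = []
    for _ in range(n_measures):
        treat = rng.normal(0, 1, n_per_arm)          # true effect is exactly zero
        ctrl = rng.normal(0, 1, n_per_arm)
        sd_pooled = np.sqrt((treat.var(ddof=1) + ctrl.var(ddof=1)) / 2)
        d = (treat.mean() - ctrl.mean()) / sd_pooled
        if stats.ttest_ind(treat, ctrl).pvalue < 0.05 and d > 0:
            reported.append(d)                       # only the "lucky" measures survive
    if reported:                                     # the study publishes its lucky measures
        d_bar = float(np.mean(reported))
        se = np.sqrt(2 / n_per_arm + d_bar**2 / (4 * n_per_arm))   # approx. SE of d
        study_d.append(d_bar)
        study_se.append(se)

d = np.array(study_d)
w = 1 / np.array(study_se) ** 2                      # fixed-effect inverse-variance weights
pooled = np.sum(w * d) / np.sum(w)
Q = np.sum(w * (d - pooled) ** 2)
I2 = max(0.0, (Q - (len(d) - 1)) / Q) * 100
print(f"{len(d)} studies 'published'; pooled d = {pooled:.2f}; I^2 = {I2:.0f}% (true effect: 0)")
```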
Whereas if each particular measure were handled separately, as a bunch of univariate or multivariate meta-analyses, they’d either have to get access to the original data or they’d be able to see the publication bias on a measure-by-measure basis.
Or it might be that each measure has a weighted effect size of zero; it’s just that each study is biased towards a different measure, so its ‘overall’ estimate comes out positive even though, if we had combined each measure with all its siblings across studies, every single one would net to zero.
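To make that concrete, a deliberately cartoonish two-study, two-measure example with hypothetical numbers:

```python
# Hypothetical two-study, two-measure illustration (made-up numbers). Both
# studies measure A and B; each reports only the measure that "worked" for it.
full_data = {                                   # everything actually measured
    "study1": {"A": +0.5, "B": -0.5},
    "study2": {"A": -0.5, "B": +0.5},
}
reported = {"study1": ["A"], "study2": ["B"]}   # selective reporting

# Measure-by-measure meta-analysis over all the data: both A and B net to zero.
for m in ("A", "B"):
    print(m, sum(study[m] for study in full_data.values()) / len(full_data))   # 0.0

# The lump-everything approach over the reported data: looks like d = +0.5.
per_study = [sum(full_data[s][m] for m in ms) / len(ms) for s, ms in reported.items()]
print(sum(per_study) / len(per_study))          # 0.5
```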
Maybe I’m wrong about these speculations. But I hope you see why I feel uncomfortable with this ‘lump everything remotely similar together’ approach and would like to see what meta-analytic experts say about it.

That’s a great point; I hadn’t been thinking about that. It amplifies the publication bias by a lot.