I live my life under the assumption that it is correct, and I do not make allowances in my strategic thinking for the possibility that it may be false. As for how hard it would be to convince me I was wrong, I am currently sufficiently invested in the atomic theory of matter that I can’t think, off-hand, what such evidence would look like. But I presume (hope) that a well-designed, falsifiable experiment which showed matter to be a continuum would convince me to become curious.
One viewpoint I’ve learned from the skeptical community is that individual experiments have very little value: an experiment with a stated p-value of 0.05 actually has more than a 1-in-20 chance of being wrong. Collections of experiments from a whole field of research, however, can provide some valuable evidence; for example, 400 different experiments, of which around 370 lean in one direction and 30 in the other, with a noticeable trend that the tighter the experiment, the more likely it is to lean in the majority direction.
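To make the ‘more than 1-in-20’ point concrete, here is a minimal back-of-the-envelope calculation; the base rate of true hypotheses and the statistical power are assumptions picked purely for illustration.

```python
# Why a lone p < 0.05 result can be wrong far more often than 1 time in 20.
# prior_true and power are assumed figures, not estimates for any real field.
prior_true = 0.10   # assumed fraction of tested hypotheses that are true
power      = 0.80   # assumed P(significant | hypothesis true)
alpha      = 0.05   # P(significant | hypothesis false)

p_significant = power * prior_true + alpha * (1 - prior_true)
p_false_given_significant = alpha * (1 - prior_true) / p_significant

print(f"P(hypothesis false | p < 0.05) = {p_false_given_significant:.2f}")
# With these assumptions: about 0.36, far worse than 1-in-20.
```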
What I’m currently trying to wrestle with is that if there are 400 experiments, then even restating their p-values in terms of logarithmic decibans, you can’t /just/ add all that evidence up. At the least, there seems to be a ceiling, based on the few-in-a-billion odds that I’m suffering from an extreme mental disorder. I’m currently wondering if some second-order treatment of evidence might be in order, e.g. taking decibans as a linear measure and working with a logarithm based on that. Or, perhaps, some other transformation which further reduces the impact of evidence when there’s already a lot of it.
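As a sketch of what such a transformation could look like (the ceiling and the particular squashing function below are arbitrary choices, made only to illustrate the idea):

```python
import math

# Naive pooling: decibans = 10 * log10(odds ratio), and independent experiments
# simply add. 400 experiments at ~13 dB each (20:1 odds) would give an absurd
# 5200 dB of certainty if treated as independent.
per_experiment_db = 13
naive_total = 400 * per_experiment_db          # 5200 dB

# One possible damping transform: small amounts of evidence pass through nearly
# unchanged, but the total asymptotes to a ceiling (~90 dB, roughly
# billion-to-one odds). An illustration, not a derived rule.
ceiling_db = 90
def damped(total_db, ceiling=ceiling_db):
    return ceiling * (1 - math.exp(-total_db / ceiling))

print(round(damped(per_experiment_db), 1))   # ~12.1 dB: one study, barely changed
print(round(damped(naive_total), 1))         # ~90.0 dB: pinned near the ceiling
```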
What I’m currently trying to wrestle with is that if there are 400 experiments, then even restating their p-values in terms of logarithmic decibans, you can’t /just/ add all that evidence up. At the least, there seems to be a ceiling, based on the few-in-a-billion odds that I’m suffering from an extreme mental disorder.
A larger obstacle to adding them up is that 400 experiments are never going to be independent. There will be systematic errors. Ten experiments in ten independent laboratories by ten independent teams, all using the same, unwittingly flawed, method of measuring something, will just give a more precise measurement of a wrong value.
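A toy simulation of that failure mode (all numbers invented): ten labs share a biased method and add only small independent noise, so pooling just pins down the wrong answer more precisely.

```python
import random

random.seed(0)
true_value  = 1.00
shared_bias = 0.20   # flaw baked into the method every lab uses
noise_sd    = 0.05   # each lab's independent statistical error

measurements = [true_value + shared_bias + random.gauss(0, noise_sd)
                for _ in range(10)]
pooled = sum(measurements) / len(measurements)
print(f"pooled estimate: {pooled:.3f} vs. true value {true_value}")
# The pooled estimate sits tightly around ~1.20, not 1.00; averaging more labs
# shrinks the random error but leaves the shared bias untouched.
```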
How do people conducting meta-analyses deal with this problem? A few Google searches showed the problem being acknowledged, but not what to do about it.
How do people conducting meta-analyses deal with this problem? A few Google searches showed the problem being acknowledged, but not what to do about it.
I doubt there’s a general solution open to the meta-analyst, since estimating systematic error requires domain-specific knowledge.
I doubt there’s a general solution open to the meta-analyst, since estimating systematic error requires domain-specific knowledge.
I would expect meta-analysis to require such knowledge anyway, but I don’t know if this is what happens in practice. Are meta-analyses customarily done other than by experts in the field?
I would expect meta-analysis to require such knowledge anyway,
Ideally a meta-analyst would have domain-specific knowledge, but the process of doing a basic meta-analysis is standardized enough that one can carry it out without such knowledge. One just needs to systematically locate studies, extract effect sizes from them, and find various weighted averages of those effect sizes.
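For illustration, the mechanical core of that procedure is roughly the following; the three effect sizes and standard errors are invented numbers.

```python
# Fixed-effect meta-analysis: an inverse-variance weighted average of the
# effect sizes extracted from each study.
studies = [
    # (effect size, standard error)
    (0.30, 0.10),
    (0.15, 0.08),
    (0.45, 0.20),
]

weights = [1 / se**2 for _, se in studies]
pooled = sum(w * es for (es, _), w in zip(studies, weights)) / sum(weights)
pooled_se = (1 / sum(weights)) ** 0.5

print(f"pooled estimate: {pooled:.3f} +/- {pooled_se:.3f}")
# A random-effects analysis would additionally estimate between-study variance
# and fold it into the weights.
```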
Are meta-analyses customarily done other than by experts in the field?
Good point. Most meta-analyses are done by people in the field, although I’m not sure whether they’re typically experts in the specific phenomenon they’re meta-analyzing.
Thinking about it, maybe the problem’s a simple one: estimating systematic errors is really hard. I’ve seen it done occasionally for experimental physics papers, where authors can plausibly argue they’ve managed to pinpoint all possible sources of systematic error and account for them. But an epidemiologist meta-analyzing observational studies generally can’t quantify confounding biases and analogous sources of systematic error.
Most meta-analyses are done by people in the field, although I’m not sure whether they’re typically experts in the specific phenomenon they’re meta-analyzing.
My own impression has been this as well: if you already understand your basic null-hypothesis testing, a regular meta-analysis isn’t that hard to learn how to do.
But an epidemiologist meta-analyzing observational studies generally can’t quantify confounding biases and analogous sources of systematic error.
Do you have any materials on epidemiological meta-analyses? I’ve been thinking of trying to meta-analyze the correlations of lithium in drinking water, but even after a few days of looking through papers and textbooks I still haven’t found any good resources on how to handle the problems in epidemiology or population-level correlations.
Do you have any materials on epidemiological meta-analyses? [...] I still haven’t found any good resources on how to handle the problems in epidemiology or population-level correlations.
Not to hand. But (as you’ve found) I doubt they’d tell you what you want to know, anyway. The problems aren’t special epidemiological phenomena but generic problems of causal inference. They just bite harder in epidemiology because (1) background theory isn’t as good at pinpointing relevant causal factors and (2) controlled experiments are harder to do in epidemiology.
If I were in your situation, I’d probably try running a sensitivity analysis. Specifically, I’d think of plausible ways confounding would’ve occurred, guesstimate a probability distribution for each possible form of confounding, then do Monte Carlo simulations using those probability distributions to estimate the probability distribution of the systematic error from confounding. This isn’t usually that satisfactory, since it’s a lot of work and the result often depends on arsepulls.
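A rough sketch of what that could look like for a single guessed-at confounding mechanism; the bias model and every distribution below are exactly the kind of guesstimate being described.

```python
import random

random.seed(0)
observed_effect = -0.10   # e.g. an observed ecological correlation

def simulated_bias():
    # Guesstimated strength of one plausible confounder's links to the exposure
    # and to the outcome; their product serves as a crude bias term.
    conf_exposure = random.gauss(0.3, 0.1)
    conf_outcome  = random.gauss(-0.2, 0.1)
    return conf_exposure * conf_outcome

corrected = sorted(observed_effect - simulated_bias() for _ in range(100_000))
lo, mid, hi = (corrected[int(q * len(corrected))] for q in (0.025, 0.5, 0.975))
print(f"bias-corrected effect: {mid:.3f} (95% interval {lo:.3f} to {hi:.3f})")
```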
But it’s hard to do better. There are philosophers of causality out there (like this guy) who work on rigorous methods for inferring causes from observational data, but as far as I know those methods require pretty strong & fiddly assumptions. (IlyaShpitser can probably go into more detail about these methods.) They also can’t do things like magically turn a population-level correlation into an individual-level correlation, so I’d guess you’re SOL there.
But (as you’ve found) I doubt they’d tell you what you want to know, anyway. The problems aren’t special epidemiological phenomena but generic problems of causal inference. They just bite harder in epidemiology because (1) background theory isn’t as good at pinpointing relevant causal factors
I’ve found that there’s always a lot of field-specific tricks; it’s one of those things I really was hoping to find.
This isn’t usually that satisfactory, since it’s a lot of work and the result often depends on arsepulls.
Yeah, that’s not worth bothering with.
(2) controlled experiments are harder to do in epidemiology.
The really frustrating thing about the lithium-in-drinking-water correlation is that it would be very easy to do a controlled experiment. Dump some lithium into some randomly chosen county’s water treatment plants to bring it up to the high end of ‘safe’ natural variation, come back a year later and ask the government for suicide & crime rates, see if they fell; repeat n times; and you’re done.
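As a rough sense of how large that n would have to be, here is a simulated power check; every figure (county population, baseline suicide rate, effect size) is an assumption chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
county_pop = 100_000
base_rate  = 14 / 100_000   # assumed annual suicides per capita
effect     = 0.10           # assumed 10% relative reduction from lithium
n_counties = 100            # counties per arm
n_sims     = 5_000

detections = 0
for _ in range(n_sims):
    control = rng.poisson(base_rate * county_pop, size=n_counties).sum()
    treated = rng.poisson(base_rate * (1 - effect) * county_pop, size=n_counties).sum()
    # One-sided test on the difference of two Poisson totals, normal approximation.
    z = (control - treated) / np.sqrt(control + treated)
    if z > 1.645:   # alpha = 0.05, one-sided
        detections += 1

print(f"power with {n_counties} counties per arm: {detections / n_sims:.2f}")
# Roughly 0.85 under these assumptions; with only a handful of counties per arm
# the power would be low.
```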
They also can’t do things like magically turn a population-level correlation into an individual-level correlation, so I’d guess you’re SOL there.
I’m interested for generic utilitarian reasons, so I’d be fine with a population-level correlation.
They just bite harder in epidemiology because (1) background theory isn’t as good at pinpointing relevant causal factors
I’ve found that there’s always a lot of field-specific tricks; it’s one of those things I really was hoping to find.
Hmm. Based on the epidemiology papers I’ve skimmed through over the years, there don’t seem to be any killer tricks. The usual procedure for non-experimental papers seems to be to pick a few variables out of thin air that sound like they might be confounders, measure them, and then toss them into a regression alongside the variables one actually cares about. (Sometimes matching is used instead of regression but the idea is similar.)
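A toy version of that recipe with simulated data, just to show the mechanics (and the obvious limitation that only the confounders one thought to measure get adjusted away):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5_000
confounder = rng.normal(size=n)                     # e.g. age or income
exposure   = 0.5 * confounder + rng.normal(size=n)  # exposure of interest
outcome    = 0.8 * confounder + rng.normal(size=n)  # true exposure effect is zero

def coefficients(columns, y):
    # Ordinary least squares; the first coefficient returned is the intercept.
    X = np.column_stack([np.ones(len(y))] + list(columns))
    return np.linalg.lstsq(X, y, rcond=None)[0]

crude    = coefficients([exposure], outcome)[1]
adjusted = coefficients([exposure, confounder], outcome)[1]
print(f"crude exposure coefficient:    {crude:.2f}")    # ~0.32, pure confounding
print(f"adjusted exposure coefficient: {adjusted:.2f}") # ~0.00, as it should be
```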
Still, it’s quite possible I’m only drawing a blank because I’m not an epidemiologist and I haven’t picked up enough tacit knowledge of useful analysis tricks. Flicking through papers doesn’t actually make me an expert.
The really frustrating thing about the lithium-in-drinking-water correlation is that it would be very easy to do a controlled experiment.
True. Even though doing experiments is harder in general in epidemiology, that’s a poor excuse for not doing the easy experiments.
I’m interested for generic utilitarian reasons, so I’d be fine with a population-level correlation.
Ah, I see. I misunderstood your earlier comment as being a complaint about population-level correlations.
I’m not sure which variables you’re looking for (population-level) correlations among, but my usual procedure for finding correlations is mashing keywords into Google Scholar until I find papers with estimates of the correlations I want. (For this comment, I searched for “smoking IQ conscientiousness correlation” without the quotes, to give an example.) Then I just reuse those numbers for whatever analysis I’d like to do.
This is risky because two variables can correlate differently in different populations. To reduce that risk I try to use the estimate from the population most similar to the population I have in mind, or I try estimating the correlation myself in a public use dataset that happens to include both variables and the population I want.
(For this comment, I searched for “smoking IQ conscientiousness correlation” without the quotes, to give an example.) Then I just reuse those numbers for whatever analysis I’d like to do. This is risky because two variables can correlate differently in different populations. To reduce that risk I try to use the estimate from the population most similar to the population I have in mind, or I try estimating the correlation myself in a public use dataset that happens to include both variables and the population I want.
You never try to meta-analyze them with perhaps a state or country moderator?
You never try to meta-analyze them with perhaps a state or country moderator?
I misunderstood you again; for some reason I got it into my head that you were asking about getting a point estimate of a secondary correlation that enters (as a nuisance parameter) into a meta-analysis of some primary quantity.
Yeah, if I were interested in a population-level correlation in its own right I might of course try meta-analyzing it with moderators like state or country.
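For completeness, a sketch of what that might look like: Fisher-transform each study’s correlation, weight by inverse variance, and regress on a country dummy as the moderator. All four studies below are invented.

```python
import math
import numpy as np

studies = [
    # (correlation r, sample size n, country)
    (0.25, 120, "US"),
    (0.31,  80, "US"),
    (0.10, 200, "JP"),
    (0.18,  60, "JP"),
]

z  = np.array([0.5 * math.log((1 + r) / (1 - r)) for r, _, _ in studies])
w  = np.array([n - 3 for _, n, _ in studies], dtype=float)  # inverse variances of z
jp = np.array([1.0 if c == "JP" else 0.0 for _, _, c in studies])

# Weighted least squares: z ~ intercept + JP dummy (a fixed-effect meta-regression).
X = np.column_stack([np.ones(len(z)), jp])
W = np.diag(w)
beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ z)
print(f"pooled z (US): {beta[0]:.3f}, JP difference: {beta[1]:.3f}")
# np.tanh(beta[0]) back-transforms to the pooled US correlation; a fuller analysis
# would add random effects for residual heterogeneity.
```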