Since the blocks are randomized, there will be a mix of confounded and accurate blocks: the apparent
effect will be weaker than the true effect. I have lost power, but not introduced bias.
I don’t think this is a good way to think about confounding. For one thing, you are implicitly assuming the effect is monotonic. Perhaps this is true with nootropics (how do you know, though?) Monotonicity is not true in general, though. Maybe treatments and unwashed-out partial treatments interact in weird/random ways. In general, if you are adding up unconfounded and confounded days, your sum is garbage, not a weaker version of the true sum.
I suppose that’s one way to think about u-curve responses.
A u-curve response is just one type of non-monotonic response. There could be others. I don’t think it’s entirely scientific to assume either that the function is monotonic or that it has a monotonic first derivative. What if there is no simple way to describe the response?
Actually I am not even talking about the response to the treatment. Suppose you were a werewolf, and the outcome you were measuring was a physical test. Now, every few days out of 28 you would measure off the charts completely independently of whatever physical enhancement treatment you were taking, just because you were half-wolf during those days. So you might conclude there is an effect under the null. Now werewolves do not exist, but are you sure this sort of thing doesn’t happen with you? How do you know?
one I am loath to make right this month just to understand someone’s opaque objection to self-experiments
which they probably could easily make clear.
I think that’s a curious attitude for someone who is into self-experimentation (independently of whether the opaque objection can be made clear or not). In some sense, do-calculus is the math behind identifying causal effects from data. I am not sure how you can talk about these things with any confidence without reading up on the math. It’s like being a practicing consequentialist without knowing some decision theory. You can’t just rely on intuition.
I think at the very least you should write down all the assumptions you are making in order to have your conclusions be internally valid.
What is the strongest effect you ever found in this way?
I haven’t compiled my results into a table or anything, but IIRC the largest effect size so far was from taking vitamin D at bedtime, with d ≈ -0.7. (Roughly in line with psychology meta-analyses: effect sizes drop off sharply past |0.6|.)
I don’t think this is a good way to think about confounding. For one thing, you are implicitly assuming the effect is monotonic. Perhaps this is true with nootropics (how do you know though?)
The background research and published experiments don’t seem to include unusual adjustments for non-monotonicity (not really sure what that means in this context).
Monotonicity is not true in general, though.
In general? Do you have a meta-analysis over hundreds of different kinds of experiments showing this?
Actually I am not even talking about the response to the treatment. Suppose you were a werewolf, and the outcome you were measuring was a physical test. Now, every few days out of 28 you would measure off the charts completely independently of whatever physical enhancement treatment you were taking, just because you were half-wolf during those days. So you might conclude there is an effect under the null. Now werewolves do not exist, but are you sure this sort of thing doesn’t happen with you? How do you know?
Wouldn’t this be covered by randomization? If I randomize each day to this treatment, half of the wolf-days will be under treatment days and half under control days. They’ll inflate the standard deviation and I’ll be much less likely to reject the null.
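The narrow claim in this reply can be checked with a toy simulation. The sketch below assumes that wolf-days add a fixed boost to the score regardless of treatment, that the treatment truly does nothing, and that each day is assigned by a fair coin flip. The 3-in-28 cycle, the boost size, and the function name `simulate_null_experiment` are all made up for illustration. It does not cover the case where wolf-days change how the treatment itself works.

```python
import random
from statistics import fmean, stdev

def simulate_null_experiment(n_days=280, wolf_boost=10.0, seed=0):
    """One self-experiment under the null: the treatment has no effect."""
    rng = random.Random(seed)
    treated, control = [], []
    for day in range(n_days):
        # Hypothetical cycle: the first 3 days of every 28 are wolf-days.
        is_wolf_day = (day % 28) < 3
        score = rng.gauss(0.0, 1.0) + (wolf_boost if is_wolf_day else 0.0)
        # Assignment is a fair coin flip, independent of the cycle.
        (treated if rng.random() < 0.5 else control).append(score)
    return fmean(treated) - fmean(control)

# Repeat the whole experiment many times, with and without wolf-days.
diffs_wolf = [simulate_null_experiment(seed=s) for s in range(2000)]
diffs_plain = [simulate_null_experiment(wolf_boost=0.0, seed=s) for s in range(2000)]

mean_diff = fmean(diffs_wolf)
sd_with_wolf = stdev(diffs_wolf)
sd_without_wolf = stdev(diffs_plain)

print(f"mean treated-minus-control difference: {mean_diff:+.3f}")
print(f"spread with wolf-days: {sd_with_wolf:.3f}, without: {sd_without_wolf:.3f}")
```

Under these assumptions the average difference across replications sits near zero while its spread is several times wider with wolf-days than without, which is the "lost power, not bias" pattern. Whether the additive assumption holds for a real person is a separate question.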
I think that’s a curious attitude for someone who is into self-experimentation (independently of whether the opaque objection can be made clear or not).
From the sound of it, you’re largely making the theoretician’s objection: “but there are a billion ways your simple design could go wrong! How can you do any experiments if you don’t understand in detail every underlying tool or theorem?” Well, yes, it’s true that neither I nor other experimenters can rule out becoming a werewolf on every 5th Tuesday or setting up an experiment with completely wrong blocks or washouts, nor can we be sure that induction will continue to work tomorrow and we will not be eaten by grues or bleens, but nevertheless...
(not really sure what that means in this context).
I am just saying that confounding could make your effect weaker (if there is cancellation of paths), or stronger (if there is some sort of interaction with the treatment), or weaker sometimes and stronger other times. You just don’t know. Confounding doesn’t just increase the variance of your effect estimate, it creates bias in the estimate. That is, if you add up some confounded bits to your estimate, you are adding up garbage.
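One way to see how a leftover dose can push the estimate in either direction is a toy carryover model. The sketch below assumes daily coin-flip assignment, no washout, and an outcome equal to `true_effect * today + interaction * today * yesterday` plus noise. This model, the parameter values, and the name `naive_estimate` are illustrative assumptions, not anything measured.

```python
import random
from statistics import fmean

def naive_estimate(true_effect, interaction, n_days=200_000, seed=1):
    """Treated-minus-control difference when yesterday's dose interacts with today's."""
    rng = random.Random(seed)
    yesterday = 0
    treated, control = [], []
    for _ in range(n_days):
        today = 1 if rng.random() < 0.5 else 0
        outcome = (true_effect * today
                   + interaction * today * yesterday   # hypothetical carryover term
                   + rng.gauss(0.0, 1.0))
        (treated if today else control).append(outcome)
        yesterday = today
    return fmean(treated) - fmean(control)

TRUE_EFFECT = 0.5
estimates = {k: naive_estimate(TRUE_EFFECT, k) for k in (-1.0, 0.0, 1.0)}

for k, est in estimates.items():
    print(f"interaction {k:+.1f}: naive estimate {est:+.3f} (true effect {TRUE_EFFECT})")
```

With these assumptions the naive difference converges to `true_effect + 0.5 * interaction`, so the same design reports roughly 0.0, 0.5, or 1.0 for a true effect of 0.5, depending on a sign the experimenter never observes.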
Wouldn’t this be covered by randomization?
No. The werewolf example is a clear case of the copies not being exchangeable. Different versions of you could react to (randomized!) treatment differently, and you won’t know how without more assumptions. For instance, if you were a woman, you would have a different hormonal composition due to the monthly cycle, etc. etc. etc.
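A small simulation makes the exchangeability worry concrete. It assumes two hypothetical "versions" of the same person: phase A on 75% of days with a treatment effect of +1, and phase B on 25% of days with an effect of -3. The phases, proportions, and effect sizes are invented for illustration. Treatment is randomized by coin flip every day.

```python
import random
from statistics import fmean

# Hypothetical per-phase treatment effects (assumed, not measured).
PHASE_EFFECT = {"A": 1.0, "B": -3.0}

def phase_experiment(n_days=100_000, p_phase_b=0.25, seed=2):
    rng = random.Random(seed)
    outcomes = {(phase, arm): [] for phase in PHASE_EFFECT for arm in (0, 1)}
    for _ in range(n_days):
        phase = "B" if rng.random() < p_phase_b else "A"
        arm = 1 if rng.random() < 0.5 else 0          # randomized treatment
        y = PHASE_EFFECT[phase] * arm + rng.gauss(0.0, 1.0)
        outcomes[(phase, arm)].append(y)
    # Pooled estimate ignores the phase, as an experimenter unaware of it would.
    pooled = (fmean(outcomes[("A", 1)] + outcomes[("B", 1)])
              - fmean(outcomes[("A", 0)] + outcomes[("B", 0)]))
    # Per-phase estimates are only available if the phase is recorded.
    by_phase = {phase: fmean(outcomes[(phase, 1)]) - fmean(outcomes[(phase, 0)])
                for phase in PHASE_EFFECT}
    return pooled, by_phase

pooled, by_phase = phase_experiment()
print(f"pooled estimate: {pooled:+.3f}")
print({phase: round(est, 3) for phase, est in by_phase.items()})
```

The pooled estimate lands near zero even though the treatment does something on every single day, while the per-phase estimates recover +1 and -3. Randomization gives an unbiased average over the mix of phases in the sample, but says nothing about any one phase unless the phase is recorded or assumed away.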
From the sound of it, you’re largely making the theoretician’s objection: “but there are a billion ways your
simple design could go wrong!”
Look, what I am saying is not very complicated. I am not asking you to become a mathematician. You are looking for causal effects. That’s great! It is not my goal to discourage you! Just report your assumptions. All of them. Say you assume monotonicity, exchangeability of copies, etc. If you don’t know what assumptions you need to make, maybe read up on them. Reporting assumptions is good science, right? It’s standard practice in the stats literature.
No, see. The burden of proof is not on me. If you make an assumption, the burden of proof that it holds (or at the very least the burden of reporting) is on you. Causal mechanisms in general are not monotonic...Just report your assumptions. All of them. Say you assume monotonicity, exchangeability of copies, etc. If you don’t know what assumptions you need to make, maybe read up on them.
This is an example of what I mean when I say you are taking a wildly impractical theoretical approach. Have you ever seen an experiment in which every assumption is reported with a proof? No, because such a paper would not be an experiment but an exercise in pure mathematics or statistics. No one would ever get anything done if they tried to actually apply your suggestions, since they would spend all their time reading up on various statistical frameworks and going ‘well, I guess I should specify this and that assumption, but wait, don’t I also assume independence of who’s the current Justice of the Supreme Court?’ etc.
But don’t just assume some random thing you came up with after reading some slice of the literature that happened to catch your fancy will give you the effect you want.
I hate to break it to you, but that’s pretty much how it works. People read a slice of the literature, apply simple common models, which yield reasonable answers, and only start delving into the foundations and examining closely the methods if someone makes a good case that a hidden assumption or a method’s limitation is important. This should not dismay you any more than a philosopher of science should be dismayed that scientists spend their days in the lab and he is only consulted to deal with borderline cases like Intelligent Design.
Reporting assumptions is standard practice. For example in causal inference literature the mantra is often “we assume SUTVA (stable unit treatment value assumption), and conditional ignorability.” You can’t prove them all (in fact many are untestable). Reporting is still a good idea (for sensitivity analysis, replication, arguing about their reasonableness, etc.)
Exchangeability of copies and monotonicity are pretty important. People always report monotonicity (because you get identification when you could not before). But anyways, I shouldn’t be the one to have to tell you this.
Also, it’s not some, it’s all assumptions needed to get your answer from the data. Even if exchangeability holds for you, it might not hold for someone else who might want to try your design. If you don’t write down what you assume, how should they know if your design will carry over?
Anyways, this is just the Scruffy AI mistake all over again. Actually it’s worse than that. The scientific attitude is to try to falsify, i.e. look for reasons your model might fail. You are assuming as a default that your model is reasonable, and not even leaving a paper trail.
Dozens of fields are concerned with “identifying causal effects from data”; pretty much all the natural sciences and all their myriad subspecializations can be viewed through such a lens. That’s the crux: can be viewed as such. Yet I doubt you’ll find all that many medical studies, physical experiments, etc. invoking, understanding or even being aware of do-calculus. That does not void their results; there are ways of interpreting the results that do not rely on grasping, or even being aware of, the math behind the curtain.
A biologist can make valid observations about a meadow without being concerned about wave functions; gwern can do internally valid studies without being concerned about the math of do-calculus. Thankfully, or else nothing would get done. Like, ever.
It’s nice to be enthusiastic about what you do, but be careful of an apotheosis of your specific field of study.
Dozens of fields are concerned with “identifying causal effects from data”; pretty much all the natural
sciences and all their myriad subspecializations can be viewed through such a lens.
Indeed.
That’s the crux: can be viewed as such. Yet I doubt you’ll find all that many medical studies, physical
experiments, etc. invoking do-calculus. That does not void their results; there are ways of interpreting the
results that do not rely on grasping, or even being aware of, the math behind the curtain.
“That’s just like, your opinion, man.”
See, you don’t get to say that. When people talk about causal effects from randomization (à la what Fisher talked about), effects of interventions are what they mean. That is the math behind what they want, just like complex-valued matrices are the math behind quantum mechanics, or the Peano axioms the math behind doing arithmetic. Not everyone uses the language of do(.) (some use potential outcome language, which is equivalent). But either their language is equivalent to do(.), or they are essentially doing garbage (and I assure you, there is a lot of garbage out there). In fields like epidemiology, what they often have is the data people (who know about HIV, say, or cancer), and methods people (who know how not to get garbage from the data).
The fact of the matter is, there are all sorts of gotchas about doing causal inference that being careless and relying on intuitions makes you vulnerable to. I can give endless examples:
(a) People doing longitudinal causal inference basically failed at time-varying confounders until 1986, when the right method was developed. So they would report garbage causal effects from longitudinal studies, because they thought they just needed to adjust for these confounders. No. Wrong. You have to use the equivalent of g-computation.
(b) People try to use coefficients of regressions as mediated causal effects, even when this is not warranted (that is, the coefficient doesn’t correspond to anything causal). No. Wrong. This fails if you have discrete mediators. This fails with interaction terms. This fails under certain natural modeling choices. This fails if you have unobserved confounding. In general a mediated effect is a complicated function of the observed data, not a regression coefficient.
(c) People try to test for causal null, even when their model does not permit the null to happen. (null paradox)
(d) Don Rubin (famous Harvard statistician, one of the people who wrote down the EM algorithm, and one of the people behind potential outcomes) once said that you should adjust for all covariates. He was just trying to be a good Bayesian (have to use all the data, right?) No. Wrong. You only adjust for what you need to block all non-causal paths, while not opening any non-causal paths.
(e) An example from something written at lesswrong: a Bayesian network is a causal model. No. Wrong. A Bayesian network is a statistical model (a set of densities) defined by conditional independence. In order to have a causal model you need to talk about how interventions relate to observations (essentially you need to say parents are direct causes formally).
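Item (d) can be illustrated with a toy version of the structure usually called M-bias. The sketch assumes two unobserved variables `u1` and `u2`: `u1` drives treatment, `u2` drives the outcome, and a pre-treatment covariate `c` is caused by both. The treatment has no effect on the outcome. All names, and the crude step of adjusting by stratifying on the sign of `c`, are assumptions made for illustration.

```python
import random
from statistics import fmean

def m_bias_demo(n=100_000, seed=3):
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        u1 = rng.gauss(0.0, 1.0)        # unobserved cause of treatment
        u2 = rng.gauss(0.0, 1.0)        # unobserved cause of outcome
        t = 1 if u1 > 0 else 0          # treatment depends only on u1
        c = u1 + u2                     # pre-treatment covariate, collider of u1 and u2
        y = u2 + rng.gauss(0.0, 1.0)    # outcome ignores t: the causal null holds
        rows.append((t, c > 0, y))

    def diff(subset):
        treated = [y for t, _, y in subset if t == 1]
        control = [y for t, _, y in subset if t == 0]
        return fmean(treated) - fmean(control)

    unadjusted = diff(rows)
    # "Adjust" for c: average the within-stratum differences, weighted by stratum size.
    adjusted = 0.0
    for level in (True, False):
        stratum = [row for row in rows if row[1] == level]
        adjusted += diff(stratum) * len(stratum) / n
    return unadjusted, adjusted

unadjusted, adjusted = m_bias_demo()
print(f"unadjusted: {unadjusted:+.3f}   adjusted for c: {adjusted:+.3f}")
```

Without adjustment the difference is close to zero, as it should be. Stratifying on `c` opens the path through `u1` and `u2` and produces a clearly negative "effect" out of nothing.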
Actually the list is so long, I am trying to put it in a paper format.
This stuff is not simple, and even very smart people can be confused! So if you want to do causal inference, you know, read up on it. I am surprised this is a controversial point. To quote Miguel Hernán, the g-formula (expressing do(.) in terms of observed data) is not a causal method, it is the causal method.
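For reference, the g-formula in its two simplest forms, written out as a sketch. The first line assumes a single treatment a and a set of observed covariates c for which conditional ignorability holds. The second assumes two sequential treatments a_0 and a_1, a time-varying confounder l_1 measured between them, and sequential ignorability. The symbols are a notational choice made here, not taken from the thread.

```latex
\begin{align}
p(y \mid do(a)) &= \sum_{c} p(y \mid a, c)\, p(c) \\
p(y \mid do(a_0, a_1)) &= \sum_{l_1} p(y \mid a_0, l_1, a_1)\, p(l_1 \mid a_0)
\end{align}
```

The second line shows what goes wrong in item (a): l_1 is both a confounder for a_1 and affected by a_0, so it has to be averaged over with p(l_1 | a_0) rather than simply conditioned on.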
If you don’t want to read Pearl, you can read Robins, or Dawid, or the potential outcomes people who learned from Rubin. The formalism is the same.
That’s reporting some assumptions, and presumably ones that have earned their being specifically singled out.