For a while, I’ve been convinced that decaf coffee has roughly the same effect on me as regular coffee. However, I haven’t been able to say with certainty because there’s huge potential for placebo effects. Starting tomorrow, I’ll be conducting a (99%) blinded experiment to test whether drinking regular vs. decaf coffee has a detectable effect on my mood and alertness.
I intend to record the metrics I describe in the next section under the ‘Data collection’ heading of this post and will also report results here once the experiment is over. I’ll make data from my Quantified Mind experiment (discussed below) available as well in CSV format.
See something wrong with this experiment plan, my analysis, and/or my results? Email me at first.last+blog-at-gmail.com (see top left header for spelling) or comment on this post!
Experiment
Preparation
To prepare for the experiment, I split 2 weeks’ worth of coffee (7 days of decaf, 7 days of regular) into 14 bags. My lab assistant (girlfriend) then sorted the bags into a random order and labeled each with its day, flipping a coin to decide whether a given day’s coffee would be regular or decaf.
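A minimal sketch of one way to generate such a schedule, assuming the goal is to keep exactly 7 regular and 7 decaf days. It shuffles the 14 labels rather than flipping an independent coin per day, because independent flips would not generally preserve the 7/7 split. The function name `make_schedule` and the `seed` parameter are illustrative and not part of the actual procedure.

```python
import random

def make_schedule(n_regular=7, n_decaf=7, seed=None):
    """Return a randomly ordered list of 'regular'/'decaf' labels, one per day."""
    labels = ["regular"] * n_regular + ["decaf"] * n_decaf
    rng = random.Random(seed)  # a fixed seed makes the order reproducible
    rng.shuffle(labels)        # in-place uniform random permutation
    return labels
```

To preserve blinding, whoever labels the bags would run this (for example, `make_schedule(seed=42)`) and keep the output hidden until the experiment ends.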
Procedure
Starting tomorrow, for the next 14 days, I’ll make my coffee using the grounds from the labeled bags and track a few subjective metrics three times a day (right after having coffee, at 1 PM, and at 6 PM):
Alertness (1-5 scale): loosely defined as the opposite of how tired and sleepy I feel.
Sharpness (1-5 scale): the opposite of ‘fogginess’.
Mood (1-5 scale): a coarse-grained measure of how ‘good’ I feel emotionally.
Headache (yes/no): I get headaches if I don’t have coffee in the morning, so it will be interesting to see whether I get them on decaf days even when I don’t know the coffee is decaf.
Regular vs. decaf day (regular/decaf): “Given how I feel today, do I think this morning’s coffee was regular or decaf?”
<img src="http://an1lam.github.io/images/coffee_bags.jpg" style="width: 400px; height: 200px; clip: rect(0px,-60px,-200px,0px);">
I’ve also set up a Quantified Mind (QM) experiment which will give me 8 minutes of cognitive tests each day and record my scores. I’ll take these tests immediately upon finishing my coffee in the morning.
Materials
I’m using Swiss Water’s version of Joe Coffee Nightcap Decaf and Joe Coffee Colombia La Familia Guarnizo regular coffee, both brewed in a Mr. Coffee drip coffee maker. I chose Swiss Water for decaf after reading that their process does the best job of removing caffeine, filtering out >99% of it from the beans.
For cognitive tests, I’m using QM’s 8-minute “coffee” test, which includes tests of executive function, working memory, and something visuospatial.
Confounders
Sleep
Getting less than 7 hours of sleep affects how sharp I feel throughout the day and how awake I feel in the afternoon. Even though this is a randomized experiment, 14 days is short enough that were my sleep schedule to get interrupted, noise from sleep quality variance could easily overwhelm the signal from drinking regular vs. decaf coffee in the data. I plan to deal with this in two ways:
First, track how much sleep I get each night (in hours). Unfortunately, I don’t have a good sleep tracker, so this will just be based on my estimates of when I went to bed, how long it took me to fall asleep, and when I woke up.
Second, avoid major sleep disruptions and sleep roughly the same number of hours each night. That said, if this second step fails and my sleep schedule gets messed up during the experiment, I’ll be much less confident in the results.
(ETA on 03/02.) In the comments, Bucky points out that caffeine use may also impact sleep the following night, making the confounding even more complex. My current plan is to test for evidence of this when I do my analysis. I’m pre-registering that I’ll be surprised if my one cup of coffee in the morning has much of an impact, but being surprised is the whole point of something like this!
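A minimal sketch of what such a check could look like, assuming the spreadsheet gives a coffee label for each day and the hours slept on the night that follows it. The function name and the numbers are made up for illustration and are not data from the experiment. With only 14 days, a difference in means like this is a rough signal at best.

```python
from statistics import mean

def sleep_after_by_coffee(coffee, sleep_hours_next_night):
    """Average sleep on nights following regular vs. decaf days.

    coffee[i] is 'regular' or 'decaf' for day i; sleep_hours_next_night[i]
    is the sleep (in hours) on the night that follows day i.
    """
    pairs = list(zip(coffee, sleep_hours_next_night))
    after_regular = [h for c, h in pairs if c == "regular"]
    after_decaf = [h for c, h in pairs if c == "decaf"]
    return mean(after_regular), mean(after_decaf)

# Made-up numbers purely for illustration.
coffee = ["regular", "decaf", "decaf", "regular"]
sleep_next = [7.0, 8.0, 7.5, 6.5]
reg_mean, decaf_mean = sleep_after_by_coffee(coffee, sleep_next)
print(f"after regular: {reg_mean:.2f} h, after decaf: {decaf_mean:.2f} h")
```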
Diet
Relative to sleep, I’m less convinced that regular dietary variation (i.e. within the range of eating relatively ‘healthy’ food and not being morbidly obese) has much of an effect on cognitive performance. But I’ll still keep my regular eating schedule of skipping breakfast and only having lunch and dinner, as I suspect this will also help me keep a regular sleep schedule. To keep myself honest here, I’ll track when I eat each day.
Internet Use
(ETA on 03/02.)
I know this one seems weird, but anecdotally, I’ve found that procrastinating on the more addictive websites (read: Twitter) causes me to feel a lot fuzzier for the rest of the day.
Caffeine Withdrawal
(ETA on 03/02.)
In the comments, Issa Rice points out that caffeine withdrawal typically begins 12 to 24 hours after the last dose of caffeine and peaks somewhere between 20 and 51 hours. This means that, depending on how many decaf days fall in a row, I may or may not go through full withdrawal, which would presumably impact my results.
Mood
(ETA on 03/02.)
In the comments, Pattern points out that mood and events that affect it might affect the results. I’ve added a mood metric to my list of subjective metrics to track to prepare for this possibility.
Cognitive test practice effects
Given that I haven’t been doing QM tests before the experiment to calibrate, there’s a risk of practice effects dominating differences between caffeine and no caffeine days. I’m not totally sure how to deal with this yet, but isn’t this the use-case for random effects regressions?
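As a simpler stand-in for a random effects regression, one option is an ordinary least-squares fit with a linear day trend, so that steady improvement from practice is absorbed by the day term rather than attributed to caffeine. The sketch below is written under that assumption. It is a fixed linear trend, not a random effects model, and real practice effects may well be nonlinear (faster at the start). All names are illustrative.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x

def fit_score_model(days, caffeinated, scores):
    """Least-squares fit of score = b0 + b_caffeine * caffeinated + b_day * day.

    caffeinated[i] is 1 on regular days and 0 on decaf days.
    """
    rows = [[1.0, float(c), float(d)] for d, c in zip(days, caffeinated)]
    k = 3
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, scores)) for i in range(k)]
    b0, b_caffeine, b_day = solve_linear_system(xtx, xty)
    return {"intercept": b0, "caffeine": b_caffeine, "practice_per_day": b_day}
```

Here the `caffeine` coefficient is the estimated score difference between regular and decaf days after accounting for the per-day practice trend.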
Analysis
On not pre-registering in detail
Since I’ve been reading Gelman’s wonderful Bayesian Data Analysis and also view this study as a good candidate for a Bayesian approach due to the experiment having a small n, I intend to use Bayesian methods for my analysis. In an ideal world, I’d pre-register exactly what analyses I intend to do now (as of 03/01), but unfortunately, I’m still enough of a noob at this that I need to spend a good chunk of time reading about the right way to set up the analysis. For now, I’m recording the questions I want to answer below and will edit to add details of the analysis as I figure them out.
I worry less than I normally would about post-hoc changing the analysis to find a significant result because I don’t have strong incentives to find one. That is, I’m genuinely interested in the ‘true’ answer to the question and don’t have a strong desire for it to be ‘there’s a big effect’ or ‘there’s no effect’. Being transparent about the results of each stage of analysis should also help keep me honest. (Of course, I could always post-hoc choose not to share intermediate stages but again I don’t think my incentives are to do that.)
High level plan
At a high level, I want to test the effect of regular vs. decaf coffee on alertness, sharpness, headaches, and my QM results. This is complicated by the fact that my prior is that the response variables I described above only share some common causes and that the causal effects of caffeine consumption differ between the response variables. For example, I suspect alertness and QM test scores are both affected by sleep quantity and coffee consumption but that alertness may also be impacted by other confounding variables like mood and plans for the day.
To mitigate this, I’ll heavily rely on the most objective response variable, the QM results, to determine the magnitude of the ‘true effect’. In causal terms, this is equivalent to assuming that sleep is sufficient for blocking all ‘backdoor’ paths between regular vs. decaf coffee and cognitive ability. I’m still measuring the other subjective variables because I’m curious to see how correlated they are with my QM results and with each other, and I want to leave open the possibility of doing other analyses that come to mind and seem interesting.
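To make the backdoor assumption concrete, here is a hedged sketch of adjustment by stratification. It splits days into two sleep strata (using the 7-hour threshold mentioned under Confounders), compares regular vs. decaf scores within each stratum, then averages the differences weighted by stratum size. The function name, record layout, and two-stratum split are assumptions for illustration and not the planned analysis.

```python
from statistics import mean

def stratified_effect(records, sleep_cutoff=7.0):
    """Sleep-adjusted difference in mean score (regular minus decaf).

    records: iterable of (coffee, sleep_hours, score) tuples, where coffee
    is 'regular' or 'decaf'. Strata lacking either coffee type are dropped,
    since no within-stratum comparison is possible there.
    """
    strata = {}
    for coffee, sleep_hours, score in records:
        key = sleep_hours >= sleep_cutoff  # True = slept enough
        strata.setdefault(key, {"regular": [], "decaf": []})[coffee].append(score)

    usable = [s for s in strata.values() if s["regular"] and s["decaf"]]
    if not usable:
        raise ValueError("no sleep stratum contains both regular and decaf days")

    total = sum(len(s["regular"]) + len(s["decaf"]) for s in usable)
    effect = 0.0
    for s in usable:
        weight = (len(s["regular"]) + len(s["decaf"])) / total
        effect += weight * (mean(s["regular"]) - mean(s["decaf"]))
    return effect
```

With 14 days, some strata may contain only a handful of observations, so any estimate from this kind of split would be very noisy.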
FAQ
This is currently (as of 03/01) a list of questions that I came up with for myself, but I’ll also add answers to questions others raise in this section.
Isn’t this too short?
As I mentioned, 14 days is short enough that even though the regular vs. decaf day assignments are randomized and blinded, the ‘statistical power’ of my results will be relatively weak. Two responses to this:
From a decision-theoretic perspective, I mostly care about an easier question: was the effect meaningful enough that I could accurately detect whether the coffee I had that day was regular or decaf, conditional on what I know about my sleep and other factors?
I’m going to use Bayesian methods and will be more than willing to label the results ‘inconclusive’ if my analysis results in a diffuse posterior.
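For the detection question, one simple Bayesian treatment is to model the daily regular-vs-decaf guesses as draws with an unknown accuracy and look at the posterior over that accuracy. The sketch below assumes a uniform prior and one guess per day, and uses a grid approximation. The function name and grid size are illustrative choices and not the planned analysis.

```python
def posterior_prob_better_than_chance(correct, total, grid_size=10001):
    """P(accuracy > 0.5 | data) under a uniform prior, by grid approximation."""
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    # Unnormalized posterior: uniform prior times binomial likelihood.
    weights = [p ** correct * (1 - p) ** (total - correct) for p in grid]
    norm = sum(weights)
    return sum(w for p, w in zip(grid, weights) if p > 0.5) / norm
```

A value close to 0.5 would correspond to the ‘inconclusive’ case, while a value near 1 would suggest the guesses were better than coin flips. For example, `posterior_prob_better_than_chance(10, 14)` evaluates a hypothetical 10 correct guesses out of 14.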
Why ‘99%’ blinded?
I’m calling this 99% blinded because there is a slight visual difference between the two coffee grounds that I could in theory detect while making my morning coffee. By making my coffee in the dark (I do this already) and having the bags pre-sorted so I barely have to look at them, I hope to minimize the likelihood of ‘de-blinding’ the experiment. I tried to minimize the likelihood further by buying identical decaf and regular grounds but unfortunately couldn’t find a seller that sold the same beans in decaf and regular. In lieu of that, I settled for buying beans from the same region with the same flavor profile (I also don’t have a very good sense of taste) so as to limit the difference to a visual one.
Data Collection
Recording subjective metrics and sleep duration in this Google spreadsheet (to make export to CSV easy).
Below, I’m also recording miscellaneous observations.
Observations
Day 7 (03/09)
Quantified Mind Practice Effects
My Quantified Mind results are definitely improving in large part due to practice effects. This is in spite of my trying to use the same strategies for the different tests rather than improve them. For example, there’s a test in which I have to select a number between 1 and 9 based on a picture and on the first day I set up my hands such that my pinky was on the 0 (which isn’t an option in the test). This positioning is unnatural for me and in hindsight I should have started with my pinky on the 9. But, to keep things consistent and prevent unnecessary confounding, I’ve stuck with my original hand positioning for all subsequent tests.
Results
Qualitative Observations
I’m done! Made it through the withdrawal headaches. I haven’t done much analysis yet but here are a few of my initial observations, some of which I won’t be able to verify with analysis.
I did pretty well at identifying which days were caffeine vs. decaf days. I only made two mistakes and, in hindsight, I’d had a hunch that one of them was wrong.
Decaf days affected my actual subjective productivity less than expected. The main beneficial effect of caffeine seemed to be that it lowered the activation energy for me to get started on tasks and, on days on which I’d slept well, seemed to add a certain ‘sharp’ quality to my thinking.
Sleep matters. This I’m hopeful I’ll be able to get at least some signal on. Anecdotally, especially if we ignore the headaches (which were a result of withdrawal not drinking decaf coffee in general), the difference in all my subjective metrics seemed to correlate much more with how much sleep I got before than with regular vs. decaf coffee.
Caffeine may not help me do better when sleep deprived. As mentioned above, I do notice a small subjective positive effect on my ‘sharpness’ when I sleep really well, have caffeine, and fast (which I do most days until lunch). On the other hand, on days on which I got <7 hours of sleep (happened before both caffeine and decaf days), I felt like caffeine either made no difference or made me a bit more awake at the cost of making my cognition even fuzzier. I highly doubt this will show up in the Quantified Mind metrics in any detectable way, but I wanted to note it as a hypothesis that I’m very interested in as part of my general interest in mitigating the effects of sleep deprivation.
Credit to Issa Rice for pointing out that caffeine withdrawal would be an issue when I proposed the experiment. Withdrawal did turn out to be a bit of an issue, although not enough of one (IMO) to mess up the results of the experiment. My first decaf sequence was two days in a row, and in the afternoon I got a bad withdrawal headache that was resistant to ibuprofen. On later decaf days, I took ibuprofen at the first sign of a headache and this seemed to largely mitigate withdrawal symptoms. Of course this does confound my headache tracking a bit, but I view it as worth it in order to try to minimize the effect of withdrawal on other metrics.
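As a quick sanity check on the first observation above (two wrong guesses), the sketch below computes how likely it would be to do at least that well by pure coin-flip guessing. It assumes one final guess per day over the 14 days, so 12 correct out of 14. That count is an assumption, since guesses were recorded several times a day. This is a plain binomial tail probability and not the Bayesian analysis planned for the full results.

```python
from math import comb

def prob_at_least(correct, total, p_guess=0.5):
    """P(at least `correct` right out of `total`) if each guess succeeds with p_guess."""
    return sum(
        comb(total, k) * p_guess ** k * (1 - p_guess) ** (total - k)
        for k in range(correct, total + 1)
    )

# Assumes one guess per day over 14 days with 2 mistakes (12 correct).
print(f"{prob_at_least(12, 14):.4f}")  # 106 / 16384, about 0.0065
```

Under those assumptions, guessing this well by chance alone would be rare, although the withdrawal headaches on decaf days may have served as an unblinding cue.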
(Where I’ll record graphs and other summary statistics.)
It’s nice to see an experiment post*, I haven’t seen a lot of these. I think it’s really cool.
*Perhaps LW usually leans too much towards theory, or people are doing experiments but not writing them up.
If you’re worried about analysis you could try explaining your model/experiences in more detail, or collecting data about more variables.
Model example: you didn’t (in this post) consider the possibility in advance that there might exist both types of beans that work as well decaf as not, and beans that don’t, or that some kinds of beans are better than others. I have no reason to believe this is the case, but explicit assumptions might be useful, if only for later experiences. Depending on how you do this experiment, you could in theory find out that you like one kind of coffee/bean more than the other taste-wise, even if they have the same effect on alertness.* This brings me to my second example:
*This would require finding out after the fact which coffee was had on which day.
Collecting more data (that could affect what you’re measuring) example: suppose there are other things that could affect your mood or alertness. Writing about these other factors could be useful. (Intuitively, if you got some surprising and really bad/really good news, and this was independent of which type of coffee you had that morning, but has a big impact on your mood, then that might be a good thing to note. Similarly, smaller things** could in theory have an impact on mood or alertness data, though the smaller the effect, the lower the risk of reversing the conclusion incorrectly.)
**Had breakfast, skipped breakfast, went for a walk, etc.
I’ve noticed that for me, caffeine withdrawal really begins (and is worst) on the second day I stop drinking coffee. In your experiment, if the coin flips went something like regular, decaf, regular, decaf, …, then I don’t think I would notice a huge difference between the regular and decaf days (despite there being a very noticeable difference between drinking coffee after abstinence, caffeine withdrawal, and a regular sober/caffeinated day).
Here is a random article which says “Typically, onset of [caffeine withdrawal] symptoms occurred 12–24 h after abstinence, with peak intensity at 20–51 h, and for a duration of 2–9 days.” (I haven’t looked at this article in detail, so I don’t know how good the science is.)
My suggestion would be to use larger “blocks” of days (e.g. 3-day blocks) so that caffeine withdrawal/introduction becomes more obvious. Maybe the easiest would be to drink the same grounds for a week (flipping a coin once to determine which to start with).
Thanks for your comment. I actually thought about the withdrawal point after posting this but before seeing your comment, but didn’t have the (good) idea of using blocks to mitigate it. I’m now pretty uncertain about whether blocks would be better or not. The rest of this comment should be read from the perspective of me thinking out loud not an authoritative response.
From a practical perspective, I don’t want to go through withdrawal more than once in two weeks (because from my prior experience, it will be horrible).
From an experiment perspective though, I actually am more interested in the question of “conditional on being addicted to caffeinated coffee, does my body detect the difference between regular and decaf?” I’m also interested in whether, conditional on not being addicted to coffee, drinking caffeinated coffee enhances my performance, but I question whether trying to answer that as part of the same study makes sense. Given that, doing blocks would be good because it would isolate the withdrawal period but bad insofar as it would reduce my samples of “conditional on addiction, do I notice” under different conditions.
One other risk of confounding would be the days not being truly independent of each other: for example, if caffeine consumed on one day were to affect your sleep quality that night and thus your alertness the next day.
Good point, I also realized that the sleep deprivation as a lagging indicator issue makes things more complicated. That is, there’s some anecdotal evidence (and maybe experimental) that sleep deprivation affects performance not the day after poor sleep but the day after that.
This is great.
One small improvement on the experimental method would be to get someone who you don’t spend much time with to do the randomising. If your gf knows which days are caffeinated she might give subconscious clues. This probably won’t be a big deal in reality but you’re potentially losing some of your blinding.
Thanks! She ordered the bags and gave them numbers but is typically gone by the time I have coffee (we are on different schedules), so this hopefully shouldn’t be a huge issue. That said, this is a good point in the sense that I will explicitly not discuss how I felt that day with respect to the coffee.
Replying to your point in a separate comment but will add (with a cite to you) a note about this in the 99% blinded section.