Decaf vs. regular coffee self-experiment

For a while, I’ve been convinced that decaf coffee has roughly the same effect on me as regular coffee. However, I haven’t been able to say with certainty because there’s huge potential for placebo effects. Starting tomorrow, I’ll be conducting a (99%) blinded experiment to test whether drinking regular vs. decaf coffee has a detectable effect on my mood and alertness.

I intend to record the metrics I describe in the next section under the ‘Data collection’ heading of this post and will also report results here once the experiment is over. I’ll make data from my Quantified Mind experiment (discussed below) available as well in CSV format.

See something wrong with this experiment plan, my analysis, and/or my results? Email me at first.last+blog-at-gmail.com (see top left header for spelling) or comment on this post!

Experiment

Preparation

To prepare for the experiment, I split 2 weeks’ worth of coffee (7 days of decaf, 7 days of regular) into 14 bags. My lab assistant (girlfriend) then sorted them into a random order (with labels for each day) by flipping a coin to decide whether that day’s coffee would be regular or decaf.
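A minimal sketch of one way to generate such a schedule in code, assuming the goal is exactly 7 regular and 7 decaf days in a random order. It shuffles a fixed set of labels instead of flipping a coin per day (independent coin flips would not guarantee a 7/7 split). The function name `make_schedule` and the seed are hypothetical, and this is not necessarily the procedure used here.

```python
import random

def make_schedule(n_regular=7, n_decaf=7, seed=None):
    """Return a randomly ordered list of 'regular'/'decaf' labels, one per day."""
    rng = random.Random(seed)
    labels = ["regular"] * n_regular + ["decaf"] * n_decaf
    rng.shuffle(labels)  # every ordering of the 14 labels is equally likely
    return labels

# The seed is arbitrary and only makes the example reproducible.
schedule = make_schedule(seed=0)
for day, label in enumerate(schedule, start=1):
    print(f"Day {day}: {label}")
```

To keep the blinding intact, whoever runs something like this would need to keep the printed key away from the person drinking the coffee.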

Procedure

Starting tomorrow, for the next 14 days, I’ll make my coffee using the grounds from the labeled bags and track a few subjective metrics three times a day (right after having coffee, at 1 PM, and at 6 PM):

Alertness (1-5 scale): loosely defined as how tired and sleepy I feel.
Sharpness (1-5 scale): the opposite of ‘fogginess’.
Mood (1-5 scale): a coarse-grained measure of how ‘good’ I feel emotionally.
Headache (yes/no): I get headaches if I don’t have coffee in the morning, so it will be interesting to see whether I get them on decaf days even when I don’t know the coffee is decaf.
Regular vs. decaf day (regular/decaf): “Given how I feel today, do I think this morning’s coffee was regular or decaf?”

I’ve also set up a Quantified Mind (QM) experiment which will give me 8 minutes of cognitive tests each day and record my scores. I’ll take these tests immediately upon finishing my coffee in the morning.

Materials

I’m using Swiss Water’s version of Joe Coffee Nightcap Decaf and Joe Coffee Colombia La Familia Guarnizo regular coffee, both brewed in a Mr. Coffee drip coffee maker. I chose Swiss Water for decaf after reading that they do the best job of filtering out >99% of the caffeine from the beans.

For cognitive tests, I’m using QM’s 8-minute “coffee” test that includes tests of executive function, working memory, and visuospatial something.

Confounders

Sleep

Getting less than 7 hours of sleep affects how sharp I feel throughout the day and how awake I feel in the afternoon. Even though this is a randomized experiment, 14 days is short enough that were my sleep schedule to get interrupted, noise from sleep quality variance could easily overwhelm the signal from drinking regular vs. decaf coffee in the data. I plan to deal with this in two ways:

1. Track how much sleep I get each night (in number of hours). Unfortunately, I don’t have a good sleep tracker, so this will just be based on my estimate of when I went to bed, how long it took me to fall asleep, and when I woke up.
2. Avoid major sleep disruptions and sleep roughly the same number of hours each night. That said, if 2 fails and my sleep schedule gets messed up during the experiment, I’ll be much less confident in the results.
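The sleep estimate described above (bedtime, time to fall asleep, wake time) can be turned into hours with a little date arithmetic. This is only a sketch: the function name `hours_slept`, the 'HH:MM' input format, and the example times are assumptions for illustration.

```python
from datetime import datetime, timedelta

def hours_slept(bedtime, minutes_to_fall_asleep, wake_time):
    """Estimate sleep duration in hours from clock times given as 'HH:MM'."""
    fmt = "%H:%M"
    bed = datetime.strptime(bedtime, fmt)
    wake = datetime.strptime(wake_time, fmt)
    if wake <= bed:
        wake += timedelta(days=1)  # woke up on the following calendar day
    asleep = bed + timedelta(minutes=minutes_to_fall_asleep)
    return (wake - asleep).total_seconds() / 3600

# Hypothetical night: in bed at 23:30, asleep 20 minutes later, up at 07:00.
print(round(hours_slept("23:30", 20, "07:00"), 2))
```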

(ETA on ⁰³⁄₀₂.) In the comments, Bucky points out that caffeine use may also impact sleep the next night, making the confounding even more complex. My current plan is to test for evidence of this when I do my analysis. Pre-registering that I’ll be surprised if my one cup of coffee in the morning has much of an impact, but being surprised is the whole point of something like this!

Diet

Relative to sleep, I’m less convinced that regular dietary variation (i.e. eating relatively ‘healthy’ food and not being morbidly obese) has much of an effect on cognitive performance. But I will still keep my regular eating schedule of skipping breakfast and only having lunch and dinner, as I suspect this will also help me keep a regular sleep schedule. To keep myself honest here, I’ll track when I eat each day.

Internet Use

(ETA on ⁰³⁄₀₂.)

I know this one seems weird, but anecdotally, I’ve found that procrastinating on the more addictive internet websites (read: Twitter) causes me to feel a lot fuzzier for the rest of the day.

Caffeine Withdrawal

(ETA on ⁰³⁄₀₂.)

In the comments, Issa Rice points out that caffeine withdrawal may begin anywhere between 12 and 24 hours after the last dose of caffeine and peaks around 50 hours. This means that depending on the order of consecutive days, I may or may not go through full withdrawal, which would presumably impact my results.
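To see how the day order interacts with that timeline, here is a rough sketch that computes, for each morning, how long it has been since the last regular coffee. It assumes coffee at the same time every morning (so 24 hours between cups) and no other caffeine sources; the function name and the example schedule are hypothetical.

```python
def hours_since_caffeine(schedule, hours_per_day=24):
    """For each day, hours between the last regular coffee and that morning's cup.

    Returns None for days before the first regular coffee in the schedule.
    """
    result = []
    last_regular = None  # index of the most recent regular day seen so far
    for day, label in enumerate(schedule):
        if last_regular is None:
            result.append(None)
        else:
            result.append((day - last_regular) * hours_per_day)
        if label == "regular":
            last_regular = day
    return result

# Hypothetical order, not the real (blinded) schedule.
example = ["regular", "decaf", "decaf", "regular", "decaf"]
print(hours_since_caffeine(example))
```

Under these assumptions, a single decaf day between regular days stays under the roughly 50-hour peak, while the second of two consecutive decaf days starts at 48 hours and passes through it.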

Mood

(ETA on ⁰³⁄₀₂.)

In the comments, Pattern points out that mood and events that affect it might affect the results. I’ve added a mood metric to my list of subjective metrics to track to prepare for this possibility.

Cognitive test practice effects

Given that I haven’t been doing QM tests before the experiment to calibrate, there’s a risk of practice effects dominating differences between caffeine and no caffeine days. I’m not totally sure how to deal with this yet, but isn’t this the use-case for random effects regressions?
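One simpler alternative to a random effects model is to treat practice as a linear trend over days and estimate the caffeine effect after adjusting for it. The sketch below does this by partialling the day index out of both the scores and the caffeine indicator, which gives the same caffeine coefficient as an ordinary least-squares fit of score on day and caffeine. The linear-trend assumption is a strong one (practice curves often flatten), and the data here are made up purely to show the mechanics.

```python
def fit_line(x, y):
    """Least-squares slope and intercept for y ~ x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def residuals(x, y):
    """What is left of y after removing its linear trend in x."""
    slope, intercept = fit_line(x, y)
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

def caffeine_effect(days, caffeine, scores):
    """Coefficient on caffeine in scores ~ day + caffeine (day partialled out)."""
    r_scores = residuals(days, scores)
    r_caffeine = residuals(days, caffeine)
    num = sum(rc * rs for rc, rs in zip(r_caffeine, r_scores))
    den = sum(rc ** 2 for rc in r_caffeine)
    return num / den

# Made-up data: +2 points per day of practice, +3 points on regular days.
days = list(range(14))
caffeine = [1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0]  # 1 = regular, 0 = decaf
scores = [50 + 2 * d + 3 * c for d, c in zip(days, caffeine)]
print(round(caffeine_effect(days, caffeine, scores), 6))
```

On this noise-free synthetic data the estimate recovers the built-in 3-point effect; with real scores it would come with uncertainty that this sketch does not quantify.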

Analysis

On not pre-registering in detail

Since I’ve been reading Gelman’s wonderful Bayesian Data Analysis and also view this study as a good candidate for a Bayesian approach due to the experiment having a small $n$, I intend to use Bayesian methods for my analysis. In an ideal world, I’d pre-register exactly what analyses I intend to do now (as of ⁰³⁄₀₁), but unfortunately, I’m still enough of a noob at this that I need to spend a good chunk of time reading about the right way to set up the analysis. For now, I’m recording the questions I want to answer below and will edit to add details of the analysis as I figure them out.

I worry less than I normally would about post-hoc changing the analysis to find a significant result because I don’t have strong incentives to find one. That is, I’m genuinely interested in the ‘true’ answer to the question and don’t have a strong desire for it to be ‘there’s a big effect’ or ‘there’s no effect’. Being transparent about the results of each stage of analysis should also help keep me honest. (Of course, I could always post-hoc choose not to share intermediate stages but again I don’t think my incentives are to do that.)

High level plan

At a high level, I want to test the effect of regular vs. decaf coffee on alertness, sharpness, headaches, and my QM results. This is complicated by the fact that my prior is that the response variables I described above only share some common causes and that the causal effects of caffeine consumption differ between the response variables. For example, I suspect alertness and QM test scores are both affected by sleep quantity and coffee consumption but that alertness may also be impacted by other confounding variables like mood and plans for the day.

To mitigate this, I’ll heavily rely on the most objective response variable, the QM results, to determine the magnitude of the ‘true effect’. In causal terms, this is equivalent to assuming that sleep is sufficient for blocking all ‘backdoor’ paths between regular vs. decaf coffee and cognitive ability. I’m still measuring the other subjective variables because I’m curious to see how correlated they are with my QM results and each other, and I want to leave open the possibility of doing other analyses that come to mind and seem interesting.

FAQ

This is currently (as of ⁰³⁄₀₁) a list of questions that I came up with for myself, but I’ll also add answers to questions others raise in this section.

Isn’t this too short?

As I mentioned, 14 days is short enough that even though the regular vs. decaf day assignments are randomized and blinded, the ‘statistical power’ of my results will be relatively weak. Two responses to this:

From a decision-theoretic perspective, I mostly care about an easier-to-answer question: was the effect meaningful enough that I could accurately detect whether the coffee I had that day was regular or decaf, conditional on what I know about my sleep and other factors?
I’m going to use Bayesian methods and will be more than willing to label the results ‘inconclusive’ if my analysis results in a diffuse posterior.
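As one illustration of what a Bayesian answer to the detection question could look like, here is a sketch that puts a uniform Beta(1, 1) prior on my guessing accuracy and computes the posterior probability that it is better than chance. The prior, the function name, and the example counts are all assumptions for illustration, not a committed analysis plan.

```python
from math import comb

def prob_better_than_chance(correct, total):
    """P(accuracy > 0.5) under a uniform Beta(1, 1) prior on guessing accuracy.

    The posterior is Beta(a, b) with a = correct + 1 and b = total - correct + 1.
    For integer a and b, its CDF at x equals a binomial tail sum, so no
    numerical integration is needed.
    """
    a = correct + 1
    b = total - correct + 1
    n = a + b - 1
    cdf_at_half = sum(comb(n, j) for j in range(a, n + 1)) / 2 ** n
    return 1 - cdf_at_half

# Hypothetical outcomes over 14 days of guessing.
for correct in (7, 9, 11):
    print(correct, round(prob_better_than_chance(correct, 14), 3))
```

A value near 0.5 would correspond to the ‘inconclusive’ label mentioned above, while a value near 1 would suggest I can tell the two coffees apart.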

Why ‘99%’ blinded?

I’m calling this 99% blinded because there is a slight visual difference between the two coffee grounds that I could in theory detect while making my morning coffee. By making my coffee in the dark (I do this already) and having the bags pre-sorted so I barely have to look at them, I hope to minimize the likelihood of ‘de-blinding’ the experiment. I tried to minimize the likelihood further by buying identical decaf and regular grounds but unfortunately couldn’t find a seller that sold the same beans in decaf and regular. In lieu of that, I settled for buying beans from the same region with the same flavor profile (I also don’t have a very good sense of taste) so as to limit the difference to a visual one.

Data Collection

I’m recording subjective metrics and sleep duration in this Google spreadsheet (to make export to CSV easy).

Below, I’m also recording miscellaneous observations.

Observations

Day 7 (03/09)

Quantified Mind Practice Effects

My Quantified Mind results are definitely improving in large part due to practice effects. This is in spite of my trying to use the same strategies for the different tests rather than improve them. For example, there’s a test in which I have to select a number between 1 and 9 based on a picture and on the first day I set up my hands such that my pinky was on the 0 (which isn’t an option in the test). This positioning is unnatural for me and in hindsight I should have started with my pinky on the 9. But, to keep things consistent and prevent unnecessary confounding, I’ve stuck with my original hand positioning for all subsequent tests.

Results

Qualitative Observations

I’m done! Made it through the withdrawal headaches. I haven’t done much analysis yet but here are a few of my initial observations, some of which I won’t be able to verify with analysis.

I did pretty well at identifying which days were caffeine vs. decaf days. I only made two mistakes, and for one of them I had a hunch in hindsight that I was wrong.
Decaf days affected my actual subjective productivity less than expected. The main beneficial effect of caffeine seemed to be that it lowered the activation energy for me to get started on tasks and, on days on which I’d slept well, seemed to add a certain ‘sharp’ quality to my thinking.
Sleep matters. This I’m hopeful I’ll be able to get at least some signal on. Anecdotally, especially if we ignore the headaches (which were a result of withdrawal, not of drinking decaf coffee in general), the difference in all my subjective metrics seemed to correlate much more with how much sleep I got before than with regular vs. decaf coffee.
Caffeine may not help me do better when sleep deprived. As mentioned above, I do notice a small subjective positive effect on my ‘sharpness’ when I sleep really well, have caffeine, and fast (which I do most days until lunch). On the other hand, on days on which I got <7 hours of sleep (this happened before both caffeine and decaf days), I felt like caffeine either made no difference or made me a bit more awake at the cost of making my cognition even fuzzier. I highly doubt this will show up in the Quantified Mind metrics in any detectable way, but I wanted to note it as a hypothesis that I’m very interested in as part of my general interest in mitigating the effects of sleep deprivation.
Withdrawal did turn out to be a bit of an issue, although not enough of one (IMO) to mess up the results of the experiment. Credit to Issa Rice for pointing out that this would be an issue when I proposed the experiment. My first decaf sequence was two days in a row, and in the afternoon I got a bad withdrawal headache that was resistant to Ibuprofen. On later decaf days, I took Ibuprofen at the first sign of a headache and this seemed to largely mitigate withdrawal symptoms. Of course, this does confound my headache tracking a bit, but I view it as worth it in order to try to minimize the effect of withdrawal on other metrics.
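As a rough check on the first observation above (identifying regular vs. decaf days), the sketch below computes how likely it would be to get at least that many days right by pure coin-flip guessing. It assumes one guess per day over 14 days, so two mistakes means 12 correct; that reading of the counts is an assumption, and the function name is hypothetical.

```python
from math import comb

def prob_at_least(correct, total, p=0.5):
    """Probability of getting at least `correct` of `total` guesses right by chance."""
    return sum(
        comb(total, k) * p ** k * (1 - p) ** (total - k)
        for k in range(correct, total + 1)
    )

# Assumed counts: 12 correct out of 14 daily guesses.
print(round(prob_at_least(12, 14), 4))
```

Under these assumptions the chance of doing this well by guessing alone is well under 1%, though this says nothing about whether the cue was caffeine itself, withdrawal headaches, or the slight visual difference in the grounds.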

(Where I’ll record graphs and other summary statistics.)
