This is a well-written post! Upvoted.

A nitpick:

If I naïvely say that Week 1 establishes a true distribution for averaged weekly counts, then being more than 1σ above the mean for three weeks would have a probability of about p = (0.16)^3 = 0.0041 if that true count distribution remained constant.
Unfortunately, this p-value is poorly calibrated because the sampling errors in the estimates of the weekly means and σ are non-negligible compared to the value of σ.* We can obtain an accurate p-value by simulation. Under the null hypothesis of no change in counting frequency, the count for each day follows a Poisson distribution with mean = 150 counts / 35 days (I got 150 from adding up all the counts in the plot; there is some sampling error in this estimate, but its effect on the estimated p-value is negligible). From simulating 10^5 samples, I found 8296 samples with week 3–5 means that are all greater than the sum of the week 1 mean and week 1 SD. This gives p = 0.083.
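For anyone who wants to reproduce this, here is a minimal sketch of that kind of simulation in Python. It assumes the "week 1 SD" is the sample SD of the seven daily counts in week 1; the exact count of hits will wobble a little from run to run.

```python
import numpy as np

rng = np.random.default_rng(0)

n_sims = 10**5          # number of simulated 5-week datasets
daily_mean = 150 / 35   # Poisson mean per day under the null (total counts / total days)

hits = 0
for _ in range(n_sims):
    # 35 daily counts under the null, arranged as 5 weeks x 7 days
    weeks = rng.poisson(daily_mean, size=(5, 7))
    week_means = weeks.mean(axis=1)
    # threshold: week 1 mean plus week 1 sample SD (assumed to be the SD of the daily counts)
    threshold = week_means[0] + weeks[0].std(ddof=1)
    # count the sample if the week 3, 4, and 5 means all exceed the threshold
    if (week_means[2:] > threshold).all():
        hits += 1

print("simulated p =", hits / n_sims)
```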
An alternative (and standard) way to get a p-value here is to use Kendall tau as a test statistic, which gives a non-parametric rank-based test for monotone association. The single-tailed Kendall tau gives p = 0.076.
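A sketch of the Kendall tau version, assuming the test is of day index against daily count; the daily_counts array below is made-up placeholder data for illustration, not the counts from the plot.

```python
import numpy as np
from scipy.stats import kendalltau

# Placeholder data: 35 daily counts with a mild upward trend.
# The real test would use the daily counts read off the plot.
rng = np.random.default_rng(0)
daily_counts = rng.poisson(np.linspace(3.5, 5.0, 35))

days = np.arange(1, 36)
# alternative="greater" requests the one-sided p-value for a positive association;
# this keyword is available in reasonably recent SciPy versions.
tau, p_one_sided = kendalltau(days, daily_counts, alternative="greater")
print(tau, p_one_sided)
```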
* ETA: Let me add more explanation for any reader who is not sure what’s going on. The p-value in the post is (exactly) correct if the weekly mean and SD under the null hypothesis can be determined without any error. Unfortunately, we cannot do that—the best we can do is to estimate the weekly mean and SD using the week 1 mean and SD, so our estimates contain sampling errors. Often, we do not care about sampling errors when we are working with large samples because these errors are negligibly small compared to the SD. However, in this case, our sample has only n = 7, so sampling errors are non-negligible compared to the SD. This becomes a problem when we work with p-values because the null hypothesis is dependent on our estimates, but the errors in these estimates are not taken into consideration when we calculate the p-value. A common way to work around this is to use simulations, as I did. Alternatively, because our null hypothesis is rather simple, it might be feasible to use analytic methods to calculate a correct p-value.
Thanks! Hardly a nitpick; I should really know better. It looks especially bad that my laziness/carelessness led to overstated results. 150 is the correct number of counts, and I agree with your calculation. Embarrassingly, I also screwed up the p-value for the sleep correlation, [EDIT] which I briefly retracted but have now fixed.