What’s a good Bayesian alternative to statistical significance testing? For example, if I look over my company’s email data to figure out what the best time of the week to send someone an email is, and I’ve got all possible hours of the week ordered by highest open rate to lowest open rate, how can I get a sense of whether I’m looking at a real effect or just noise?
In that scenario, how much does it really matter? It’s free to send email at one time of week rather than another, so your only cost is the opportunity cost of picking a bad time to email people, which doesn’t seem likely to be too big.
Our email volume by hour would get far lumpier, so we would have to add more servers to handle a much higher peak in emails sent per minute. And it takes development effort to configure emails to send at an intelligent time based on the user’s timezone.
OK, here’s a proposed solution I came up with. Start with the overall open rate for all emails regardless of time of the week. Use that number, and your intuition for how much variation you are likely to see between different days and times (perhaps informed by studies on this subject that people have already done) to construct some prior distribution over the open probabilities you think you’re likely to see. You’ll want to choose a distribution over the interval (0, 1) only… I’m not sure if this one or this one is better in this particular case. Then for each hour of the week, use maximum-a-posteriori estimation (this seems like a brief & good explanation) to determine the mode of the posterior distribution, after you’ve updated on all of the open data you’ve observed. (This provides an explanation of how to do this.) The mode of an hour’s distribution is your probability estimate that an email sent during that particular hour of the week will be opened.
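Concretely, a sketch of that MAP step might look something like this. It assumes a Beta prior (one natural family on the interval (0, 1)) centred on the overall open rate, with a binomial likelihood for the opens observed in each hour; the prior strength and the counts are made up for illustration.

```python
# A minimal, hypothetical sketch of the per-hour MAP step described above,
# assuming a Beta prior centred on the overall open rate and a binomial
# likelihood. All numbers are invented.

def beta_prior(overall_rate, strength=100.0):
    """Beta prior with mean `overall_rate`. `strength` acts like a pseudo-count
    encoding how much hour-to-hour variation you expect (larger strength means
    you expect the hours to differ less)."""
    return overall_rate * strength, (1.0 - overall_rate) * strength

def map_open_rate(opens, sends, alpha, beta):
    """Mode of the Beta(alpha + opens, beta + sends - opens) posterior."""
    a, b = alpha + opens, beta + (sends - opens)
    return (a - 1.0) / (a + b - 2.0)  # valid when a > 1 and b > 1

alpha, beta = beta_prior(overall_rate=0.10, strength=100.0)
# e.g. 120 opens out of 1,000 emails sent during one particular hour of the week
print(map_open_rate(120, 1000, alpha, beta))  # ~0.117, shrunk toward the 0.10 prior mean
```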
Given those probability estimates, you can figure out how many opens you’d get if emails were allocated optimally throughout the week vs how many opens you’d get if they were allocated randomly and figure out if optimal allocation would be worthwhile to set up.
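A rough illustration of that comparison (the per-hour rates and the weekly volume below are invented; in practice you’d plug in the 168 per-hour MAP estimates from the previous step):

```python
# A hypothetical back-of-the-envelope comparison of optimal vs random
# send-time allocation. The rates and volume are invented for illustration.

rates = [0.08, 0.09, 0.11, 0.12, 0.10, 0.13]   # stand-in for the 168 per-hour estimates
weekly_volume = 500_000                         # emails sent per week (made up)

random_opens = weekly_volume * sum(rates) / len(rates)   # send times spread evenly
optimal_opens = weekly_volume * max(rates)               # everything at the single best hour

print(f"extra opens per week from optimal timing: {optimal_opens - random_opens:,.0f}")
```

Sending everything at the single best hour is exactly the lumpiness problem mentioned above, so a real comparison would cap per-hour volume; the true gain sits somewhere between the two figures.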
Not Bayesian, but can’t you just do ANOVA w/ the non-summarized time of day vs. open rate (using hourly buckets)? That seems like a good first-pass way of telling whether or not there’s an actual difference there. I confess that my stats knowledge is really just from natural sciences experiment-design parts of lab classes, so I have a bias towards frequentist look-up-in-a-table techniques just because they’re what I’ve used.
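A minimal sketch of that first pass, assuming per-email open indicators (0/1) grouped into hourly buckets; the data here is simulated purely for illustration.

```python
# One-way ANOVA on per-email open indicators grouped by hourly bucket.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# one group of 0/1 open indicators per hourly bucket, with slightly different true rates
groups = [rng.binomial(1, p, size=2000) for p in (0.09, 0.10, 0.10, 0.12)]

f_stat, p_value = stats.f_oneway(*groups)
print(f"one-way ANOVA: F = {f_stat:.2f}, p = {p_value:.4f}")
```

Since opens are binary, a chi-squared test on the opened/not-opened counts per bucket (scipy.stats.chi2_contingency) is arguably the more standard choice, but either gives a first-pass answer to whether there’s any effect at all.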
Rant for a different day, but I think physics/engineering students really get screwed in terms of learning just enough stats/programming to be dangerous. (I.e., you’re just sort of expected to know and use them one day in class, and get told just enough to get by, especially numerical computing and C/Fortran/Matlab.)
Suppose you have three hypotheses:
(1) It’s better to email in the morning
(2) It’s better to email in the evening
(3) They’re equally good
Why do you care about (3)? If you’re just deciding whether to email in the morning or evening, (3) is irrelevant to ranking those two options.
The full-fledged Bayesian approach would be to identify the hypotheses (I’ve simplified it by reducing it down to just three), decide what your priors are, calculate the probability of seeing the data under each of the hypotheses, and then combine those according to Bayes’ theorem to find the posterior probabilities. However, you don’t have to run through the math to see that if your priors for (1) and (2) are equal, and the sample is skewed towards evening, then the posterior for (2) will be larger than the posterior for (1).
The only time you’d actually have to run through the math is if your priors weren’t equal, and you’re trying to decide whether the additional data is enough to overcome the difference in the priors, or if you have some consideration other than just choosing between morning or evening (for instance, you might find it more convenient to just email when you first have something to email about, in which case you’re choosing between “email in morning”, “email in evening” and “email whenever it’s convenient to me”).
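To make the mechanics concrete, here is a toy version of that calculation, with made-up open-rate models for the three hypotheses and deliberately unequal priors, so the data has to overcome the prior:

```python
# A toy version of the Bayesian calculation described above. The likelihood
# models (one fixed open rate per period under each hypothesis) and all
# numbers are invented purely to show the mechanics.
from scipy.stats import binom

hypotheses = {                      # (morning open rate, evening open rate)
    "morning better": (0.12, 0.09),
    "evening better": (0.09, 0.12),
    "equally good":   (0.105, 0.105),
}
priors = {"morning better": 0.5, "evening better": 0.3, "equally good": 0.2}

morning_opens, morning_sends = 95, 1000    # observed (made-up) data,
evening_opens, evening_sends = 125, 1000   # skewed toward evening

def likelihood(p_morning, p_evening):
    return (binom.pmf(morning_opens, morning_sends, p_morning)
            * binom.pmf(evening_opens, evening_sends, p_evening))

unnorm = {h: priors[h] * likelihood(*rates) for h, rates in hypotheses.items()}
total = sum(unnorm.values())
for h, w in unnorm.items():
    print(f"P({h} | data) = {w / total:.3f}")
```

With these numbers the “evening better” hypothesis ends up dominating despite its smaller prior, which is exactly the case of the data overcoming the difference in the priors.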
“Statistical significance” is just a shorthand to avoid having to actually do a Bayesian calculation. For instance, suppose we’re trying to decide whether a study showing that a drug is effective is statistically significant. If the only two choices were “take the drug” and “don’t take the drug”, and we were truly indifferent between those two options, the issue of significance wouldn’t even matter; we should just take the drug. The reason we care about whether the test is significant is that we aren’t indifferent between the two choices (we have a bias towards the status quo of not taking the drug, the drug would cost money, there are probably going to be side effects, etc.) and there are other options (take another drug, run more drug trials, etc.). When a level of statistical significance is chosen, an implicit statement is being made about how much weight is given to the status quo.
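One way to see this is to make the asymmetry explicit as an expected-value calculation: once the costs are on the table, the decision depends on the posterior probability and those costs, not on crossing a significance threshold. All numbers below are invented.

```python
# A hypothetical expected-value framing of the drug decision. With the
# costs made explicit, no significance threshold is needed.

p_effective = 0.30        # posterior probability the drug works, given the study
benefit_if_works = 10.0   # value of the health improvement (arbitrary units)
cost_of_taking = 2.0      # price, side effects, inertia of leaving the status quo

ev_take = p_effective * benefit_if_works - cost_of_taking
ev_status_quo = 0.0

print("take the drug" if ev_take > ev_status_quo else "stick with the status quo")
```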