Regarding the stopping rule issue, it really depends on how you decide when to stop. I believe sequential inference lets you do that without any problem, but it’s not the same as saying that the p-value is within the wanted bounds. Basically, all of this derives from working with p-values instead of workable quantities like log-odds. The other problem with p-values is that they only let you work with binary hypotheses and make you believe that writing things like P(H0) actually carries meaning, when in reality you can’t test a hypothesis in a vacuum: you have to test it against another hypothesis (unless, once again, it’s binary of course).
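To make the stopping-rule problem concrete, here is a minimal sketch (my own toy illustration, with arbitrary batch sizes and run counts, not anything from the post): the data are pure noise, yet peeking at a t-test after every batch and stopping at the first p < 0.05 pushes the false-positive rate well above the nominal 5%.

```python
# Minimal sketch of optional stopping under a true null hypothesis.
# We draw pure noise, run a one-sample t-test after every batch, and
# stop as soon as p < 0.05. The fraction of runs that ever "reject"
# ends up far above the nominal 5%.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
runs, batch, max_batches = 2000, 10, 50
false_positives = 0

for _ in range(runs):
    data = []
    for _ in range(max_batches):
        data.extend(rng.normal(0.0, 1.0, batch))   # H0 is true by construction
        _, p = stats.ttest_1samp(data, 0.0)
        if p < 0.05:                               # "stop as soon as significant"
            false_positives += 1
            break

print(f"nominal alpha: 0.05, observed rejection rate: {false_positives / runs:.3f}")
```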
Another common mistake you did not talk about is one made in many meta-analyses: aggregating the data of several studies without checking whether the data are logically independent.
I’m not as well versed in meta-analysis mistakes yet, but I’m working on it! Once I compile enough meta-analysis misuses I will add them to the post. Here is one that’s pretty interesting:
https://crystalprisonzone.blogspot.com/2016/07/the-failure-of-fail-safe-n.html
Many studies still use fail-safe N to account for publication bias even though it has been shown to be invalid. If you see a study that uses it, you can act as if it did not account for publication bias at all.
As someone who wants to do systematic reviews (meta-analyses with a rigidly prescribed structure), I would love to hear about the mistakes to watch out for!
One surprisingly good sanity check I’ve found is to do up a quick Monte Carlo sim in e.g. Python.
As someone who uses statistics, but who is not a statistician, it’s caught an astounding number of subtle issues.
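For a sense of what such a throwaway sim can look like, here is a toy example (mine, not from the comment): checking that the spread of a sample mean really does shrink like sigma/sqrt(n) before trusting a formula built on that assumption.

```python
# Throwaway Monte Carlo sanity check: does the standard error of a
# sample mean actually behave like sigma / sqrt(n)?
import numpy as np

rng = np.random.default_rng(42)
sigma, trials = 2.0, 5000

for n in (10, 100, 1000):
    means = rng.normal(0.0, sigma, size=(trials, n)).mean(axis=1)
    print(f"n={n:4d}  empirical SE={means.std(ddof=1):.4f}  "
          f"theory={sigma / np.sqrt(n):.4f}")
```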
I’m starting to realize that as well. It can give you the intuition without having to memorize theorems. I think I’m going to start using simulations a lot more.
I find it’s more helpful as a tool to catch wrong intuitions than as a crutch for missing intuition, personally. If you made a mistake in your simulation but you had some intuition (right or wrong), the disagreement tells you something is up (unless the mistake happens to line up with a wrong intuition, at least). If you made a mistake in your simulation and you had no intuition, you’re off in the weeds.
Some general pieces of advice, from someone who does a surprising number of quick simulations for sanity-checking:
Try to introduce small amounts of correlation in everything. In actuality, everything[1] is correlated to some degree. Most of the time this does not matter. Every once in a while, it makes a huge difference. (There’s a small sketch of this after the list.)
Try to introduce small amounts of noise into everything. In actuality, everything[2] has noise to some degree. Most of the time this does not matter. Every once in a while, it makes a huge difference.
Beware biased RNGs. Both the obvious and the not so obvious. Most of the time this does not matter. Every once in a while, it makes a huge difference.
Beware floating-point numbers in general. You can write something quickly using floats. You can write something safely using floats. Have fun doing both at once.
Corollary: if you can avoid division (or rewrite to avoid division), use integers instead of floats. Especially if you’re in a language with native bigints.
Rerunning with two different floating-point precisions (e.g. Decimal’s getcontext().prec) can be a decent sanity check, although it’s not a panacea[3]. (See the Decimal sketch after the list.)
Beware encoding the same assumptions into your simulation as you did in your intuition.
R… I can’t really say.
Python is decent. Python is also slow.
numpy is faster if you’re doing block operations. If you can (and know how to) restructure your code to take advantage of this, numpy can be quick. If you don’t, numpy can be even slower than standard Python. (See the numpy sketch after the list.)
PyPy can offer a significant performance boost for Monte Carlo-style ‘do this thing a billion times’ code. That being said, PyPy has disadvantages too.
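A few sketches for the points above; all of them are toy illustrations with made-up parameters, not prescriptions. First, the correlation point: a small shared component barely changes any single measurement, but it puts a floor under the variance of their average.

```python
# Sketch for the correlation point: average n measurements that share a
# small common component. Independent noise averages away; the shared
# part does not, so the variance of the mean stops shrinking with n.
import numpy as np

rng = np.random.default_rng(1)
trials, n, rho = 20000, 100, 0.05   # rho = fraction of variance that is shared

shared = rng.normal(0.0, np.sqrt(rho), size=(trials, 1))    # common factor
own = rng.normal(0.0, np.sqrt(1 - rho), size=(trials, n))   # idiosyncratic noise
correlated_means = (shared + own).mean(axis=1)
independent_means = rng.normal(0.0, 1.0, size=(trials, n)).mean(axis=1)

print("var of mean, independent:", independent_means.var())  # ~ 1/n = 0.01
print("var of mean, correlated :", correlated_means.var())   # ~ rho + (1-rho)/n ≈ 0.06
```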
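Second, the precision re-run: the same numerically fragile formula evaluated at two Decimal precisions. Agreement doesn’t prove the computation is sound, but disagreement is a cheap red flag (the inputs here are chosen purely to provoke cancellation).

```python
# Sketch for the precision re-run point: the naive variance formula
# E[x^2] - E[x]^2 applied to data with a huge mean and a tiny spread,
# at two different Decimal precisions. If the answers disagree, the
# computation is losing digits somewhere.
from decimal import Decimal, getcontext

data = [Decimal(10**9) + Decimal(i) for i in range(10)]  # true variance is 8.25

def naive_variance(xs):
    n = Decimal(len(xs))
    mean = sum(xs) / n
    mean_sq = sum(x * x for x in xs) / n
    return mean_sq - mean * mean

for prec in (12, 50):
    getcontext().prec = prec
    print(f"prec={prec:2d}  variance={naive_variance(data)}")
```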
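Third, the numpy point: the same Monte Carlo estimate of pi written as a Python-level loop and as a single block operation. Exact timings depend on the machine and the problem, so read the comparison qualitatively.

```python
# Sketch for the numpy point: estimate pi by throwing darts at the unit
# square, once with a Python-level loop over elements and once with a
# single vectorized (block) operation on the whole arrays.
import time
import numpy as np

n = 500_000
rng = np.random.default_rng(7)
xs, ys = rng.random(n), rng.random(n)

t0 = time.perf_counter()
hits_loop = sum(1 for x, y in zip(xs, ys) if x * x + y * y < 1.0)  # element by element
t1 = time.perf_counter()
hits_block = np.count_nonzero(xs * xs + ys * ys < 1.0)             # one block operation
t2 = time.perf_counter()

print(f"loop : pi ~ {4 * hits_loop / n:.4f} in {t1 - t0:.3f}s")
print(f"block: pi ~ {4 * hits_block / n:.4f} in {t2 - t1:.3f}s")
```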
[1] Or close enough. P(neutrino hitting A | B) != P(neutrino hitting A | ~B) for pretty much any[4] A and B.
[2] Thermal noise being the most obvious example.
[3] See e.g. the tent map at m=2, where any finite binary precision will eventually fold to zero.
[4] Obvious exception for events that are not within each other’s lightcones, though even in that case there’s often some event in the causal chain of A or B that has the same correlation.