Simply run experiments and accept every result as true if the probability of it occurring by random chance falls below some threshold we agree on. This will lead us terribly astray every once in a while if we are not careful, but it also enables us to run experiments whose conclusions both of us can trust.[2]
To minimize the chance of statistical noise or incorrect inference polluting our conclusions, we design experiments with randomly chosen intervention and control groups, so that any difference in outcomes can be attributed to the intervention rather than to some confounder.
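A minimal sketch of why the random coin flip buys us this, as a toy simulation (all numbers, and the unobserved "trait" playing the role of a confounder, are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population: each person carries an unobserved trait that also
# drives the outcome (the classic confounder in a non-randomized comparison).
n = 10_000
trait = rng.normal(size=n)

# Random assignment: a fair coin flip per person, independent of the trait.
treated = rng.integers(0, 2, size=n).astype(bool)

# Outcome = baseline driven by the trait + a true treatment effect of 0.5 + noise.
true_effect = 0.5
outcome = trait + true_effect * treated + rng.normal(size=n)

# Because assignment is independent of the trait, the simple difference in
# group means is an unbiased estimate of the causal effect.
estimate = outcome[treated].mean() - outcome[~treated].mean()
print(f"estimated effect ~ {estimate:.2f} (true effect = {true_effect})")
```

In an observational comparison the trait could differ between the groups and pollute the difference in means; the coin flip is what breaks that link.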
As long as we follow these procedures exactly, we can both trust the conclusion. Others can even join in on the fun too.
Together we arrive at a set of ‘randomista’ interventions we both recognize as valuable. Even if our differing priors lead each of us to prefer opposing interventions, pooling our money on the randomista interventions beats donating to causes that cancel each other out.
The world is somewhat the better for it.
Problems I see here:
In an RCT you get a causal effect measure rather than just a correlation, but you can’t prove whether it generalizes. Example: if you pick some people and flip a coin for each to decide whether to give them a blerg or a sblarg, and then find that blergs have a positive effect on self-confidence, the weak spot in your inference is the way you picked the initial set of people within whom you did the randomization. The debate moves from “you can’t prove it’s a causal effect!” to “the people in your set were younger than normal”, “your set of people is from a first-world country”, “people have changed too much between when you ran the experiment and now”, etc.
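To make the worry concrete, here is a toy version of that debate (the age-dependent effect, the recruitment window, and of course the blergs are all made up):

```python
import numpy as np

rng = np.random.default_rng(1)

# Invented model: the benefit of a blerg for self-confidence shrinks with age.
def blerg_effect(age):
    return np.clip(1.0 - 0.02 * (age - 20), 0.0, None)

# The trial happens to recruit only young people, then randomizes within them.
trial_ages = rng.uniform(18, 25, size=5_000)
treated = rng.integers(0, 2, size=trial_ages.size).astype(bool)
outcomes = blerg_effect(trial_ages) * treated + rng.normal(size=trial_ages.size)
trial_estimate = outcomes[treated].mean() - outcomes[~treated].mean()

# The population we actually want to help spans a much wider age range.
population_ages = rng.uniform(18, 80, size=100_000)
population_effect = blerg_effect(population_ages).mean()

print(f"causal effect in the trial sample:      ~ {trial_estimate:.2f}")
print(f"average effect in the wider population: ~ {population_effect:.2f}")
```

The randomization inside the trial is perfectly kosher, so the trial estimate really is causal for the people in the trial; it just answers a narrower question than the one we care about.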
Considering only significance and not power (in other words, only checking how probable the result would be “by chance”, rather than how probable it would be under specific, meaningful alternative hypotheses) is limiting and can be misleading enough to be a problem in practice. Read Andrew Gelman for this stuff.
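A quick way to see what power adds, sketched as a simulation (the sample size, effect size, and threshold are arbitrary placeholders):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

alpha = 0.05        # the agreed-on significance threshold
n_per_group = 30    # a hypothetical study size
true_effect = 0.3   # a specific, meaningful alternative hypothesis (in SD units)

# Power by brute-force simulation: how often does a study of this size detect
# an effect of this magnitude at the agreed threshold?
n_sims = 5_000
rejections = 0
for _ in range(n_sims):
    control = rng.normal(0.0, 1.0, n_per_group)
    treatment = rng.normal(true_effect, 1.0, n_per_group)
    _, p_value = stats.ttest_ind(treatment, control)
    if p_value < alpha:
        rejections += 1

print(f"power ~ {rejections / n_sims:.2f}")  # around 0.2 here: most true effects of this size are missed
```

A result can clear the significance threshold while coming from a design that would have missed a real effect of the size we care about roughly four times out of five, which is exactly the kind of thing that looking only at “by chance” probabilities hides.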
By Wald’s complete class theorems, any kosher frequentist procedure you think is good is equivalent to (or dominated by) what a Bayesian would do under some specific prior. So if at any point you think “but frequentist”, you should also have a caveat like “considering that humans won’t actually use statistics kosher, I can see a pretty precise story for how doing nonsense in this specific way will produce the correct behavior in the end”, which, incidentally, is reasoning I would not trust people in general to pull off.
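For reference, the loose shape of the result (the precise statements need regularity conditions, and sometimes limits of Bayes rules or improper priors):

$$\delta \text{ admissible} \;\Longrightarrow\; \exists\, \pi \text{ on } \Theta \ \text{ such that } \ \delta \in \arg\min_{\delta'} \int_\Theta R(\theta, \delta')\, d\pi(\theta),$$

where $R(\theta, \delta) = \mathbb{E}_\theta\, L(\theta, \delta(X))$ is the frequentist risk of procedure $\delta$ when the true parameter is $\theta$.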