I intended to bring it up as plausible, but not to explicitly say that I thought it was p>0.5 (it wasn’t a firm belief, and I didn’t want others to make any Bayesian update). I wanted to read arguments about its plausibility. (Some fairly convincing arguments are SBF’s high level of luxury consumption, and the fact that he took away potentially all Alameda shares from Alameda’s EA cofounder, Tara Mac Aulay.)
If it is plausible, even if it isn’t p>0.5, then it’s possible SBF wasn’t selfish, in which case that’s a reason for EA to focus more on inculcating philosophy in its members (whether the answer is “naive utilitarianism is wrong, use rule utilitarianism/virtue ethics/deontology” or “naive utilitarianism almost never advocates fraud”, etc.). Some old and new preventive measures, like EA Forum posts, do exist; maybe that’s enough, or maybe not.
RE “Should we then draw different conclusions from their experiments?”
I think, depending on the study’s hypothesis and random situational factors, a study like the first can be in the garden of forking paths. A study that stops at n=100 because it reached a predefined statistical threshold is not guaranteed to still meet that threshold had it kept running until n=900.
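To make the mechanism concrete, here is a minimal simulation (my own sketch, not the article’s setup): the data are pure noise, the statistic is a running z-test with known variance, and the researcher checks significance at every n and stops at the first crossing of |z| > 1.96. The false-positive rate ends up far above the nominal 5%:

```python
import numpy as np

rng = np.random.default_rng(0)

def stops_early(max_n=900, min_n=10, z_crit=1.96):
    # The null is true by construction: the data are pure noise with mean 0.
    x = rng.normal(0.0, 1.0, max_n)
    # Running z-statistic at every sample size n (known sigma = 1).
    running_z = np.cumsum(x) / np.sqrt(np.arange(1, max_n + 1))
    # Peek at every n from min_n onward; "stop" at the first crossing.
    return bool(np.any(np.abs(running_z[min_n - 1:]) > z_crit))

trials = 2000
rate = sum(stops_early() for _ in range(trials)) / trials
print(f"false-positive rate with peeking at every n: {rate:.2f}")
# A single fixed-n test at |z| > 1.96 is wrong about 5% of the time under
# the null; stopping at the first crossing inflates that severalfold.
```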
Suppose a community of researchers is split in half (this is meant to match the example in this article, but scales the imagined sample of studies up from one study and its replication to many). The first half (non-replicators) conducts research as described first in the article: they predefine a statistical threshold and stop a study as soon as that threshold is reached. Additionally, if the threshold has not been reached by n=1000, the study’s negative result is published. The second half (replicators) only run replications of the first half’s studies, with the same sample sizes.
After the studies are done, a non-replicator study will sometimes find a result while its replication finds the opposite. In such cases, who is more likely to be correct? I think this article implies the answer is 50%, because it is supposedly the same study run twice. I do not think that is correct: I think the replicators are correct more than 50% of the time, because the non-replicator studies can be in the garden of forking paths.
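Here is a rough simulation of that split community. All the specific numbers are my own assumptions, not from the article: a 20% base rate of real effects, effect size 0.3, threshold |z| > 1.96, and a cap at n=1000. Conditional on the two camps disagreeing, it counts who matched the ground truth:

```python
import numpy as np

rng = np.random.default_rng(1)
Z_CRIT, MAX_N, MIN_N = 1.96, 1000, 10

def non_replicator(mu):
    """Peek at every n; stop and publish a positive at the first crossing,
    otherwise publish a negative result at the n=1000 cap."""
    z = np.cumsum(rng.normal(mu, 1.0, MAX_N)) / np.sqrt(np.arange(1, MAX_N + 1))
    hits = np.nonzero(np.abs(z[MIN_N - 1:]) > Z_CRIT)[0]
    if hits.size:
        return True, hits[0] + MIN_N   # positive result at this stopping n
    return False, MAX_N                # negative result at the cap

def replicator(mu, n):
    """One fixed test at the same sample size, with no peeking."""
    x = rng.normal(mu, 1.0, n)
    return abs(x.sum()) / np.sqrt(n) > Z_CRIT

orig_right = repl_right = 0
for _ in range(5000):
    effect_is_real = rng.random() < 0.2   # assumed base rate of real effects
    mu = 0.3 if effect_is_real else 0.0   # assumed effect size
    orig_pos, n = non_replicator(mu)
    repl_pos = replicator(mu, n)
    if orig_pos != repl_pos:              # score only the disagreements
        orig_right += (orig_pos == effect_is_real)
        repl_right += (repl_pos == effect_is_real)

total = orig_right + repl_right
print(f"non-replicators right in {orig_right / total:.0%} of disagreements")
print(f"replicators right in     {repl_right / total:.0%} of disagreements")
```

With these assumed numbers the replicators should come out well ahead. Note that the conclusion can flip if real effects are common enough among the tested hypotheses, so the base-rate assumption matters.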
The first section of this article explains the general idea of how early stopping can lead to bias: https://www.statisticsdonewrong.com/regression.html
My attempt to be more specific and somewhat mathematical:
For any study, imagine computing its statistical result at every n. As n increases, the result converges to something (for almost any study it will, since reality has regularities), but not necessarily monotonically: randomness can make the result bounce up and down between different n. For the non-replicator researchers, at every n the study either continues or stops, and those two options are two different paths in the garden of forking paths. If a study’s result does not converge monotonically as n increases, there can be outlier sample sizes: a specific n (or a small minority of n) at which the result passes the statistical threshold, even though it would not pass at any other n.
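As a single-trajectory illustration of this (again my own sketch: pure-noise data, running z-statistic with known sigma = 1), the statistic wanders nonmonotonically and can exceed the threshold at a handful of isolated n while staying below it everywhere else:

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.normal(0.0, 1.0, 1000)                  # pure noise: no real effect
z = np.cumsum(x) / np.sqrt(np.arange(1, 1001))  # running z-statistic

crossing_ns = np.nonzero(np.abs(z) > 1.96)[0] + 1  # n values where |z| > 1.96
print(f"|z| > 1.96 at {crossing_ns.size} of 1000 sample sizes")
if crossing_ns.size:
    print(f"for example at n = {crossing_ns[:5]}")
```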
‘Nonmonotonic convergence’, then, describes which studies are affected by the early-stopping bias.
Post-script: If a study converges monotonically, then early stopping causes no problem. But even if your study has been converging monotonically at every n so far, that is no absolute guarantee it would have kept doing so as n increases. The larger the sample size, though, the greater your confidence.