Yes, in this case you could keep using p-values (if you really wanted to...), but with reference to the value of, say, each customer. (This is what I meant by setting the threshold with respect to decision theory.) If the goal is to use it on a site making millions of dollars*, 0.01 may be too loose a threshold; but if he’s just messing with his personal site to help readers, a p-value like 0.10 may be perfectly acceptable.
* If the results were that important, I think there’d be better approaches than a one-off A/B test. Adaptive multi-armed bandit algorithms sound really cool from what I’ve read of them.
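For illustration, here is a minimal sketch of one such bandit (Thompson sampling with Beta-Bernoulli posteriors); the two "true" conversion rates are made-up placeholders, not data from any real test:

```python
# Sketch: Thompson-sampling bandit as an alternative to a one-off A/B test.
# The conversion rates below are hypothetical, purely for illustration.
import random

true_rates = {"A": 0.10, "B": 0.12}           # made-up true conversion rates
successes = {arm: 1 for arm in true_rates}     # Beta(1, 1) uniform priors
failures = {arm: 1 for arm in true_rates}

for _ in range(10_000):                        # each iteration = one visitor
    # Sample a plausible conversion rate for each arm from its posterior,
    # then show the visitor whichever arm sampled highest.
    sampled = {arm: random.betavariate(successes[arm], failures[arm])
               for arm in true_rates}
    arm = max(sampled, key=sampled.get)

    # Simulate whether the visitor converts, and update that arm's posterior.
    if random.random() < true_rates[arm]:
        successes[arm] += 1
    else:
        failures[arm] += 1

# Traffic allocated to each arm (minus the 2 prior pseudo-counts): the better
# arm should end up receiving most of the visitors.
print({arm: successes[arm] + failures[arm] - 2 for arm in true_rates})
```

The appeal is that traffic shifts toward the better-performing variant as evidence accumulates, rather than splitting 50/50 for the whole test.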
I’d suggest more of a scattergram than a histogram; superimposing 95% CIs would then cover both the exploratory data visualization and the confidence intervals. Combine that with an effect size and one has made a good start.
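Something along these lines, say; the completion times here are simulated placeholders, and the particular choices (t-based intervals for the means, Cohen's d as the effect size) are just one reasonable option:

```python
# Sketch: scatter of per-user completion times for two site variants,
# with 95% CIs on the means superimposed and Cohen's d as an effect size.
# times_a / times_b are simulated, hypothetical data.
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(0)
times_a = rng.normal(30, 8, size=80)   # completion times on site A (seconds)
times_b = rng.normal(27, 8, size=80)   # completion times on site B (seconds)

for x, times, label in [(0, times_a, "A"), (1, times_b, "B")]:
    # Jitter the x position so individual points don't overplot each other.
    jitter = rng.uniform(-0.1, 0.1, size=times.size)
    plt.scatter(x + jitter, times, alpha=0.4, label=f"site {label}")

    # 95% CI for the mean, using the t distribution.
    mean = times.mean()
    half_width = stats.t.ppf(0.975, times.size - 1) * stats.sem(times)
    plt.errorbar(x, mean, yerr=half_width, fmt="o", color="black", capsize=5)

# Cohen's d with a pooled standard deviation.
pooled_sd = np.sqrt((times_a.var(ddof=1) + times_b.var(ddof=1)) / 2)
d = (times_a.mean() - times_b.mean()) / pooled_sd

plt.xticks([0, 1], ["A", "B"])
plt.ylabel("completion time (s)")
plt.title(f"Cohen's d = {d:.2f}")
plt.legend()
plt.show()
```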
Perhaps you would suggest showing the histograms of completion times on each site, along with the 95% confidence error bars?
Presumably not actually 95%, but, as gwern said, a threshold based on the cost of false positives.
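As a rough illustration of what "based on the cost of false positives" might cash out to, here is a back-of-the-envelope sketch; every number in it is a hypothetical placeholder, and it ignores statistical power for simplicity:

```python
# Sketch: choosing a significance threshold from costs rather than convention.
# All numbers are hypothetical placeholders.

# Expected cost if we adopt the new variant when it's actually no better
# (engineering time, risk to revenue on a high-traffic site, etc.).
cost_of_false_positive = 50_000.0   # dollars, hypothetical

# Expected gain if the variant really is better and we adopt it.
gain_if_true_positive = 5_000.0     # dollars, hypothetical

# Prior belief that the variant is genuinely an improvement.
prior_improvement = 0.3

# Adopt the variant only when the expected gain outweighs the expected loss.
# Very roughly (treating power as ~1), the tolerable false-positive rate
# alpha satisfies:  prior * gain  >  (1 - prior) * alpha * cost
alpha = prior_improvement * gain_if_true_positive / (
    (1 - prior_improvement) * cost_of_false_positive
)
print(f"Rough tolerable false-positive rate: {alpha:.3f}")
# A high-stakes site pushes alpha down (stricter p-value threshold);
# a personal blog can tolerate a much looser one.
```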