(Except when it’s a novel and the text on the back cover spoils events from the middle of the book or later, which I would have preferred not to read until the right time.)
Spoilers matter less than you think.
According to a single counter-intuitive (and therefore more likely to make headlines), unreplicated study.
Gah! Spoiler!
Those error bars look large enough that I could still be right about myself even without being a total freak.
Really? 11 of the 12 stories got rated higher when spoiled, which is decent evidence against the nil hypothesis (spoilers have zero effect on hedonic ratings) regardless of the error bars’ size. Under the nil hypothesis, each story has a 50⁄50 chance of being rated higher when spoiled, giving a probability of (¹²C₁₁ × 0.5¹¹ × 0.5¹) + (¹²C₁₂ × 0.5¹² × 0.5⁰) = 0.0032 that ≥11 stories get a higher rating when spoiled. So the nil hypothesis gets rejected with a p-value of 0.0063 (the probability’s doubled to make the test two-tailed), and presumably the results are still stronger evidence against a spoilers-are-bad hypothesis.
This, of course, doesn’t account for unseen confounders, inter-individual variation in hedonic spoiler effects, publication bias, or the sample (79% female and taken from “the psychology subject pool at the University of California, San Diego”) being unrepresentative of people in general. So you’re still not necessarily a total freak!
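For anyone who wants to check that arithmetic, here is a minimal sketch of the same two-tailed sign test in Python (standard library only):

```python
# Minimal sketch of the two-tailed sign test described above:
# P(X >= 11) for X ~ Binomial(12, 0.5), then doubled for the two-tailed test.
from math import comb

n, k = 12, 11  # 12 story pairs, 11 rated higher when spoiled
p_one_tailed = sum(comb(n, i) * 0.5**n for i in range(k, n + 1))
p_two_tailed = 2 * p_one_tailed

print(f"one-tailed p ≈ {p_one_tailed:.4f}")  # 0.0032
print(f"two-tailed p ≈ {p_two_tailed:.4f}")  # 0.0063
```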
Yeah, given that study it doesn’t seem likely that works are, on average, liked less when spoiled; but what I meant is that there are probably certain individuals who do like works less when spoiled. (Imagine Alice said something to the effect that she prefers chocolate ice cream to vanilla, and Bob replied that it’s not actually the case that vanilla tastes worse than chocolate, citing a study in which, for 11 out of 12 ice cream brands, the vanilla is liked more on average than the chocolate, though in most cases the difference between the averages is not much bigger than each standard deviation. Even if that study was conducted among a demographic that does include Alice, it still wouldn’t necessarily mean she is mistaken, lying, or particularly unusual, would it?)
Just so. These are the sort of “inter-individual variation in hedonic spoiler effects” I had in mind earlier.
Edit: to elaborate a bit, it was the “error bars look large enough” bit of your earlier comment that triggered my sceptical “Really?” reaction. Apart from that bit I agree(d) with you!
Edit 2: aha, I probably did misunderstand you earlier. I originally interpreted your error bars comment as a comment on the statistical significance of the pairwise differences in bar length, but I guess you were actually ballparking the population standard deviation of spoiler effect from the sample size and the standard errors of the means.
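To make that ballparking step concrete, here is a rough sketch of the kind of calculation being described; every number in it is a hypothetical placeholder rather than a value from the study:

```python
# Rough sketch of ballparking a population SD from an error bar's standard
# error and the sample size, then asking how many people could still dislike
# spoilers. All numbers below are hypothetical placeholders, not study values.
from math import sqrt, erf

def normal_cdf(x):
    return 0.5 * (1 + erf(x / sqrt(2)))

n = 30           # hypothetical per-condition sample size
sem = 0.2        # hypothetical standard error of a mean rating
mean_diff = 0.3  # hypothetical mean (spoiled - unspoiled) rating difference

sd = sem * sqrt(n)  # population SD implied by that standard error
# Treating the rating SD as a crude proxy for the spread of individual
# spoiler effects, the share of people who would still prefer unspoiled:
share_prefer_unspoiled = normal_cdf(-mean_diff / sd)

print(f"implied SD ≈ {sd:.2f}")                                       # 1.10
print(f"share preferring unspoiled ≈ {share_prefer_unspoiled:.0%}")   # 39%
```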
These are the sort of “inter-individual variation in hedonic spoiler effects” I had in mind earlier.

Huh. For some reason I had read that as “intra-individual”. Whatever happened to the “assume people are saying something reasonable” module in my brain?
I guess you were actually ballparking the population standard deviation of spoiler effect from the sample size and the standard errors of the means.

Yep.
You can’t just ignore the error bars like that. In 8 of the 12 cases, the error bars overlap, which means there’s a decent chance that those comparisons could have gone either way, even assuming the sample mean is exactly correct. A spoilers-are-good hypothesis still has to bear the weight of this element of chance.
As a rough estimate: I’d say we can be sure that 4 stories are definitely better spoilered (>2 sd’s apart); out of the ones 1–2 sd’s apart, maybe 3 are actually better spoilered; and out of the remainder, they could’ve gone either way. So we have maybe 9 out of 12 stories that are better with spoilers, which gives a probability of about 14.6% if we do the same two-tailed test on the same null hypothesis.
I don’t necessarily want you to trust the numbers above, because I basically eyeballed everything; however, it gives an idea of why error bars matter.
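For reference, the same two-tailed sign test run on that eyeballed 9-of-12 split (a quick standard-library check; the inputs themselves are, as noted, eyeballed):

```python
# Same two-tailed sign test as before, with the eyeballed 9-of-12 split.
from math import comb

n, k = 12, 9
p_one_tailed = sum(comb(n, i) * 0.5**n for i in range(k, n + 1))
print(f"two-tailed p ≈ {2 * p_one_tailed:.3f}")  # 0.146
```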
Ignoring the error bars does throw away potentially useful information, and this does break the rules of Bayes Club. But this makes the test a conservative one (Wikipedia: “it has very general applicability but may lack the statistical power of other tests”), which just makes the rejection of the nil hypothesis all the more convincing.
If I’m interpreting this correctly, “the error bars overlap” means that the heights of two adjacent bars are within ≈2 standard errors of each other. In that case, overlapping error bars don’t necessarily indicate a decent chance that the comparisons could go either way; a 2-standard-error difference is quite a big one.
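To put a number on that, here is a rough sketch assuming two independent means with equal standard errors, each plotted with ±1 SE error bars:

```python
# At the point where two +/-1 SE error bars just touch, the means are ~2 SEs
# apart; for the *difference* that is z ≈ 1.41 (two-sided p ≈ 0.16), and under
# a rough flat-prior normal model only about an 8% chance that the true
# difference has the opposite sign. Assumes independent means, equal SEs.
from math import sqrt, erf

def normal_cdf(x):
    return 0.5 * (1 + erf(x / sqrt(2)))

se = 1.0                        # common standard error of each mean
gap = 2 * se                    # separation at which +/-1 SE bars just touch
se_diff = sqrt(se**2 + se**2)   # standard error of the difference
z = gap / se_diff
p_two_sided = 2 * (1 - normal_cdf(z))
p_wrong_sign = normal_cdf(-z)   # rough flat-prior chance of the opposite sign

print(f"z ≈ {z:.2f}")                            # 1.41
print(f"two-sided p ≈ {p_two_sided:.2f}")        # 0.16
print(f"P(opposite sign) ≈ {p_wrong_sign:.2f}")  # 0.08
```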
But that rough estimate is an invalid application of the test. The sign test already allows for the possibility that each pairwise comparison can have the wrong sign. Making your own adjustments to the numbers before feeding them into the test is an overcorrection. (Indeed, if “we can be sure that 4 stories are definitely better spoilered”, there’s no need to statistically test the nil hypothesis because we already have definite evidence that it is false!)
This reminds me of a nice advantage of the sign test. One needn’t worry about squinting at error bars; it suffices to be able to see which of each pair of solid bars is longer!
Indeed, if “we can be sure that 4 stories are definitely better spoilered”, there’s no need to statistically test the nil hypothesis because we already have definite evidence that it is false!

Okay, if all you’re testing is that “there exist stories for which spoilers make reading more fun”, then yes, you’re done at that point. As far as I’m concerned, it’s obvious that such stories exist in either direction; the conclusion “spoilers are good” or “spoilers are bad” follows if one type of story dominates.
I don’t like the study setup there. One readthrough of spoiled vs one readthrough of unspoiled material lets you compare the participants’ hedonic ratings of dramatic irony vs mystery, and it’s quite reasonable that the former would be equally or more enjoyable… but unlike in the study, in real life unspoiled material can be read twice: the first time for the mystery, then the second time for the dramatic irony; with spoiled material you only get the latter.