The survey wasn’t timed, so maybe those more “into” the site simply put more time and effort into answering the questions; I don’t think much in the way of conclusions about bias can be drawn.
Looking, as before, at the number of missing answers (which seems like an awfully good proxy for how much time one puts into the survey), the people who were right on the first question answered about 1 more question on average, but that small difference doesn’t reach statistical significance:
R> lw <- read.csv("2012.csv")
R> # count NA or blank (" ") fields per respondent, as a proxy for survey effort
R> lw$MissingAnswers <- apply(lw, 1, function(x) sum(sapply(x, function(y) is.na(y) || as.character(y)==" ")))
R> right <- lw[as.character(lw$CFARQuestion1) == "Yes",]$MissingAnswers
R> wrong <- lw[as.character(lw$CFARQuestion1) == "no" | as.character(lw$CFARQuestion1) == "Cannot be determined",]$MissingAnswers
R> t.test(right, wrong)
Welch Two Sample t-test
data: right and wrong
t = -1.542, df = 942.5, p-value = 0.1234
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-2.3817 0.2858
sample estimates:
mean of x mean of y
16.69 17.74
(I’m not going to look at the other questions unless someone really wants me to, since the first question is the one that would benefit most from extended thought.)
Out of curiosity, I looked at what a more appropriate logistic regression would say (using this guide): given the number of missing/omitted survey answers (as a proxy for time investment), can one predict the answer given to the question? The numbers and method are a little different from the t-test’s, and the result is a little less statistically significant, but as before there’s no real relationship*:
R> lw <- read.csv("2012.csv")
R> lw$MissingAnswers <- apply(lw, 1, function(x) sum(sapply(x, function(y) is.na(y) || as.character(y)==" ")))
R> # drop respondents who left CFARQuestion1 blank or NA (see the note below)
R> lw <- lw[as.character(lw$CFARQuestion1) != " " & !is.na(as.character(lw$CFARQuestion1)),]
R> lw <- data.frame(lw$CFARQuestion1, lw$MissingAnswers)
R> # does the number of missing answers predict the answer given?
R> summary(glm(lw.CFARQuestion1 ~ lw.MissingAnswers, data = lw, family = "binomial"))
Deviance Residuals:
Min 1Q Median 3Q Max
-1.17 -1.12 -1.05 1.23 1.41
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.00111 0.12214 0.01 0.99
lw.MissingAnswers -0.00900 0.00607 -1.48 0.14
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 1366.6 on 989 degrees of freedom
Residual deviance: 1364.4 on 988 degrees of freedom
AIC: 1368
Number of Fisher Scoring iterations: 3
* a note to other analyzers: it’s really important to remove null answers/NAs, because they’ll show relationships all over the place. In this example, if you leave NAs in for the CFARQuestion1 field, you’ll wind up getting a very statistically significant relationship, because every CFARQuestion1 left as NA by definition increases MissingAnswers by 1! And people who didn’t answer that question probably didn’t answer a lot of other questions, so the NA respondents enable a very easy, reliable prediction of MissingAnswers…
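To make the confound concrete, here’s a minimal simulated sketch (variable names and parameters are hypothetical, not taken from the survey data): a latent effort variable drives both skipping the question and skipping everything else, so leaving the question NA “predicts” MissingAnswers essentially by construction:

R> # hypothetical simulation of the NA confound; not the actual survey data
R> set.seed(1)
R> n <- 1000
R> effort <- runif(n)                      # latent time investment per respondent
R> skipped.q1 <- rbinom(n, 1, 1 - effort)  # 1 = left the question NA
R> # low effort means more blanks elsewhere, plus the skipped question itself
R> missing.answers <- rbinom(n, 50, 1 - effort) + skipped.q1
R> summary(glm(skipped.q1 ~ missing.answers, family = "binomial"))$coefficients

The missing.answers coefficient comes out wildly significant, even though nothing is going on beyond the fact that skipping the question is itself one of the missing answers.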
How do you get this nice box for the code? What’s the magic command that you have to tell the Wiki?
Markdown’s code syntax is to indent each line by >=4 spaces; LW’s implementation is subtly broken in that it strips all the internal indentation, and another gotcha is that you can’t have any trailing whitespace, or lines will be combined in a way you probably don’t want.
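For example, a comment source like this (every line prefixed with 4 spaces, and no trailing whitespace) renders as a code box:

    R> x <- rnorm(100)
    R> mean(x)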
MediaWiki syntax is entirely different and partially depends on what extensions are enabled.
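For what it’s worth, on a stock MediaWiki install you get a preformatted box by starting each line with a single space or by wrapping the block in <pre> tags; actual syntax highlighting (e.g. <syntaxhighlight lang="r">) only works if the SyntaxHighlight extension is installed, hence the dependence on configuration:

<pre>
R> x <- rnorm(100)
R> mean(x)
</pre>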
That seems like a rather post hoc explanation.