How would you practically go about controlling for it?
A fourth way: include a reading passage and then, on a separate page, a question to test whether they read the passage.
Another thing you can do is put a timer in the survey that keeps track of how much time they spend on each question.
Here’s one example:
Q12: Thinking about the candidate that you read about, how relevant do you think the following considerations are to their judgment of right and wrong? (Pick a number on the 1-7 scale.)
(a) Whether or not someone suffered emotionally.
Not At All Relevant 1 2 3 4 5 6 7 Extremely Relevant
(b) Whether or not someone acted unfairly.
Not At All Relevant 1 2 3 4 5 6 7 Extremely Relevant
(c) Whether or not someone’s action showed love for his or her country.
Not At All Relevant 1 2 3 4 5 6 7 Extremely Relevant
(d) Whether or not someone did something disgusting.
Not At All Relevant 1 2 3 4 5 6 7 Extremely Relevant
(e) Whether or not someone enjoyed apple juice.
Not At All Relevant 1 2 3 4 5 6 7 Extremely Relevant
(f) Whether or not someone showed a lack of respect for authority.
Not At All Relevant 1 2 3 4 5 6 7 Extremely Relevant
Looking at this specific example and imagining myself doing this for $1.50/hour or so (with the implication that my IQ isn’t anywhere close to three digits): I can’t possibly give true answers, because the question is far too complicated and I can’t afford to spend ten minutes figuring it out, even if I honestly don’t want to “cheat”.
Well, there are two reasons why that would be the case:
1.) This question refers to a specific story that you would have read previously in the study.
2.) The formatting here is jumbled text. The format of the actual survey includes radio buttons and is much nicer.
Ah, no, let me clarify. It requires intellectual effort to untangle Q12 and understand what it is actually asking you. This is a function of the way it is formulated and has nothing to do with knowing the context or with the lack of radio buttons.
It is easy for high-IQ people to untangle such questions in their heads, so they don’t pay much attention to this; it’s “easy”. It is hard for low-IQ people to do this, so unless there is an incentive for them to actually take the time, spend the effort, and understand the question, they are not going to do it.
It’s definitely a good idea to keep the questions simple and I’d plan on paying attention to that. But this question was actually used in an MTurk sample and it went ok.
Regardless, even if the question itself is bad, the general point is that this is one way you can control for whether people are clicking randomly. Another way is to have an item and its inverse (“I consider myself an optimistic person” and later “I consider myself a pessimistic person”), and a third way is to run a timer in the questionnaire.
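For the timer idea, here is a rough sketch of how the check could look once per-question timings are exported; this is only my own illustration, not the study’s actual pipeline, and the column names and the 4-second threshold are invented:

    import pandas as pd

    # Hypothetical timing export: seconds spent on each question, one row per respondent.
    timings = pd.DataFrame({
        "participant_id": [1, 2, 3],
        "q1_seconds": [12.4, 2.1, 9.8],
        "q2_seconds": [15.0, 1.8, 11.2],
        "q3_seconds": [9.7, 2.5, 14.3],
    })
    question_cols = ["q1_seconds", "q2_seconds", "q3_seconds"]

    # Flag respondents whose median time per question is implausibly short.
    # The 4-second cut-off is arbitrary and would need piloting.
    timings["too_fast"] = timings[question_cols].median(axis=1) < 4.0
    print(timings[["participant_id", "too_fast"]])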
What does “went ok” mean and how do you know it?
Let’s be more precise: this is one way you can estimate whether people (or scripts) are clicking randomly. This estimate should come with its own uncertainty (=error bars, more or less) which should be folded into the overall uncertainty of survey results.
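To make the “error bars” concrete: one simple way (my own illustration, with made-up numbers) is to treat the catch-question failure rate as a binomial proportion, put a confidence interval on it, and carry that interval along when interpreting the rest of the results:

    import math

    def wilson_interval(failures, n, z=1.96):
        """Approximate 95% Wilson confidence interval for a proportion."""
        p = failures / n
        denom = 1 + z**2 / n
        centre = (p + z**2 / (2 * n)) / denom
        half = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
        return centre - half, centre + half

    # Hypothetical numbers: 14 of 300 respondents failed the "apple juice" item.
    low, high = wilson_interval(14, 300)
    print(f"Estimated share of random clickers: {14/300:.1%} "
          f"(95% CI roughly {low:.1%} to {high:.1%})")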
Well, the results were consistent with the hypothesis, the distribution of responses didn’t look random, not too many people failed the “apple juice” question, and the timer data looked reasonable.
~
That’s generally what I meant by “control”. But at that point, we might just be nitpicking about words.
Possibly, though I have in mind a difference in meaning or, perhaps, attitude. “Can control” implies to me that you think you can reduce this issue to irrelevance, so that it will not affect the results. “Will estimate” implies that this is another source of uncertainty: you’ll try to get a handle on it, but it will still add to the total uncertainty of the final outcome.
Well, the most obvious misinterpretations of the question will also result in people not failing the “apple juice” question.
What cut-off criteria would you use with those questions to avoid cherry-picking the data?
You check to make sure that “Whether or not someone enjoyed apple juice” is rated 1 or 2; if it isn’t, you throw out the participant. Otherwise, you keep the response.
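As a concrete sketch of that cut-off (my own illustration, with invented column names, assuming the responses sit in a pandas DataFrame with one row per participant):

    import pandas as pd

    # Hypothetical responses: one row per participant, items on a 1-7 scale.
    responses = pd.DataFrame({
        "participant_id": [1, 2, 3],
        "apple_juice": [1, 5, 2],          # the catch item
        "suffered_emotionally": [6, 4, 7],
    })

    # Keep only participants who rated the "apple juice" item 1 or 2;
    # everyone else is assumed to be clicking without reading and is dropped.
    kept = responses[responses["apple_juice"] <= 2]
    print(kept)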
There are a few other tactics. Another one is to have a question like “I consider myself optimistic” and then later have a question “I consider myself pessimistic” and you check to see if the answers are in an inverse relationship.
And if they are, you mark the person as bipolar :-D
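And to make the optimistic/pessimistic check concrete, a rough sketch (again my own, with invented names and an arbitrary threshold): reverse-score the pessimism item on the 1-7 scale and flag respondents whose two answers are far apart.

    import pandas as pd

    # Hypothetical data: both items on a 1-7 agreement scale.
    responses = pd.DataFrame({
        "participant_id": [1, 2, 3],
        "optimistic": [6, 4, 7],
        "pessimistic": [2, 4, 7],          # participant 3 strongly endorses both
    })

    # Reverse-score the pessimism item so that, for a consistent respondent,
    # it should roughly match the optimism rating.
    responses["pessimistic_reversed"] = 8 - responses["pessimistic"]

    # Flag respondents whose two ratings disagree by more than 3 points
    # as possible random clickers (the threshold is arbitrary).
    responses["inconsistent"] = (
        (responses["optimistic"] - responses["pessimistic_reversed"]).abs() > 3
    )
    print(responses)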