I think there should be basically no update from this study, or from most others that use platforms like Mechanical Turk and Google Consumer Surveys, which is what this study used. Consumer Surveys questions are served as paid ads where you have to fill out the survey before getting to the content you actually want. People are incentivized to complete these as quickly as possible, and there is little penalty for inaccuracy. I would bet that this doesn’t reproduce if correct answers are incentivized, or even if it’s run on volunteers who aren’t being held up from what they actually want to do in order to take the survey.
I partially agree. But I’ll push back a little. The 23% wrong answers are not random key mashing. 80% of them type the number “10”. Instead of copying the single digit that they are instructed to copy, they are looking further from the answer blank, finding two three-digit numbers, subtracting one from the other, and typing the two-digit difference. That is more work than the correct answer, not less.
But still, I do agree that if people had accuracy incentives, the correct answer rate would probably go up… Then again, maybe with monetary incentives people would be even more likely to think the provided answer was a trap. If I had to bet, I’d say that incentives would increase the 77% “5” rate. But I wouldn’t be totally shocked if it went down.
This response is incorrect. First, Google Consumer Surveys is very different from MTurk: MTurk workers are paid to pay attention to a task for a given amount of time, and the tasks are not ‘paid ads’.
“People are incentivized to complete these as soon as possible and there is little penalty for inaccuracy.”
This is generally untrue for MTurk—when you run online surveys with MTurk:
1) You set exclusion criteria for people finishing too quickly.
2) You set attention-check questions; respondents who fail them are excluded.
3) A respondent is also excluded for certain patterns of answers, e.g. giving the same answer to every question, or sometimes for giving contradictory answers.
4) Respondents are rated on response quality, and they lose reputation points (and potentially future employment opportunities) for giving inaccurate responses.
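For concreteness, here is a minimal sketch of what applying criteria 1)–3) might look like on the requester’s side after collecting responses. The field names, thresholds, and example record are hypothetical, not anything from the study or from MTurk’s actual API; criterion 4) (worker reputation) lives on the platform side and isn’t modelled here.

```python
# Hypothetical post-hoc screening of survey responses (criteria 1-3 above).
# Field names and thresholds are illustrative, not from the study or MTurk's API.

MIN_SECONDS = 30            # 1) exclude anyone finishing implausibly fast
ATTENTION_KEY = "attn_1"    # 2) an attention-check item with a known correct answer
ATTENTION_ANSWER = "agree"

def keep_response(resp: dict) -> bool:
    """Return True if a response passes the basic quality screens."""
    if resp["duration_seconds"] < MIN_SECONDS:                   # 1) too fast
        return False
    if resp["answers"].get(ATTENTION_KEY) != ATTENTION_ANSWER:   # 2) failed attention check
        return False
    # 3) straight-lining: the same answer given to every Likert-style item
    likert = [v for k, v in resp["answers"].items() if k.startswith("likert_")]
    if likert and len(set(likert)) == 1:
        return False
    return True

example = {
    "duration_seconds": 12,
    "answers": {"attn_1": "disagree", "likert_1": 3, "likert_2": 4},
}
print(keep_response(example))  # False: finished too fast and failed the attention check
```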
You can apply these criteria more or less rigorously, but I’d assume that the study designers followed standard practice (see this doc), at least.
I’m not claiming that MTurk is a very good way of getting humans to respond attentively, of course. There are lots of studies looking at error rates (e.g. https://arxiv.org/ftp/arxiv/papers/2101/2101.04459.pdf), and there are obvious issues with relying exclusively on a population of ‘professional survey respondents’ for any given question.
So I’m not exactly sure how this survey causes me to update. Perhaps especially because they’re slightly rushing the answers, it’s genuinely interesting that so many people choose to respond to the (perceived) moderate-complexity task (which they assume is “calculate 110 − 100”) rather than the simple task, “write out the number 5”.
There are certainly good ways to ask such a question with reasonable motivation to get it right. You could include it in a 5-question quiz, for instance, and say “you get paid only (or more) if you get 3 correct”. And then vary and permute the questions, so that nobody can use 20 accounts to run through the same task without answering the questions separately each time.
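To make that concrete, here is a rough sketch of that kind of design: a per-respondent permutation of a small question pool plus a threshold payment rule. The question pool, threshold, and payout are made-up illustrations, not anything from the paper or from any survey platform’s API.

```python
import random

# Hypothetical question pool; each entry is (prompt, correct_answer).
QUESTION_POOL = [
    ("Copy the digit shown in the box above: 5", "5"),
    ("What is 7 + 6?", "13"),
    ("Type the third word of this sentence.", "third"),
    ("How many letters are in the word 'survey'?", "6"),
    ("What is 110 - 100?", "10"),
    ("Copy the last digit of 384.", "4"),
]

PAY_THRESHOLD = 3   # "you get paid only if you get 3 correct"
PAYOUT = 0.50       # illustrative payment in dollars

def build_quiz(rng: random.Random, n_questions: int = 5):
    """Draw and shuffle a fresh quiz per respondent, so reusing one memorized form fails."""
    return rng.sample(QUESTION_POOL, n_questions)

def payout(quiz, answers) -> float:
    """Pay only if enough answers match the keyed correct answers."""
    n_correct = sum(a.strip() == correct for (_, correct), a in zip(quiz, answers))
    return PAYOUT if n_correct >= PAY_THRESHOLD else 0.0

rng = random.Random()                        # in practice, one fresh draw per respondent
quiz = build_quiz(rng)
answers = [correct for _, correct in quiz]   # a respondent who gets everything right
print(payout(quiz, answers))                 # 0.5
```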
But that’s expensive and time-consuming, and unless the paper specifies it, one should assume they did the simpler/cheaper option of just paying people to answer.
Monetary incentives raise solution rates a little, but not that much. Lex Borghans and co-authors manipulate small incentives and find they do almost nothing. Ben Enke and co-authors offer a full month’s salary to Nairobi-based undergrads and find a 13 percentage point increase in solution rates.
I’m not sure how our manipulations would interact with monetary incentives. But I’d like to know!
When surveys on MTurk are designed to keep a single account occupied for longer than strictly necessary to fill out an answer that passes any surface-level validity checks, the obvious next step is for people to run multiple accounts on multiple devices, and you’re back to people giving low-effort answers as fast as possible.
I guess it depends on what your priors already were, but 23% is far higher than the usual ‘lizardman’ rate, so one update might be to greatly expand your estimate of how much error is associated with any survey. If the numbers are that high, it gets harder to understand many things (unless more rigorous survey methods are used, etc.).