The typical analysis of the basic design you described is something like a mixed 2×2 factorial design: test (pre-/post-test, within subjects) × intervention (yes/no, between subjects), with the interaction term providing the evidence for an intervention effect (a greater increase from pre- to post-test in the intervention condition). This is often analysed with ANOVA (participants as a random effect), though nonparametric equivalents may be more appropriate.
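To make that concrete: for a design with exactly one within-subject factor (pre/post) and one between-subject factor (intervention), testing the interaction is equivalent to comparing gain scores (post minus pre) between the two groups. A minimal sketch with entirely made-up sample sizes and effect sizes:

```python
# Sketch of the 2x2 interaction test via gain scores (hypothetical data).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

n = 20  # participants per group (made-up sample size)
# Simulated pre/post scores; the intervention group improves more.
pre_ctrl = rng.normal(50, 10, n)
post_ctrl = pre_ctrl + rng.normal(2, 5, n)   # small practice effect only
pre_int = rng.normal(50, 10, n)
post_int = pre_int + rng.normal(10, 5, n)    # larger improvement

gain_ctrl = post_ctrl - pre_ctrl
gain_int = post_int - pre_int

# Two-sample t-test on gains = test of the group x time interaction.
t, p = stats.ttest_ind(gain_int, gain_ctrl)
print(f"t = {t:.2f}, p = {p:.4f}")

# Nonparametric alternative if the gains look non-normal:
u, p_mw = stats.mannwhitneyu(gain_int, gain_ctrl, alternative="two-sided")
print(f"Mann-Whitney U = {u:.1f}, p = {p_mw:.4f}")
```

The gain-score shortcut only works because there are exactly two levels of the within factor; with more time points you would need the full mixed model.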
More complex models are also very appropriate, e.g. adding question type as a factor/predictor rather than treating the different questions as separate dependent variables: this would indicate whether improvement after the intervention differs across question types, as you’ve predicted. This doesn’t give you clues about bimodality, but it does let you test your predictions about the relative degree of improvement more directly (if the intervention works).
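To illustrate (with made-up question-type labels and effect sizes): computing the intervention effect separately per question type gives the informal version of that interaction. A full analysis would fit one model with question type as a within-subject factor, e.g. score ~ test × intervention × question type with participants as a random effect.

```python
# Sketch: intervention effect broken down by question type, so question
# type is a factor rather than a separate dependent variable.
# All labels and gain magnitudes below are hypothetical.
import numpy as np

rng = np.random.default_rng(1)
n = 20  # participants per group (made-up sample size)

# (control gain, intervention gain) per hypothetical question type
true_gains = {"diagnostic": (2, 5), "predictive": (2, 12)}

gains = {
    qt: {
        "control": rng.normal(ctrl, 4, n),
        "intervention": rng.normal(intv, 4, n),
    }
    for qt, (ctrl, intv) in true_gains.items()
}

# Difference of mean gains per question type: the intervention effect,
# estimated separately for each type. If these differ, improvement
# depends on question type (the interaction, informally).
effect = {qt: gains[qt]["intervention"].mean() - gains[qt]["control"].mean()
          for qt in true_gains}
print(effect)
```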
Correlations between your different dependent measures: by all means—but make sure you examine the characteristics of the distributions rather than just zooming ahead with a matrix of correlation coefficients. And be aware of the multiple comparisons problem: without correction, inflated Type I error is very likely.
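A minimal sketch of what I mean, with made-up measures: screen each distribution first, use a rank correlation if normality looks doubtful, and adjust the pairwise p-values (Holm's step-down procedure here) for multiple comparisons.

```python
# Sketch: correlation matrix with a distribution check per measure and a
# Holm correction for multiple comparisons (hypothetical measures a, b, c).
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n = 40
# Three made-up dependent measures; only the first two genuinely related.
a = rng.normal(0, 1, n)
b = 0.7 * a + rng.normal(0, 1, n)
c = rng.normal(0, 1, n)
measures = {"a": a, "b": b, "c": c}

# Rough normality screen before trusting Pearson correlations.
for name, x in measures.items():
    w, p_norm = stats.shapiro(x)
    print(f"{name}: Shapiro-Wilk p = {p_norm:.3f}")

# All pairwise rank correlations, collecting raw p-values.
names = list(measures)
pairs = [(i, j) for i in range(len(names)) for j in range(i + 1, len(names))]
pvals = []
for i, j in pairs:
    rho, p = stats.spearmanr(measures[names[i]], measures[names[j]])
    pvals.append(p)
    print(f"{names[i]}-{names[j]}: rho = {rho:.2f}, raw p = {p:.4f}")

# Holm step-down adjustment: multiply the k-th smallest p by (m - k + 1),
# enforce monotonicity, cap at 1.
order = np.argsort(pvals)
m = len(pvals)
adj = np.empty(m)
running_max = 0.0
for rank, idx in enumerate(order):
    running_max = max(running_max, (m - rank) * pvals[idx])
    adj[idx] = min(running_max, 1.0)
print("Holm-adjusted p-values:", np.round(adj, 4))
```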
Excluding participants on the basis of overly high pretest performance is appropriate. If possible, I suggest setting this criterion before formal testing (even an educated guess is fine, as it doesn’t harm the conclusions you can draw: it can be justified as leaving room for improvement if the intervention works) - or at the very least, set it before analysing anything else about a participant’s performance, to avoid biasing your decision about where to put the threshold.
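In practice this is just a fixed filter applied before any other analysis; the threshold value and data below are made-up examples.

```python
# Sketch: applying a pre-registered pretest exclusion threshold before
# looking at any other data. 0.8 is a hypothetical cutoff decided in
# advance, not a recommendation.
MAX_PRETEST = 0.8  # exclude anyone scoring >= 80% correct at pretest

participants = [
    {"id": 1, "pretest": 0.45},
    {"id": 2, "pretest": 0.90},  # near ceiling: no room to improve
    {"id": 3, "pretest": 0.60},
]

included = [p for p in participants if p["pretest"] < MAX_PRETEST]
excluded = [p["id"] for p in participants if p["pretest"] >= MAX_PRETEST]
print("excluded:", excluded)  # -> excluded: [2]
```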
… don’t want to elaborate too much on what tasks we’ll give to the subjects, in case I’ll recruit someone reading this to be one of my test subjects.
I’m afraid you’ve said too much already—and if you’re looking for people who are naive about the principles involved, LW is probably not a great place for recruiting anyway.
Please feel free to private message me if you’d like clarification of what I’ve posted—this sort of thing is very much part of my day job.
Could you elaborate on that? Something like “so we’re going to test the impact of traditional instruction versus this prototype educational game on your ability to do these tasks” is what I’d have expected to say to the test subjects anyway, and that’s mostly the content of what I said here. (Though I do admit that the bit about expecting a bimodal distribution depending on whether or not the subjects pay attention to something was a bit of an unnecessary tipoff here.)
In particular, I expect to have a tradeoff—I can tell people even less than that, and get a much smaller group of testers. Or I can tell people that I’ve gotten the game I’ve been working on to a very early prototype stage and am now looking for testers, and advertise that on e.g. LW, and get a much bigger group of test subjects.
and if you’re looking for people who are naive about the principles involved, LW is probably not a great place for recruiting anyway.
It’s true that LW-people are much more likely to be able to e.g. solve the mammography example already, but I’d still expect most users to be relatively unfamiliar with the technicalities of causal networks—I was too, until embarking on this project.
I was thinking more about your previous posts on the subject (your development of the game and some of the ideas behind it). It’s the same general reason I’d avoid testing people from my extended lab network: they may not know any details of a current study, but they have a sufficiently clear impression of what I’m interested in to potentially influence the outcomes (whether intentionally, by “helping me out”, or implicitly).
When rolling it out for testing, you could always include a post-test questionnaire that probes people’s previous experience (e.g. what they knew in advance about your work and the ideas behind it) and exclude people who report knowing “too much” about the motivations of the study. You could even prompt for some info about LW participation, which could also be used to mitigate this issue (especially if you end up with decent samples both inside and outside LW).
Thanks a lot!
Ah, that’s a good point. And a good suggestion, too.