The result looks pretty weak. They had 62 kids. First, they gave all the kids a fluid intelligence test to measure their baseline. Then half the kids (32) were given a month of n-back training (which the authors expected to increase their fluid intelligence), while the other half (30) did a control training that was not supposed to influence fluid intelligence. At the end of the month’s training, all of the kids took a second fluid intelligence test to see if they’d improved, and 3 months later they took a third to see if they’d retained any improvement.
The result that you’d look for with this design, if n-back training improves fluid intelligence, is that the group that did n-back training would show a larger increase in fluid intelligence scores from the baseline test to the test after training. They looked and did not find that result—in fact, it was not even close to significant (F < 1). That’s the effect that the study was designed to find, and it wasn’t there. So that’s not a good sign.
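For concreteness, here’s a minimal sketch (in Python, with entirely made-up numbers, since I’m not working from their raw data) of what that primary analysis boils down to: with two groups and two time points, testing the group-by-time interaction is equivalent to comparing gain scores (post minus pre) between the groups.

```python
# Sketch of the primary analysis on simulated, made-up data: compare
# fluid-intelligence gains (post minus pre) between training and control.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical pre/post scores; both groups get the same retest bump,
# i.e. no training benefit is built in.
pre_train = rng.normal(100, 15, size=32)
post_train = pre_train + rng.normal(3, 5, size=32)
pre_ctrl = rng.normal(100, 15, size=30)
post_ctrl = pre_ctrl + rng.normal(3, 5, size=30)

gain_train = post_train - pre_train
gain_ctrl = post_ctrl - pre_ctrl

# With two groups, the interaction reduces to a t-test on the gains
# (equivalently a one-way ANOVA, where F = t^2).
t, p = stats.ttest_ind(gain_train, gain_ctrl)
print(f"t = {t:.2f}  F = {t**2:.2f}  p = {p:.3f}")
```

An F below 1 just means the between-group difference in gains was smaller than the noise, which is about as null as a result can get.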
The kids who did n-back training did improve at the n-back task itself, so the authors looked at the data another way: they split the 32 kids in the training group in half by how much they had improved on the n-back task, and analyzed the 16 who improved most and the 16 who improved least separately. The 16 high-improvers did improve on the fluid intelligence test, significantly more than the control group, and they retained that improvement on the 3-month follow-up test. That is the main result the paper reports, and they interpret it as a causal effect of n-back training. The 16 low-improvers did not differ significantly from the control group on the fluid intelligence test.
But this just isn’t that convincing a result, because the study no longer has an experimental design once you use n-back performance to divide up the kids. If you give kids two intelligence tests (one the n-back task, one the fluid intelligence test), and a month later you give them both tests again, it’s not surprising that the kids who improved the most on one test would also tend to improve the most on the other. And that’s basically all they found. Their design did involve training the kids on one of those two tests (n-back) during the month-long gap, but there’s no particular reason to think the training caused their improvement on the other test. Plenty of variables could affect performance on both tests similarly (pace of neural development, being sick, a learning disability, etc.).
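This worry is easy to demonstrate with a quick simulation (again, all numbers made up): give every kid a shared “improvement factor” that feeds gains on both tests, build in zero causal effect of training, and then run the paper’s median split.

```python
# Selection artifact demo: a shared per-kid factor (development, health,
# motivation...) drives gains on BOTH tests; training itself does nothing.
# The post-hoc median split still "finds" a high-improver benefit far more
# often than the nominal 5% false-positive rate.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_train, n_ctrl, n_sims = 32, 30, 2000
hits = 0

for _ in range(n_sims):
    shared = rng.normal(0, 1, n_train)               # shared improvement factor
    nback_gain = shared + rng.normal(0, 1, n_train)  # gain on test 1
    gf_gain = shared + rng.normal(0, 1, n_train)     # gain on test 2
    shared_c = rng.normal(0, 1, n_ctrl)
    gf_gain_ctrl = shared_c + rng.normal(0, 1, n_ctrl)

    # Median split on n-back improvement, then high-improvers vs. control.
    high = gf_gain[nback_gain >= np.median(nback_gain)]
    _, p = stats.ttest_ind(high, gf_gain_ctrl)
    hits += p < 0.05

print(f"'significant' high-improver benefit in {hits / n_sims:.0%} of null runs")
```

The exact rate depends on the variances I made up, but the point stands: selecting kids on one gain score automatically selects kids who are above average on anything correlated with it.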
If there is a causal benefit of n-back, then it should show up in the effect they were originally looking for (more fluid intelligence improvement in the group that did n-back training than in the control group). Perhaps they’d need a larger sample size (200 kids instead of 62?) to find it if the benefit only happens for some of the kids (as they claim), but if some kids benefit from the training while others get nothing from it, the average effect across the whole training group should still be measurable. I’d want to see that result before I’m persuaded.
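To put a rough number on that sample-size guess: suppose the diluted average benefit is a small effect of d = 0.4 (a figure I’m assuming for illustration, not one from the paper). A standard power calculation then lands right around the 200-kid ballpark.

```python
# Back-of-the-envelope power calculation for a hypothetical diluted
# training effect of d = 0.4 (an assumed number, not from the paper).
from statsmodels.stats.power import TTestIndPower

n_per_group = TTestIndPower().solve_power(
    effect_size=0.4,  # hypothetical standardized mean difference
    alpha=0.05,       # two-sided significance level
    power=0.8,        # conventional 80% power
)
print(f"~{n_per_group:.0f} kids per group")  # roughly 100 per group
```

That’s about 200 kids total, in line with the guess above, and well beyond the 62 they actually ran.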