To reach statistical significance, they must have tested each of the 8 pianists more than once.
If there was a consensus among the 8 as to which tuning is better, that would be significant, right? If they can’t tell the difference, the chance of all 8 agreeing (on either tuning) is 2 × (1⁄2)^8 = 1⁄128. You can even get p < 0.05 with one dissenter if you use a one-tailed test (which is maybe dubious): that gives 9⁄256 ≈ 0.035, versus about 0.07 two-tailed. Of course we don’t know what the data look like, so I’m just being pedantic here.
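For anyone who wants to check the arithmetic, here is a quick sketch in Python. It assumes each pianist states one preference and, under the null hypothesis, picks either tuning with probability 1/2 independently of the others. The function name `tail_prob` is just made up for this example.

```python
from fractions import Fraction
from math import comb


def tail_prob(n: int, k: int) -> Fraction:
    """P(at least k of n pick one *specific* tuning) if each picks at random (p = 1/2)."""
    return Fraction(sum(comb(n, i) for i in range(k, n + 1)), 2 ** n)


n = 8

# All 8 agree, on either tuning: double the one-sided probability.
unanimous_either = 2 * tail_prob(n, n)

# At least 7 of 8 prefer one tuning named in advance (one-tailed).
one_tailed_7 = tail_prob(n, 7)

# At least 7 of 8 agree on either tuning (two-tailed).
two_tailed_7 = 2 * tail_prob(n, 7)

print(unanimous_either, float(unanimous_either))  # 1/128 0.0078125
print(one_tailed_7, float(one_tailed_7))          # 9/256 0.03515625
print(two_tailed_7, float(two_tailed_7))          # 9/128 0.0703125
```

So 7 out of 8 only clears 0.05 if you committed to a direction beforehand, which is why the one-tailed version is the dubious part.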