Every judge being close to 50% would be bizarre. If I flip 13 coins 53 times I would expect that many of those sets of 13 will stray from the 6.5/13 expected ratio. The big question is whether anyone scored high enough or low enough that we can say “this wasn’t just pure chance”.
Yes, I agree. I meant the (unobserved) probability that each judge gets a given question correct (which will of course differ from the observed fraction of the time each judge is correct). But it appears that at least one judge may have done quite well (as gjm points out). I don’t think that the analysis done so far provides much evidence about how many judges are doing better than chance. It’s possible that there just isn’t enough data to make such an inference, but one thing you could do is plot the p-values in ascending order and see how close they come to a straight line. If every judge were guessing, the p-values would be roughly uniformly distributed, so the sorted values would fall near a straight line; a cluster of small p-values dipping well below it would suggest that some judges are beating chance.
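To make the sorted p-value suggestion concrete, here is a rough sketch in Python. It assumes 53 judges with 13 questions each (my reading of the numbers above), a one-sided binomial test against a 50% guessing rate, and uniform reference points i/(m+1); the function names are made up for illustration, and the scores in the demo are simulated coin flips, not the real data.

```python
import math
import random


def binom_upper_p(k, n, p=0.5):
    """One-sided p-value P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def sorted_p_values(scores, n_questions):
    """P-value for each judge's number correct, in ascending order."""
    return sorted(binom_upper_p(k, n_questions) for k in scores)


def uniform_quantiles(m):
    """Reference points i/(m+1): roughly where m sorted uniform draws fall."""
    return [(i + 1) / (m + 1) for i in range(m)]


def simulate_scores(n_judges, n_questions, seed=0):
    """Simulated scores for judges who are purely guessing (NOT real data)."""
    rng = random.Random(seed)
    return [
        sum(rng.random() < 0.5 for _ in range(n_questions))
        for _ in range(n_judges)
    ]


if __name__ == "__main__":
    # Assumed setup: 53 judges, 13 questions each; swap in the real scores here.
    scores = simulate_scores(n_judges=53, n_questions=13)
    observed = sorted_p_values(scores, n_questions=13)
    expected = uniform_quantiles(len(observed))
    # Print (expected, observed) pairs; plot them to compare against the diagonal.
    for e, o in zip(expected, observed):
        print(f"{e:.3f}\t{o:.3f}")
```

With only 13 questions the p-values can take just 14 distinct values, so the points come out in steps rather than a smooth line, and the one-sided test is somewhat conservative. What I would look for is whether the smallest few p-values sit clearly below the reference line, rather than whether the fit is exact.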