Yes, I agree. I meant the (unobserved) probability that each judge gets a given question correct (which will of course differ from the observed fraction of the time each judge is correct). But it appears that at least one judge may have done quite well (as gjm points out). I don’t think the analysis done so far provides much evidence about how many judges are doing better than chance. It’s possible that there just isn’t enough data to make such an inference, but one thing you could do is plot the p-values in ascending order and see how close they come to a straight line. If every judge were at chance, the p-values would be roughly uniformly distributed, so the sorted values would fall near the diagonal; a cluster of small p-values dipping below it would suggest that some judges are doing better than chance.
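To make the sorted p-value idea concrete, here is a rough sketch of how the points for such a plot could be computed. Everything specific in it is my assumption rather than something from the analysis so far: the `scores` list is made-up illustrative data, the chance level of 0.5 assumes two equally likely answers, and the function names are just placeholders. Note also that p-values from small binomial samples are discrete, so even under the null they will only be approximately uniform.

```python
from math import comb


def binom_p_value(correct, total, chance=0.5):
    """One-sided p-value: P(X >= correct) for X ~ Binomial(total, chance)."""
    return sum(
        comb(total, k) * chance**k * (1 - chance) ** (total - k)
        for k in range(correct, total + 1)
    )


def uniform_qq_points(p_values):
    """Pair each sorted p-value with its expected value under uniformity."""
    ordered = sorted(p_values)
    n = len(ordered)
    # The i-th smallest of n independent uniform draws has mean i / (n + 1).
    return [(i / (n + 1), p) for i, p in enumerate(ordered, start=1)]


# Hypothetical (correct, total) counts per judge, for illustration only.
scores = [(6, 10), (5, 10), (9, 10), (4, 10), (7, 10)]
p_values = [binom_p_value(c, t) for c, t in scores]

for expected, observed in uniform_qq_points(p_values):
    print(f"expected {expected:.3f}  observed {observed:.3f}")
```

Plotting `observed` against `expected` gives the straight-line check: points near the diagonal are consistent with everyone guessing, while points well below it at the low end hint at real skill.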