Arguably the “natural” way to handle the possibility that you (the examiner) are in error is to score answers by the negative KL divergence of the candidate's probabilities from your own probability assignment. So if there are four options to which you assign probabilities p,q,r,s and a candidate says a,b,c,d then they get p log(a/p) + q log(b/q) + r log(c/r) + s log(d/s). If p=1 and q=r=s=0 then (treating the zero-weight terms as 0) this is the same as giving them log a, i.e., the usual log-scoring rule. If p=1-3h and q=r=s=h then this is (1-3h) log(a/(1-3h)) + h log(b/h) + …, which if we fix a is constant + h (log b + log c + log d) = constant + h log bcd. Fixing a also fixes b+c+d = 1-a, so by the AM-GM inequality this is biggest when b=c=d.
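To make the two special cases concrete, here is a minimal sketch in Python. The function name neg_kl_score and the particular numbers (h = 0.05, a = 0.7) are my own choices for illustration, and it assumes the candidate gives a nonzero probability wherever the examiner does.

```python
import math

def neg_kl_score(examiner, candidate):
    """Score = sum_i p_i * log(a_i / p_i), i.e. minus KL(examiner || candidate).

    Options with examiner probability 0 are skipped (they carry zero weight).
    Assumes candidate probabilities are > 0 wherever the examiner's are.
    """
    return sum(p * math.log(a / p) for p, a in zip(examiner, candidate) if p > 0)

# Examiner certain of the first option: the score reduces to log a.
certain = neg_kl_score([1.0, 0.0, 0.0, 0.0], [0.7, 0.1, 0.1, 0.1])

# Examiner allows h for each other option; the candidate's a is held at 0.7.
h = 0.05
examiner = [1 - 3 * h, h, h, h]
even = neg_kl_score(examiner, [0.7, 0.1, 0.1, 0.1])      # b = c = d
uneven = neg_kl_score(examiner, [0.7, 0.2, 0.05, 0.05])  # same a, lopsided rest

print(certain, math.log(0.7))
print(even, uneven)
```

With a held fixed, the evenly spread answer should come out ahead of the lopsided one, as the AM-GM argument says.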
This differs from the “expected log score” I described above only by an additive constant that doesn't depend on the candidate's answer. One way to describe it: the KL divergence is the average amount of information the candidate would gain by adopting your probabilities instead of theirs, the average being taken according to your probabilities, and the score is minus that.
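A quick numerical check of the additive-constant claim, as a sketch only. I am assuming here that the “expected log score” means sum of p_i log a_i under the examiner's probabilities; the function names and the sample answers are made up for illustration. Under that assumption the constant works out to the entropy of the examiner's own distribution.

```python
import math

def expected_log_score(examiner, candidate):
    # Assumed definition: sum_i p_i * log(a_i), weighted by the examiner's p_i.
    return sum(p * math.log(a) for p, a in zip(examiner, candidate) if p > 0)

def neg_kl_score(examiner, candidate):
    # sum_i p_i * log(a_i / p_i), i.e. minus the KL divergence.
    return sum(p * math.log(a / p) for p, a in zip(examiner, candidate) if p > 0)

def entropy(examiner):
    # -sum_i p_i * log(p_i): depends only on the examiner's distribution.
    return -sum(p * math.log(p) for p in examiner if p > 0)

examiner = [0.85, 0.05, 0.05, 0.05]
answers = [
    [0.7, 0.1, 0.1, 0.1],
    [0.4, 0.3, 0.2, 0.1],
    [0.25, 0.25, 0.25, 0.25],
]

# The gap between the two scores should be the same for every answer.
gaps = [neg_kl_score(examiner, a) - expected_log_score(examiner, a) for a in answers]
print(gaps, entropy(examiner))
```

Since the gap is the same for every candidate, the two rules rank candidates identically.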