[… why do they score more?]
I’m not sure if these are good reasons, but it seems to me that
1) The expected answer to the quiz does not just consist in identifying A as a correct answer but also in identifying the others as incorrect answers. I mean that the expected right answer is 100:0:0:0 (and not, for example, 100:50:0:0 or whatever else).
2) Giving 25:25 for B:C is better than giving 50:0, even though the expected value for C is 0, since 25:25 is closer to 0:0 than 50:0 is (in the usual Euclidean distance). From this perspective, a better answer for the person who gave 50:50:0:0 would have been 50:25:0:0, which is also better than 50:25:25:0.
Indeed, 1 - [(1-1/2)^2 + (1/4)^2 + 0^2 + 0^2] > 1 - [(1-1/2)^2 + (1/4)^2 + (1/4)^2 + 0^2] > 1 - [(1-1/2)^2 + (1/2)^2 + 0^2 + 0^2], that is, 0.6875 > 0.625 > 0.5.
3) From this perspective, I am not sure that encouraging students to make their answers sum to 100 is a good idea. It seems better for the student answering to consider each proposition (i.e., A, B, C or D) separately (this relates to point 1 above). For each proposition, the answer should reflect the person’s credence that the proposition is correct. The same approach could then also be applied to a multiple-choice quiz with zero or more than one correct answer.
EDIT (added):
To sum up what I think could be an answer to your question in this case: with the “quadratic scoring rule”, if the expected answer for A:B:C:D is 100:0:0:0, then answer 1) 50:25:0:0 scores more than answer 2) 50:50:0:0. Both are right for C and D and both are at the same distance from the expected answer for A, but 1) is closer than 2) to the expected answer for B (which is 0).
The same reasoning works for comparing 1′) 50:25:25:0 with 2′) 50:50:0:0, except that in this second case it is the overall distance (under the quadratic scoring rule) for B:C that matters: 25:25 is closer to 0:0 than 50:0 is.
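For anyone who wants to check the arithmetic, here is a minimal sketch in Python. It assumes the quadratic score has the form used above (1 minus the squared Euclidean distance to the expected answer) and that A is the correct answer; the function name `quadratic_score` is only illustrative.

```python
def quadratic_score(report, correct_index):
    """Return 1 minus the squared distance between a report and the
    expected answer (1 for the correct option, 0 for the others)."""
    return 1 - sum(
        (p - (1.0 if i == correct_index else 0.0)) ** 2
        for i, p in enumerate(report)
    )


# Credences for A:B:C:D written as fractions of 1; A (index 0) is correct.
answers = {
    "50:25:0:0": [0.50, 0.25, 0.00, 0.00],
    "50:25:25:0": [0.50, 0.25, 0.25, 0.00],
    "50:50:0:0": [0.50, 0.50, 0.00, 0.00],
}

scores = {label: quadratic_score(report, 0) for label, report in answers.items()}

for label, score in scores.items():
    print(label, score)
```

Under these assumptions the three scores come out as 0.6875, 0.625 and 0.5, in that order.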
Maybe 1) is where I have a fundamental disagreement.
Given evidence A, a Bayesian update considers how well evidence A was predicted.
There is no additional update due to how well ¬A being false was predicted. Even if ¬A is split into sub-categories, it isn’t relevant as that evidence has already been taken into account when we updated based on A being true.
Re 2): 50:25:0:0 gives a worse expected score than 50:50:0:0: although my score increases if A is true, it decreases by more if B is true (assuming 50:50:0:0 is my true belief).
Re 3): I think it’s important to note that I’m assuming that exactly one of A, B, C or D is the correct answer. Therefore the probabilities should add up to 100% to maximise your expected score (otherwise it isn’t a proper scoring rule).
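A rough Python sketch of the expected-score comparison, assuming the same quadratic form as in the parent comment (1 minus the squared distance to the true outcome) and a true belief of 50:50:0:0. The helper names `quadratic_score` and `expected_score` are made up for illustration, and the grid search over B is only a coarse check, not a proof.

```python
def quadratic_score(report, correct_index):
    """Return 1 minus the squared distance between a report and the outcome."""
    return 1 - sum(
        (p - (1.0 if i == correct_index else 0.0)) ** 2
        for i, p in enumerate(report)
    )


def expected_score(report, belief):
    """Average the score over which option is correct, weighted by belief."""
    return sum(b * quadratic_score(report, i) for i, b in enumerate(belief))


belief = [0.5, 0.5, 0.0, 0.0]  # assumed true belief for A:B:C:D

honest = expected_score([0.5, 0.5, 0.0, 0.0], belief)
shaded = expected_score([0.5, 0.25, 0.0, 0.0], belief)
print(honest, shaded)

# Hold A at 0.5 and try B = 0.00, 0.05, ..., 1.00 to see which report of B
# gives the highest expected score under this belief.
best_b = max(
    (i / 20 for i in range(21)),
    key=lambda b: expected_score([0.5, b, 0.0, 0.0], belief),
)
print(best_b)
```

With these numbers the honest report has an expected score of 0.5 against 0.4375 for 50:25:0:0, and the grid search picks B = 0.5, i.e. the report that matches the belief and sums to 100%.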