Quadratic scoring rules are often referred to as the Brier score (it seems odd to refer to one score by a name and the other by its functional form, rather than comparing name to name or function to function).
You can read a comparison of the three proper scoring rules by Eric Bickel here. He argues for logarithmic scoring rules because of two practical concerns that I suspect are different from Eliezer’s concern.
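For reference, here is one common way to write the two rules for a forecast $p = (p_1, \dots, p_n)$ over $n$ mutually exclusive outcomes when outcome $k$ occurs. Sign and scaling conventions vary between sources, so the higher-is-better orientation below is my assumption rather than the paper's notation:

$$S_{\text{quad}}(p, k) = 1 - \sum_{i=1}^{n} \left(p_i - \mathbb{1}[i = k]\right)^2 = 2p_k - \sum_{i=1}^{n} p_i^2, \qquad S_{\log}(p, k) = \ln p_k$$

Both rules are proper: if your actual beliefs are $q$, your expected score is maximized by reporting $p = q$.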
Thanks!
So, it looks like the two main concerns in this paper are:
The Brier score is non-local, meaning that it sometimes pays to give a slightly lower probability to the true answer. This is because it also penalizes you slightly for not spreading your remaining probability mass evenly among all the false hypotheses. This seems like it is probably a bad thing, but I am not completely sure: it is still a waste of information to prefer B to C when the correct answer is A. Additionally, if we only think about true-false questions, this is a complete non-concern, since there is only one false hypothesis to put the remaining mass on.
The Bayesian score (i.e., the logarithmic score) is more robust to slightly non-linear utility functions. The paper argues this as a pro for the Bayesian score, but I think it should be the other way around. The Bayesian score is merely more robust to non-linear utility functions, whereas with the Brier score you can use randomness to remove the problem completely. Because the Brier score gives you scores between 0 and 1, you don’t have to pay out different amounts of utility: you can just award some fixed prize with probability equal to your score. This is impossible with the Bayesian score, which is unbounded below (assigning probability 0 to the true answer gives a score of negative infinity), so it cannot be rescaled into a probability.
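A small Python sketch of both concerns. The three-outcome reports, the square-root utility, and the prize size are hypothetical values chosen for illustration. The non-locality part uses the plain sum-of-squared-errors form of the Brier rule (lower is better); the lottery part uses the true-false version 1 - (p - o)^2, which stays in [0, 1].

```python
import math

# --- Concern 1: non-locality (three outcomes, A = index 0 is true) ---

def brier_loss(report, true_index):
    """Multi-outcome Brier loss: sum of squared errors (lower is better)."""
    return sum((p - (1.0 if i == true_index else 0.0)) ** 2
               for i, p in enumerate(report))

def log_score(report, true_index):
    """Logarithmic ("Bayesian") score: log of the probability on the truth."""
    return math.log(report[true_index])

concentrated = [0.50, 0.50, 0.00]  # more on A, all leftover mass on B
spread = [0.45, 0.275, 0.275]      # less on A, leftover split evenly

# Brier rewards the spread report even though it gave A less probability
# (loss about 0.454 vs 0.5)...
brier_prefers_spread = brier_loss(spread, 0) < brier_loss(concentrated, 0)
# ...while the log score only looks at the probability assigned to A.
log_prefers_concentrated = log_score(concentrated, 0) > log_score(spread, 0)

# --- Concern 2: non-linear utility, and the lottery fix ---

def binary_brier_reward(report, happened):
    """True-false Brier reward 1 - (p - o)^2, which always lies in [0, 1]."""
    o = 1.0 if happened else 0.0
    return 1.0 - (report - o) ** 2

def best_report(belief, payoff):
    """Grid-search (step 0.01) the report maximizing expected utility."""
    def expected_utility(r):
        return (belief * payoff(binary_brier_reward(r, True))
                + (1 - belief) * payoff(binary_brier_reward(r, False)))
    return max((i / 100 for i in range(101)), key=expected_utility)

utility = math.sqrt  # hypothetical concave (risk-averse) utility of money
PRIZE = 100.0        # hypothetical prize size

def pay_directly(score):
    """Pay PRIZE * score in money; utility is applied to the amount."""
    return utility(PRIZE * score)

def pay_by_lottery(score):
    """Win PRIZE with probability `score`, otherwise win nothing."""
    return score * utility(PRIZE) + (1 - score) * utility(0.0)

belief = 0.7
print("direct payment:", best_report(belief, pay_directly))     # lands below 0.7
print("lottery payment:", best_report(belief, pay_by_lottery))  # 0.7, the honest report
```

The grid search is only accurate to 0.01, but it is enough to see that paying the score out directly lets risk aversion pull the report away from the honest 0.7, while the lottery payout does not, because expected utility is then linear in the score.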
The paper also talks about a third “Spherical” scoring mechanism, which sets your score equal to the probability you assigned to the correct answer divided by the square root of the sum of the squares of all the probabilities.
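A minimal sketch of the spherical rule as described, plus a grid-search check (my own addition, to 0.01 precision) that honest reporting maximizes its expected score for a hypothetical 70/30 belief on a true-false question:

```python
import math

def spherical_score(report, true_index):
    """Probability on the true outcome divided by the Euclidean norm of the report."""
    norm = math.sqrt(sum(p * p for p in report))
    return report[true_index] / norm

def expected_spherical(report, belief):
    """Expected spherical score of `report` if outcomes occur with probabilities `belief`."""
    return sum(q * spherical_score(report, k) for k, q in enumerate(belief))

# Worked example: A is true and the report is (0.5, 0.3, 0.2),
# so the score is 0.5 / sqrt(0.25 + 0.09 + 0.04).
example = spherical_score([0.5, 0.3, 0.2], 0)

# Propriety check: the best report on the grid should match the belief.
belief = [0.7, 0.3]
grid = [[i / 100, 1 - i / 100] for i in range(101)]
best = max(grid, key=lambda r: expected_spherical(r, belief))
print(round(example, 4), round(best[0], 2))
```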
Now that I know the name of this scoring rule, I will look for more information, but I think that, if anything, the paper makes me like the Brier score better (at least for true-false questions).
It’s probably worth pointing out that the paper is by J. Eric Bickel and not by the much better known statistician Peter Bickel.
Edited.