Okay I understand what you’re saying now.
It will take me a while to think about this in more detail, but for now I’ll just note that I was requiring 50% to stay fixed at 50%, so 60% can’t be adjusted to 0%, only down to 50%. So in the second case the score is log(0.5)*3/(log(0.3)+log(0.7)+log(0.4)) = 84.0%, which is higher.
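For anyone who wants to check the arithmetic, here is a minimal sketch of that ratio. The variable names are mine, and I'm assuming natural logs (the base cancels in a ratio anyway); the three probabilities are just the ones in the expression above.

```python
import math

# Probabilities appearing in the denominator of the expression above.
stated = [0.3, 0.7, 0.4]

# Total log score if every forecast is pulled to 50% (the numerator).
adjusted_total = len(stated) * math.log(0.5)

# Total log score of the forecasts as stated (the denominator).
stated_total = sum(math.log(p) for p in stated)

ratio = adjusted_total / stated_total
print(f"{ratio:.1%}")  # prints 84.0%
```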
I think my measure should have some nice properties which justify it, but I’ll take a while to think about what they are.
EDIT: I’d say now that it might be better to take the difference rather than the ratio. Otherwise you’ll look better calibrated on difficult problems just because your score will be worse overall.
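A rough sketch of why the ratio might mislead, under an assumption of mine: that a "harder" problem set adds the same extra log loss to both the adjusted and the stated totals. The `extra_loss` parameter and the function name are hypothetical, purely for illustration.

```python
import math

# Totals from the second case above (natural logs, both negative).
adjusted = 3 * math.log(0.5)
stated = math.log(0.3) + math.log(0.7) + math.log(0.4)

def ratio_and_difference(extra_loss):
    """Return (ratio, difference) after subtracting the same extra
    log loss from both totals. Assumed model of difficulty only."""
    a = adjusted - extra_loss
    s = stated - extra_loss
    return a / s, a - s

easy = ratio_and_difference(0.0)
hard = ratio_and_difference(5.0)

print(f"easy: ratio={easy[0]:.3f}, difference={easy[1]:.3f}")
print(f"hard: ratio={hard[0]:.3f}, difference={hard[1]:.3f}")
```

Under this assumption the ratio drifts toward 100% as the shared loss grows, while the difference stays the same, which is the effect I'm worried about.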