gwern comments on Alternative to Bayesian Score

gwern 4 Aug 2013 22:43 UTC
0 points

The model I am imagining from you is that there is some countable collection of statements you want to assign true/false to. You assign some weight function to the statements so that the total weight of all statements is some finite number, and your score is the sum of the weights of all statements which you choose to answer.

Hm, no, I wasn’t really thinking that way. I don’t want some fixed finite number; I want everyone to reach different numbers so that more accurate predictors score higher.

The weights on particular functions do not even have to be set algorithmically. For example, a prediction market is immune to the ‘sky is blue’ problem: if one were to start a contract for ‘the sky is blue tomorrow’, no one would trade on it unless one were willing to lose money as a market-maker while the other trader bid it up to the meteorologically-accurate 80% or whatever. One can pick and choose as much as one pleases, but unless one’s contracts were valuable to other people for some reason, it would be impossible to make money by stuffing the market with bogus contracts. The utility just becomes how much money you made.
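To put rough numbers on the market-maker's loss, here is a minimal sketch. It assumes a contract that pays 1 if the sky is blue and 0 otherwise, a true probability of 0.8, and a naive listing price of 0.5; the function name and the 0.5 starting price are my own illustrative assumptions, not anything a real market specifies.

```python
def expected_profit_per_share(true_prob, price):
    """Expected profit from buying one share that pays 1 if the event
    occurs and 0 otherwise, bought at `price`."""
    win = true_prob * (1.0 - price)          # event occurs: gain 1 - price
    lose = (1.0 - true_prob) * (0.0 - price)  # event fails: lose the price paid
    return win + lose

TRUE_PROB = 0.8  # assumed 'meteorologically accurate' probability

# A trader buying at the naive 0.5 listing expects to gain;
# the market-maker on the other side expects to lose the same amount.
buyer_edge_at_half = expected_profit_per_share(TRUE_PROB, 0.5)
maker_edge_at_half = -buyer_edge_at_half

# Once the price has been bid up to the true probability, nobody has an edge.
buyer_edge_at_fair = expected_profit_per_share(TRUE_PROB, 0.8)

print(buyer_edge_at_half, maker_edge_at_half, buyer_edge_at_fair)
```

Under these assumptions the only person who pays for a bogus contract is whoever listed it at the wrong price, which is the sense in which stuffing the market does not make money.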

I think that the 0 utility point should be put at the utility of the ⁵⁰⁄₅₀ probability assignment for each question.

I think this doesn’t work because you’re trying to invent a non-informative prior, and it’s trivial to set up sets of predictions where the obviously better non-informative prior is not ¹⁄₂: for example, set up 3 predictions, one for each of 3 mutually exclusive and exhaustive outcomes, where the non-informative prior obviously looks more like ¹⁄₃ and ¹⁄₂ means someone is getting robbed. More importantly, uninformative priors are disputed and it’s not clear what they are in more complex situations. (Frequentist Larry Wasserman goes so far as to call them “lost causes” and “perpetual motion machines”.)
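The 3-outcome case can be checked with a few lines of arithmetic. This is only a sketch, assuming each of the 3 statements is scored separately with the ordinary binary log score and that exactly one statement turns out true; the helper names are mine.

```python
import math

def binary_log_score(p, outcome):
    """Log score for stating probability p on a statement whose truth
    value turned out to be `outcome`."""
    return math.log(p if outcome else 1.0 - p)

def total_score(p, n_outcomes=3):
    """Total log score for giving the same probability p to each of
    n mutually exclusive, exhaustive statements (exactly one is true)."""
    one_true = binary_log_score(p, True)
    rest_false = (n_outcomes - 1) * binary_log_score(p, False)
    return one_true + rest_false

score_half = total_score(1 / 2)   # the proposed 50/50 zero point
score_third = total_score(1 / 3)  # the coherent know-nothing assignment

# Probabilities of 1/2 on three exclusive outcomes sum to 1.5, so a bookie
# quoting them can be sold all three contracts for 1.5 against a payout of 1.
overround = 3 * (1 / 2) - 1.0

print(score_half, score_third, score_third - score_half, overround)
```

The ¹⁄₂ assignment scores roughly -2.08 and the ¹⁄₃ assignment roughly -1.91 whichever outcome occurs, so a predictor who knows nothing still beats the supposed zero point just by being coherent.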

But just saying that you scale each question by its importance doesn’t fix the problem: if you can choose which questions to answer and your utility is the sum of your utilities for the individual questions, then the Bayesian rule as written encourages not answering any questions, since each answer can only give you negative utility. You have to fix that either by fixing the 0 points of your utilities in some reasonable way or by requiring that you are assigned utility for every question, with a default answer if you don’t think about it at all.

Perhaps a raw log odds is not the best idea, but do you really think there is no way to convert them into some score which disincentivizes strategic predicting? This just sounds arrogant to me, and I would only believe it if you summarized all the existing research into rewarding experts and showed that log odds simply could not be used in any circumstance where a predictor could choose to predict only a subset of the specified predictions.
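For concreteness, the incentive under dispute can be sketched as follows. It assumes a perfectly calibrated predictor facing an 80% event, and it uses the 0.5 baseline purely because that is the rebasing proposed in the quoted comment, not because it is the right one; `expected_score` and its `baseline` parameter are hypothetical names.

```python
import math

def log_score(p, outcome):
    """Raw log score: always <= 0, and 0 only for a certain, correct answer."""
    return math.log(p if outcome else 1.0 - p)

def expected_score(p_true, p_stated, baseline=None):
    """Expected log score of stating p_stated on an event with probability
    p_true. If `baseline` is given, subtract what the baseline probability
    would have scored, so that stating the baseline is worth exactly 0."""
    score = (p_true * log_score(p_stated, True)
             + (1.0 - p_true) * log_score(p_stated, False))
    if baseline is not None:
        score -= (p_true * log_score(baseline, True)
                  + (1.0 - p_true) * log_score(baseline, False))
    return score

ABSTAIN = 0.0  # an unanswered question adds nothing to the sum

raw = expected_score(0.8, 0.8)                    # negative: worse than abstaining
rebased = expected_score(0.8, 0.8, baseline=0.5)  # positive: answering now pays

print(raw, rebased, raw < ABSTAIN < rebased)
```

So the raw sum does reward silence, and some rebasing removes that; the open question is only which rebasing, since the choice of baseline is exactly the uninformative-prior problem again.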

but if we assume for now that there are only finitely many questions, and all questions have rational weights, then weighing the questions is similar to just asking each question multiple times (in proportion to its weight).

There aren’t finitely many questions because one can ask questions involving each of the infinite set of integers… Knowing that two questions are really the same question sounds like an impossible demand to meet (for example, if any system claimed it could do this, it could solve the Halting Problem simply by being asked to predict the output of 2 Turing machines).
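For what it is worth, the quoted equivalence does hold in the finite, rational-weight case it assumes, which a short sketch can confirm; it says nothing about the infinite or duplicate-detection objections. The question list and function names are made up for illustration.

```python
import math
from fractions import Fraction
from functools import reduce

def log_score(p, outcome):
    return math.log(p if outcome else 1.0 - p)

def lcm(a, b):
    return a * b // math.gcd(a, b)

def weighted_total(questions):
    """Sum of weight * log score over (weight, stated_prob, outcome) triples."""
    return sum(float(w) * log_score(p, o) for w, p, o in questions)

def repeated_total(questions):
    """Same total, obtained by asking each question an integer number of
    times proportional to its weight, then undoing the common scale."""
    denom = reduce(lcm, (Fraction(w).denominator for w, _, _ in questions), 1)
    total = 0.0
    for w, p, o in questions:
        copies = int(Fraction(w) * denom)  # integer because denom clears w
        total += sum(log_score(p, o) for _ in range(copies))
    return total / denom

# Hypothetical questions: (rational weight, stated probability, outcome).
questions = [
    (Fraction(1, 2), 0.7, True),
    (Fraction(3, 4), 0.2, False),
    (Fraction(2), 0.9, True),
]

print(weighted_total(questions), repeated_total(questions))
```

The two totals agree up to floating-point error, so weights and repetition are interchangeable here only because the list is finite and someone has already decided which entries count as the same question.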