BTW, I’d also disallow 0 and 100, and give the option of giving log-odds instead of probability (and maybe encourage doing that for probabilities more extreme than 1%/99%). Someone’s “epsilon” might be 10^-4 whereas someone else’s might be 10^-30.
I second that. See my post at http://lesswrong.com/r/discussion/lw/8lr/logodds_or_logits/ for a concise summary. Getting the LW survey to use log-odds would go a long way towards getting LW to start using log-odds in normal conversation.
People will mess up the log-odds, though. Non-log odds seem safer.
Two fields instead of one, but it seems cleaner than any of the other alternatives.
The point is to avoid having to type lots of zeros (or nines) for extreme probabilities (so that people won’t weasel out and use ‘epsilon’); having to type 1:999999999999999 is no improvement over having to type 0.000000000000001.
Is such precision meaningful? At least for me personally, 0.1% is about as low as I can meaningfully go—I can’t really discriminate between me having an estimate of 0.1%, 0.001%, or 0.0000000000001%.
I expect this is incorrect.
Specifically, I would guess that you can distinguish the strength of your belief that a lottery ticket you might purchase will win the jackpot from one in a thousand (a.k.a. 0.1%). Am I mistaken?
That’s a very special case—in the case of the lottery, it is actually possible-in-principle to enumerate BIG_NUMBER equally likely, mutually exclusive outcomes. Same with getting the works of Shakespeare out of your random number generator. The things under discussion don’t have that quality.
I agree in principle, but on the other hand the questions on the survey are nowhere as easy as “what’s the probability of winning such-and-such lottery”.
You’re right, good point.
Just type 1:1e15 (or 1e-15 if you don’t want odds ratios).
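For what it’s worth, converting that shorthand between odds, probability, and log-odds is mechanical. A quick Python sketch (the helper names are mine, and I’m assuming natural-log logits):

```python
import math

def odds_to_prob(a, b):
    """Convert odds a:b to a probability."""
    return a / (a + b)

def odds_to_logodds(a, b):
    """Convert odds a:b to natural-log odds (logits)."""
    return math.log(a / b)

# 1:1e15 odds -- the same belief written three ways:
print(odds_to_prob(1, 1e15))     # ~1e-15 as a probability
print(odds_to_logodds(1, 1e15))  # ~-34.5 logits: a sign and three digits
```

Two or three significant figures of log-odds stand in for fifteen zeros, which is the whole appeal.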
I’d force log odds, as they are the more natural representation and much less susceptible to irrational certainty and nonsense answers.
Someone has to actually try and comprehend what they are doing to troll logits; -INF seems a lot more out to lunch than p = 0.
I’d also like someone to go through the math to figure out how to correctly take the mean of probability estimates. I see no obvious reason why you can simply average probabilities on [0, 1]. The correct method would probably involve cooking up a hypothetical Bayesian judge that takes everyone’s estimates as evidence.
Edit: since logits can be a bit unintuitive, I’d give a few calibration examples like odds of rolling a 6 on a die, odds of winning some lottery, fair odds, odds of surviving a car crash, etc.
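As a sketch of what those calibration anchors might look like (the numbers are mine, I’m using natural-log logits, and the 1-in-ten-million lottery is a made-up round figure, not any particular lottery’s odds):

```python
import math

def logit(p):
    """Probability -> natural-log odds."""
    return math.log(p / (1 - p))

# A few anchor points for intuition:
print(logit(0.5))      # fair odds: 0.0 logits
print(logit(1 / 6))    # rolling a 6 on a fair die: log(1/5), about -1.6
print(logit(1 / 1e7))  # hypothetical 1-in-ten-million lottery: about -16.1
```

Note the base is a free choice; base-10 log-odds (decibans, with a factor of 10) are also common and arguably easier to eyeball.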
Personally, for probabilities roughly between 20% and 80% I find probabilities (or non-log odds) easier to understand than log-odds.
Yeah. One of the reasons why I proposed this is the median answer of 0 on several probability questions. (I’d also require a rationale in order to enter probabilities more extreme than 1%/99%.)
I’d go with the average of log-odds, but this requires all of them to be finite...
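To make the difference concrete, here’s a quick sketch (mine, not a worked-out Bayesian aggregator) contrasting a plain arithmetic mean of probabilities with an average taken in log-odds space; the latter amounts to the geometric mean of the odds:

```python
import math

def logit(p):
    return math.log(p / (1 - p))

def inv_logit(x):
    return 1 / (1 + math.exp(-x))

def mean_prob(ps):
    """Naive arithmetic mean of probabilities."""
    return sum(ps) / len(ps)

def mean_logodds(ps):
    """Average in log-odds space, then map back to a probability.
    Requires every p strictly between 0 and 1 (finite logits)."""
    return inv_logit(sum(logit(p) for p in ps) / len(ps))

estimates = [0.9, 0.99]
print(mean_prob(estimates))     # 0.945
print(mean_logodds(estimates))  # ~0.968 -- the two aggregates disagree
```

A single p = 0 or p = 1 in the input makes the log-odds average blow up, which is exactly the finiteness caveat above.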
Weighting, in part, by the calibration questions?
I dunno how you would weight it. I think you’d want to have a maxentropy ‘fair’ judge at least for comparison.
Calibration questions are probably the least controversial way of weighting. Compare to, say, trying to weight using karma.
This might be an interesting thing to develop. A voting system backed up by solid bayes-math could be useful for more than just LW surveys.
It might be interesting to see what results are produced by several weighting approaches.
Yeah, that’s what I was getting at with the maxentropy judge.
On further thought, I really should look into figuring this out. Maybe I’ll do some work on it and post a discussion post. This could be a great group rationality tool.