I’d be much more comfortable answering the probability sections if I knew what epsilon is. I usually say 0% when the value is less than 0.5% and 100% when the value is greater than 99.5%, rounding to the nearest whole percentage, on the grounds that the whole point of using percentages is to avoid explicit fractions (common or decimal). But then you ruin this by explicitly mentioning 0.5% and 99.99% as possible answers. If you had put a hard limit on the number of digits allowed, then I could have used that. In the end, since I saw no consistent guidance, I fell back on my usual practice. The result is that I had a lot of 0s and 100s; hopefully that won’t mess up your algorithms.
ETA: It is probably relevant here that I am a naturally lazy person.
I think it might have been better to ask people to estimate the odds that a given statement is true. If the probability of a statement is close to zero or close to one, odds give us better precision without having to worry about digits after the decimal point (however, if a probability is close to one half, it is probably better to ask for a probability). Although it is easy to convert odds to probabilities, how many people in this survey actually took the mental effort to remind themselves to calculate the odds first and only then to express them as probabilities? I might be wrong, but I guess that only a minority did. An idea for next year’s survey: it might be interesting to compare the answers of two groups, one of which would be asked to estimate probabilities and the other to estimate odds.
Are you using “odds” to refer to percentages and “probabilities” to refer to fractions? I don’t think there is actually any difference in meaning between the two terms.
Colloquial language doesn’t make this distinction, but by technical convention, they are different.
Specifically, ‘odds’ refers to expressions like ‘5 to 3 against’; numerically, that’s the fraction 5⁄3, or rather (because of the ‘against’) its reciprocal, 3⁄5. Thus odds run from 0 (impossible) to infinity (certain), with odds of 1 being perfectly balanced between Yes and No. In contrast, probabilities run only from 0 to 1. An event with odds of 5 to 3 against, or equivalently odds of 3⁄5, has a probability of 3/(3+5) = 3⁄8. So the numbers are different. The conversion formulas are O = P/(1 − P) and P = O/(1 + O).
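For anyone who wants to check the arithmetic, here is a minimal Python sketch of the two conversion formulas above. The function names are just illustrative, and `Fraction` is used only so that the results print as exact fractions.

```python
from fractions import Fraction


def odds_to_probability(odds):
    """P = O / (1 + O)."""
    return odds / (1 + odds)


def probability_to_odds(p):
    """O = P / (1 - P); undefined at P = 1 (certainty)."""
    return p / (1 - p)


# "5 to 3 against" means odds of 3/5 in favour.
odds_in_favour = Fraction(3, 5)
p = odds_to_probability(odds_in_favour)

print(p)                       # 3/8
print(probability_to_odds(p))  # 3/5
```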
Then there are log-odds; this is log₂ O bits. (You can also use other bases than 2 and correspondingly other units than bits.) Now 0 indicates perfect balance between Yes and No; a positive number means more likely Yes than No, and a negative number means less likely Yes than No. Log-odds run from negative infinity (impossible) to infinity (certain).
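As a rough sketch of the log-odds scale in base 2 (the function names here are made up for illustration, and the endpoints P = 0 and P = 1 are deliberately left out because they map to negative and positive infinity):

```python
import math


def log_odds_bits(p):
    """Log-odds in bits: log2(P / (1 - P)), defined for 0 < P < 1."""
    return math.log2(p / (1 - p))


def probability_from_bits(bits):
    """Invert the log-odds: O = 2**bits, then P = O / (1 + O)."""
    odds = 2.0 ** bits
    return odds / (1 + odds)


print(log_odds_bits(0.5))    # 0.0, perfectly balanced
print(log_odds_bits(3 / 8))  # negative: less likely Yes than No
print(log_odds_bits(0.9))    # positive: more likely Yes than No
```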
Oh right, I forgot about that definition. The main probability conversions that I was aware of involved converting between fractions and percentages, sometimes expressed instead as probabilities between 0 and 1. Theoretically, it makes sense that odds can also be converted to or from probabilities, now that I think about it. Thanks for your explanation.
‘5 to 3 against’ is 3⁄8, not 3⁄5. Odds of ‘N to M’ or ‘N to M against’ are always between 0 and 1.
5 to 3 against is 3⁄5 (as odds), which is a probability of 3⁄8. You are muddling probability and odds ratios in an unacceptable way.
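Wikipedia:
“Odds can be expressed as a ratio of two numbers [or] as a number, by dividing the terms in the ratio [....] Odds range from 0 to infinity, while probabilities range from 0 to 1 [...]”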
Yes, that’s exactly what I said. There is no way to express a fraction greater than 100% using odds notation; saying that odds are “1 million to 1” is 99.9999%, still under 1.
In the Wikipedia article, take a look at the table below the words “These are worked out for some simple odds”. The odds that TobyBartels is talking about, which one gets by dividing the numbers in an “n to m” expression, and which go from zero to infinity, are shown in the second and third columns of that table (o_f and o_a). Probabilities, which go from 0 to 1 or 0% to 100%, are shown in the fourth and fifth columns (p and q).
You said ‘Odds […] are always between 0 and 1’, while Wikipedia said ‘Odds range from 0 to infinity’, so you didn’t say the same thing.
Did you actually read the article you linked? It says the exact same thing as I did, phrased differently. Their “Odds range from 0 to infinity” means that any number from 0 to infinity can be used in the odds ratio, but it still always represents a probability between 0 and 1. Which is precisely what I said.
No, that’s not what you said. I am now done with this conversation.
Um, representing a number between 0 and 1 is not the same as being a number between 0 and 1. The representation of p = 3⁄8 as odds = 3⁄5 (“5 to 3 against”) is useful in practice, for example because Bayes’ rule reduces to plain multiplication for odds ratios.
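To illustrate the multiplication point, here is a small sketch of Bayes’ rule in odds form: posterior odds = prior odds × likelihood ratio. The likelihood ratio of 5 is a made-up number for the example, not something from the survey.

```python
from fractions import Fraction


def update_odds(prior_odds, likelihood_ratio):
    """Bayes' rule in odds form: posterior odds = prior odds * likelihood ratio."""
    return prior_odds * likelihood_ratio


prior_odds = Fraction(3, 5)     # "5 to 3 against", i.e. p = 3/8
likelihood_ratio = Fraction(5)  # hypothetical: evidence is 5x as likely if the statement is true

posterior_odds = update_odds(prior_odds, likelihood_ratio)
posterior_p = posterior_odds / (1 + posterior_odds)

print(posterior_odds)  # 3
print(posterior_p)     # 3/4
```

Doing the same update directly on the probability 3⁄8 needs the full normalising denominator; with odds it is a single product.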
Yes, odds are good (and log-odds are even better), but people are bad at both dealing with very large absolute values and dealing with very fine precisions. I think that the survey is correct to put in a cut-off (whether an ϵ for probabilities, an N for log-odds, or one of each for odds); it should just tell us where. (Edit: put in stuff about log-odds properly.)
Epsilon is a minuscule amount. It’s vanishingly small, but it’s still there.
Yes, but which minuscule amount?
To be more specific: If ϵ ≥ 5 × 10⁻ⁿ (which it must be for some n, if it is a positive real number), then I only need to figure out my probability to n + 1 digits. Upon doing so, if it’s all 0s, then my probability is no more than ϵ, so I can enter 0. Otherwise, I should enter something larger. (And a similar thing holds on the other end.) Specifying ϵ serves the practical purpose of telling us how much work to put into estimating our probabilities. Since I had no guideline for that, I chose to default to ϵ = 1⁄2 (in percentage points), rather than try to additionally work out how small ϵ was supposed to be.
If, instead of bringing up ϵ, the survey had instructed us to use as many decimals as we need to avoid ever answering either 0 or 100, then I probably would have done more work. (There are reasons why this is bad, since the results will be increasingly unreliable, but still it could have said that.) But since I knew that at some point my work would be ignored, I didn’t do any.
(Edits: minor grammar and precise phrasing of inequalities.)
I took epsilon to be simply 0.5, on the basis of “the survey can take decimals but I’m going to use whole numbers as suggested, so 0 means I rounded down anything less than 0.5”. This is imprecise but gives me greater confidence in my answers, and (as you say), I have some tendency towards laziness.
Yes, that’s what I did too (0.5%).
I don’t think it will mess up the algorithms. My guess is that most people probably rounded most calibration answers to the tens place due to lack of enough confidence to be more precise, but since people are giving different values, the average across all respondents is unlikely to fall on an increment of ten, and should be a reasonably accurate measure of the respondents’ collective assigned probability for a question.
It could mess them up, because in theory a single wrong answer with 100% confidence renders the entire series infinitely poorly calibrated. The survey says that this won’t be done, that 100% will be treated as something slightly less than that. But how much less could depend on assumptions that the survey-makers made about how often people would answer this way, and maybe I did it too much.
I doubt it, since I’m pretty sure that they know enough about these pitfalls to avoid them. But I felt that I answered 0 and 100 quite a lot, so I thought that some warning was in order.
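To make the worry above concrete, here is a sketch that assumes a logarithmic scoring rule; I don’t know what the survey actually uses. The `eps` clamp of 0.005 (that is, 0.5%) is likewise a hypothetical value, not the survey’s real ϵ.

```python
import math


def log_score(confidence, correct, eps=0.0):
    """Log score in bits for one answer; eps clamps confidence to [eps, 1 - eps]."""
    p = min(max(confidence, eps), 1.0 - eps)
    p_assigned = p if correct else 1.0 - p  # probability given to what actually happened
    if p_assigned == 0.0:
        return -math.inf  # total confidence in a wrong answer
    return math.log2(p_assigned)


# (stated confidence, whether the answer was right); the last one is a wrong 100%.
answers = [(0.9, True), (0.8, True), (1.0, False)]

total_unclamped = sum(log_score(c, ok) for c, ok in answers)
total_clamped = sum(log_score(c, ok, eps=0.005) for c, ok in answers)

print(total_unclamped)  # -inf
print(total_clamped)    # finite, but the wrong 100% answer still dominates
```

How harsh the clamped total comes out depends entirely on the choice of `eps`, which is why it would help to know the value.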
Even though percentages are typically used for cases where precision is less important, I’d say that in this context it would be better to err on the side of precision.