As has been noted, the impressiveness of the predictions has nothing to do with which way round they are stated; predicting P at 50% is exactly as impressive as predicting ¬P at 50% because they are literally the same. I think one only sounds more impressive when compared to the ‘baseline’ because our brains seem to be more attuned to predictions that sound surprisingly high, and we don’t seem to notice ones that seem surprisingly low. I.e., we hear: ‘there is a 40% chance that Joe Biden will be the Democratic nominee’ and somehow translate that to ‘at least 40%’, and fail to consider what it implies for the other 60%.
Consider the examples given of unimpressive-sounding predictions:
There is a 50% chance that the price of a barrel of oil at the end of 2020 will not be between $50.95 and $51.02
There is a 50% chance that Tesla’s stock price at the end of the year 2020 is below $512 or above $514
You can immediately make these sound impressive without flipping them by inserting the word ‘only’ or ‘just’:
There is only a 50% chance that the price of a barrel of oil at the end of 2020 will not be between $50.95 and $51.02
There is just a 50% chance that Tesla’s stock price at the end of the year 2020 will be below $512 or above $514
Suddenly, we are forced to confront how surprisingly low this percentage is, given what you might expect from common wisdom, and it goes back to seeming impressive.
I also think it’s a mistake to confuse ‘common wisdom’ and ‘baseline’ with ‘all possible futures’ when thinking about impressiveness. If I say that there’s a 50% chance that the price of a barrel of oil at the end of 2020 will be between -$1 million and $1 million, this sounds unimpressive because I’ve chosen a very wide interval relative to common sense. But there are a lot more numbers below -$1 million and above $1 million than there are within it, so arguably this is actually quite a precise prediction in the space of all possible futures. That isn’t what matters, though; what matters is the common-sense range, i.e. the baseline.
(Of course, “there’s a 50% chance that the price of a barrel of oil at the end of 2020 will be between -$1 million and $1 million” is actually a very bold prediction, because it’s saying that there is a 50% chance that the price of oil will be either less than -$1 million or above $1 million, which is surprisingly high… but we only notice it when it is phrased to seem surprisingly high rather than surprisingly low!)
As has been noted, the impressiveness of the predictions has nothing to do with which way round they are stated; predicting P at 50% is exactly as impressive as predicting ¬P at 50% because they are literally the same.
If that were true, then the list
The price of a barrel of oil at the end of 2020 will be between $50.95 and $51.02 (50%)
Tesla’s stock price at the end of the year 2020 is between $512 and $514 (50%)
⋯ (more extremely narrow 50% predictions)
and the list
The price of a barrel of oil at the end of 2020 will be between $50.95 and $51.02 (50%)
Tesla’s stock price at the end of the year 2020 is below $512 or above $514 (50%)
⋯ (more extremely narrow 50% predictions where every other one is flipped)
would be equally impressive if half of them came true. Unless you think that’s the case, it immediately follows that the way predictions are stated matters for impressiveness.
It doesn’t matter in the case of a single 50% prediction, because in that case one of the phrasings follows the rule I propose and the other follows the inverse of the rule, which is the other way to maximize boldness. As soon as you have two 50% predictions, there are four possible phrasings and only two of them maximize boldness. (And with n predictions, there are 2^n possible phrasings and only 2 of them maximize boldness.)
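The counting claim can be checked by brute force. This is a minimal sketch under a hypothetical definition the thread does not give: each prediction has a margin, 0.5 minus the common-wisdom probability of its rule-following phrasing (assumed nonzero, positive when that phrasing looks unlikely); flipping a prediction negates its margin; and a list's boldness is the absolute value of the summed margins. The function names and the three baselines are made up for illustration.

```python
import math
from itertools import product


def boldness(signs, margins):
    """Hypothetical boldness score for one phrasing of a list of 50% predictions.

    margins[i] is 0.5 minus the common-wisdom probability of prediction i as
    phrased under the rule; signs[i] is +1 to keep that phrasing or -1 to state
    the negation instead, which flips the sign of its margin.
    """
    return abs(sum(s * m for s, m in zip(signs, margins)))


def maximally_bold_phrasings(margins):
    """Enumerate all 2**n phrasings; return (count, phrasings with max boldness)."""
    phrasings = list(product((1, -1), repeat=len(margins)))
    best = max(boldness(p, margins) for p in phrasings)
    winners = [p for p in phrasings if math.isclose(boldness(p, margins), best)]
    return len(phrasings), winners


# Hypothetical baselines: oil in range 5%, Tesla in range 10%, a third narrow call 20%
margins = [0.5 - 0.05, 0.5 - 0.10, 0.5 - 0.20]
count, winners = maximally_bold_phrasings(margins)
print(count, winners)  # 8 [(1, 1, 1), (-1, -1, -1)]
```

With every margin nonzero, only the all-rule and all-inverse phrasings reach the maximum, so the number of maximally bold phrasings stays at 2 while the total grows as 2^n; with a single prediction both phrasings tie.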
The person you’re referring to left an addendum in a second comment (as a reply to the first) acknowledging that phrasing matters for evaluation.
Thanks for the response!
I don’t think there is any difference in those lists! Here’s why:
The impressiveness of 50% predictions can only be evaluated with respect to common wisdom. If everyone thinks P is only 10% likely, and you give it 50%, and P turns out to be true, this is impressive because you gave it a surprisingly high percentage! But also if everyone says P is 90% likely, and P turns out to be false, this is also impressive because you gave it a surprisingly low percentage!
I think what you’re suggesting is that people should always phrase their prediction in a way that, if P comes true, makes their prediction impressive because the percentage was surprisingly high, i.e.:
Most people think there is only a 20% chance that the price of a barrel of oil at the end of 2020 will be between $50.95 and $51.02. I think it’s 50% (surprisingly high), so you should be impressed if it turns out to be true.
But you could also say:
Most people think there is an 80% chance that the price of a barrel of oil at the end of 2020 will not be between $50.95 and $51.02. I think it’s only 50% (surprisingly low), so you should be impressed if it turns out to be false.
These are equally impressive (though I admit the second is phrased in a less intuitive way). When it comes to 50% predictions, it doesn’t matter whether you evaluate them with respect to ‘it turned out to be true’ or ‘it turned out to be false’; in both cases you’re trying to represent the correct ratio between the two outcomes, and the impressiveness comes from how far your percentages on both sides differ from the baseline.
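One way to make "equally impressive" concrete is a minimal sketch that assumes impressiveness means the log score of the forecast minus the log score of the common-wisdom baseline; the thread does not specify a scoring rule, so this choice and the name `relative_log_score` are assumptions. The 20%/80% baselines are the ones from the two example phrasings.

```python
import math


def relative_log_score(my_prob, baseline_prob, happened):
    """Log score of my forecast minus that of the baseline, for one statement.

    my_prob and baseline_prob are the probabilities each assigns to the
    statement being true; happened says whether it turned out true. Positive
    values mean the forecast beat common wisdom on this outcome.
    """
    mine = my_prob if happened else 1 - my_prob
    base = baseline_prob if happened else 1 - baseline_prob
    return math.log(mine) - math.log(base)


# P: "oil ends 2020 between $50.95 and $51.02"; common wisdom gives P 20%.
for p_is_true in (True, False):
    as_p = relative_log_score(0.5, 0.2, happened=p_is_true)          # state P at 50%
    as_not_p = relative_log_score(0.5, 0.8, happened=not p_is_true)  # state not-P at 50%
    print(p_is_true, round(as_p, 3), round(as_not_p, 3))
# True 0.916 0.916
# False -0.47 -0.47
```

Because this score depends only on which outcome occurred, not on which side was written down, the two phrasings receive the same score in every world.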
I think what I’m saying is that it doesn’t matter how the author phrases it: when evaluating 50% predictions, we should notice both when a percentage seems surprisingly high and the statement turns out to be true, and when it seems surprisingly low and the statement turns out to be false, as both are impressive.
When it comes to a list of 50% predictions, it’s impossible to evaluate the impressiveness only by looking at how many came true, since it’s arbitrary which way they are phrased, and you could equally evaluate the impressiveness by how many turned out to be false. So you have to compare each one to the baseline ratio.
Probability is weird and unintuitive and I’m not sure if I’ve explained myself very well...
(Edit: deleted a line based on tone. Apologies.)
Everything except your last two paragraphs argues that a single 50% prediction can be flipped, which I agree with. (Again: for every n predictions, there are 2^n ways to phrase them and precisely 2 of them are maximally bold. If you have a single prediction, then 2^n = 2: there are only two ways, both are maximally bold, and thus they are equally bold.)
When it comes to a list of 50% predictions, it’s impossible to evaluate the impressiveness only by looking at how many came true, since it’s arbitrary which way they are phrased
I have proposed a rule that dictates how they are phrased. If this rule is followed, it is not arbitrary how they are phrased. That’s the point.
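A minimal sketch of how such a rule removes the choice, assuming it amounts to "state whichever side of each 50% prediction common wisdom gives less than 50%", as the rule is paraphrased in this thread. `Prediction` and `apply_phrasing_rule` are hypothetical names, and the 90% baseline for the flipped Tesla statement is made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prediction:
    statement: str
    negation: str
    baseline: float  # assumed common-wisdom probability that `statement` is true


def apply_phrasing_rule(pred: Prediction) -> Prediction:
    """Return the phrasing whose stated outcome common wisdom considers unlikely
    (baseline at most 50%), so that a 50% forecast sits above the baseline."""
    if pred.baseline > 0.5:
        return Prediction(pred.negation, pred.statement, 1 - pred.baseline)
    return pred


flipped_tesla = Prediction(
    statement="Tesla's stock price at the end of 2020 is below $512 or above $514",
    negation="Tesla's stock price at the end of 2020 is between $512 and $514",
    baseline=0.9,  # hypothetical
)
print(apply_phrasing_rule(flipped_tesla).statement)
# Tesla's stock price at the end of 2020 is between $512 and $514
```

Applied item by item, the rule sends every flipped entry back to its narrow-interval form, so a list with every other prediction flipped is rewritten as the all-narrow list before anyone counts hits.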
Again, please consider the following list:
The price of a barrel of oil at the end of 2020 will be between $50.95 and $51.02 (50%)
Tesla’s stock price at the end of the year 2020 is between $512 and $514 (50%)
...
You have said that there is no difference between the two lists. But this is obviously untrue. I hereby offer you $2000 if you provide me with a list of this kind and you manage to have, say, at least 10 predictions where between 40% and 60% come true. Would you offer me $2000 if I presented you with a list of this kind:
The price of a barrel of oil at the end of 2020 will be between $50.95 and $51.02 (50%)
Tesla’s stock price at the end of the year 2020 is below $512 or above $514 (50%)
⋯
and between 40% and 60% come true? If so, I will PM you one immediately.
I think you’re stuck on the fact that a 50% prediction also predicts the negated statement at 50%; from that, you assume that the entire post must be false, and so you’re not trying to understand the point the post is making. Right now, you’re arguing for something that is obviously untrue. Everyone can make a list of the second kind; no-one can make a list of the first kind. Again, I’m so certain about this that I promise you $2000 if you prove me wrong.
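The bet can be read as a probability question. This is a minimal sketch under loud assumptions: the predictions are independent, and each one's real chance of coming true equals its common-wisdom baseline, taken here as a hypothetical 5% for every "obviously false" item and 95% for every flipped item. The hit-count distribution is computed exactly with a simple dynamic program; the function names are hypothetical.

```python
def hit_count_distribution(probs):
    """dist[k] = probability that exactly k predictions come true, assuming the
    events are independent and prediction i comes true with probability probs[i]."""
    dist = [1.0]
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            nxt[k] += mass * (1 - p)  # prediction fails: count stays at k
            nxt[k + 1] += mass * p    # prediction comes true: count moves to k + 1
        dist = nxt
    return dist


def prob_hits_between(probs, min_hits, max_hits):
    """Probability that the number of predictions that come true lies in
    [min_hits, max_hits]."""
    dist = hit_count_distribution(probs)
    return sum(dist[min_hits:max_hits + 1])


n = 10
# Hypothetical real-world probabilities, taken to equal common wisdom:
all_unlikely = [0.05] * n              # first kind: every item looks very unlikely
alternating = [0.05, 0.95] * (n // 2)  # second kind: every other item flipped
honest_coins = [0.5] * n               # genuinely 50/50 events, for comparison

for name, probs in [("all unlikely", all_unlikely),
                    ("alternating", alternating),
                    ("honest coins", honest_coins)]:
    # Between 40% and 60% of 10 predictions means 4, 5 or 6 come true
    print(name, round(prob_hits_between(probs, 4, 6), 4))
```

Under these assumptions the first kind of list lands between 40% and 60% only about 0.1% of the time, the alternating list about 96% of the time, and even ten genuine coin flips only about 66% of the time.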
I agree there is a difference between those lists if you are evaluating everything with respect to each prediction being ‘true’. My point is that sometimes a 50% prediction is impressive when it turns out to be false, because everyone else would have put a higher percentage than 50% on it being true. The first list contains only statements that are impressive if evaluated as true; the second mixes ones that would be impressive if evaluated as true with ones that would be impressive if evaluated as false. If Tesla’s stock ends up at $513, it feels weird to say ‘well done’ to someone who predicted “Tesla’s stock price at the end of the year 2020 is below $512 or above $514 (50%)”, but that’s what I’m suggesting we should do if everyone else would have put only, say, a 10% chance on that outcome. If you’re saying that we should always phrase 50% predictions so that they would be impressive if evaluated as true, because that is more intuitive for our brains to interpret, I don’t disagree.
I read the post in good faith and I appreciate that it made me think about predictions and probabilities more deeply. I’m not sure how else to explain my position so will leave it here.
Well, now you’ve changed what you’re arguing for. You initially said that it doesn’t matter which way predictions are stated, and then you said that both lists are the same.
I hereby offer you $2000 if you provide me with a list of this kind
Can you specify what you mean by ‘of this kind’, i.e. what are the criteria for predictions included on the list? Do you mean a series of predictions which give a narrow range?
A list of predictions that all seem extremely unlikely to come true according to common wisdom.
Ok this confirms you haven’t understood what I’m claiming. If I gave a list of predictions that were my true 50% confidence interval, they would look very similar to common wisdom because I’m not a superforecaster (unless I had private information about a topic, e.g. a prediction on my net worth at the end of the year or something). If I gave my true 50% confidence interval, I would be indifferent to which way I phrased it (in the same way that if I were to predict 10 coin tosses, it wouldn’t matter whether I predicted ten heads, ten tails, or some mix of the two).
From what I can tell from your examples, the list of predictions you proposed sending to me would not have represented your true 50% confidence intervals each time—you could have sent me 5 things you are very confident will come true and 5 things you are very confident won’t come true. It’s possible to fake any given level of calibration in this way.
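A minimal sketch of that faking trick, assuming the forecaster announces every item at the same confidence but privately picks near-certain hits and near-certain misses in the right proportion. `faked_forecast_list`, `expected_hit_rate` and the 99% "sure" level are hypothetical.

```python
def faked_forecast_list(target_rate, n, sure=0.99):
    """Hypothetical cheat: announce every item at target_rate confidence, but pick
    round(target_rate * n) items that are near-certain to happen and make the
    rest near-certain not to. Returns the items' true probabilities."""
    k = round(target_rate * n)
    return [sure] * k + [1 - sure] * (n - k)


def expected_hit_rate(true_probs):
    """Expected fraction of the listed items that come true."""
    return sum(true_probs) / len(true_probs)


for target in (0.3, 0.5, 0.8):
    probs = faked_forecast_list(target, n=10)
    print(target, round(expected_hit_rate(probs), 3))
```

In each case the expected hit rate lands within a percentage point of the announced confidence, although no forecasting skill went into choosing the items.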
Also, I apologize for the statement that I “understand you perfectly” a few posts back. It was stupid and I’ve edited it out.
Thanks, I appreciate that :) And I apologize if my comment about probability being weird came across as patronizing; it was meant to be a reflection on the difficulty I was having putting my model into words, not a comment on your understanding.
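Ok this confirms you haven’t understood what I’m claiming.
I’m arguing against this claim:
I don’t think there is any difference in those lists!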
I’m saying that it is harder to make a list where all predictions seem obviously false and have half of them come true than it is to make a list where half of all predictions seem obviously false and half seem obviously true and have half of them come true. That’s the only thing I’m claiming is true. I know you’ve said other things and I haven’t addressed them; that’s because I wanted to get consensus on this thing before talking about anything else.
Reading this, I was confused: it seemed to me that I should be equally willing to offer $2000 for each list. I realised I was likely enough mistaken that I shouldn’t actually make such an offer!
At first I guessed that the problem in lists like the second was cheating via correlations. That is, a more subtle version of:
The price of a barrel of oil at the end of 2020 will be between $50.95 and $51.02 (50%)
The price of a barrel of oil at the end of 2020 will be below $50.95 or above $51.02 (50%)
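The pair above is the crudest form of that cheat: the two statements are complements, so exactly one comes true whatever the price turns out to be. A quick check, using made-up uniformly random prices purely as stand-ins for possible outcomes (`hits_for_pair` is a hypothetical helper):

```python
import random


def hits_for_pair(price, low=50.95, high=51.02):
    """Number of the two predictions above that come true for a given end-of-2020 price."""
    inside = low <= price <= high          # "between $50.95 and $51.02"
    outside = price < low or price > high  # "below $50.95 or above $51.02"
    return int(inside) + int(outside)


rng = random.Random(0)
# Hypothetical stand-in outcomes, plus the interval edges themselves
prices = [rng.uniform(20.0, 80.0) for _ in range(10_000)] + [50.95, 51.0, 51.02]
print(all(hits_for_pair(p) == 1 for p in prices))  # True: always exactly 1 of 2
```

Such a pair scores exactly 50% in every world, so its hit rate carries no information about forecasting skill.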
Then I went and actually finished reading the post (! oops). I see that you were thinking about cheating, but not quite of this kind. The slogan I would give is something like “cheating by trading accuracy for calibration”. That is, the rule is just supposed to remove the extra phrasing choice from a list to prevent shenanigans from patterned exploitation of this choice.
I now think a challenge to your post would complain that this doesn’t really eliminate the choice—that common wisdom is contradictory enough that I can tweak my phrasing to satisfy your rule and still appear calibrated at 50%-wards probabilities. To be clear, I’m not saying that’s true; the foregoing is just supposed to be a checksum on my understanding.