It looks to me like we might be thinking about different questions. Basically I’m just concerned about the sentence “Philip Tetlock discovered that 2% of people are superforecasters.” When I read this sentence, it reads to me like “2% of people are superheroes” — they have performance that is way better than the rest of the population on these tasks. If you graphed “jump height” of the population and 2% of the population is Superman, there would be a clear discontinuity at the higher end. That’s what I imagine when I read the sentence, and that’s what I’m trying to get at above.
It looks like you’re saying that this isn’t true?
(It looks to me like you’re discussing the question of how innate “superforecasting” is. To continue the analogy, whether superforecasters have innate powers like Superman or are just normal humans who train hard like Batman. But I think this is orthogonal to what I’m talking about. I know the sentence “are superforecasters a ‘real’ phenomenon” has multiple operationalizations, which is why I specified one as what I was talking about.)
> If you graphed “jump height” of the population and 2% of the population is Superman, there would be a clear discontinuity at the higher end.
But note that the section you quote from Vox doesn’t say that there’s any discontinuity:
> Tetlock and his collaborators have run studies involving tens of thousands of participants and have discovered that prediction follows a power law distribution.
A power law distribution is not a discontinuity! Some people are way way better than others. Other people are merely way better than others. And still others are only better than others.
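A toy sketch of this point (using a made-up Pareto distribution, not Tetlock’s actual data): a power-law tail makes the top far above the median, yet every step down the curve is gradual.

```python
# Illustrative only: a hypothetical "ability" score distributed as
# Pareto(x_min=1, alpha=2). Nothing here is empirical forecasting data.
def pareto_quantile(p, alpha=2.0):
    """Score at population quantile p (0 < p < 1) via the Pareto inverse CDF."""
    return (1.0 - p) ** (-1.0 / alpha)

# The top 0.1% scores ~22x the median...
print(round(pareto_quantile(0.999) / pareto_quantile(0.5), 1))  # 22.4

# ...but walking down the percentiles, the decline is smooth: no jump anywhere.
for p in (0.999, 0.99, 0.95, 0.90, 0.50):
    print(p, round(pareto_quantile(p), 2))
```

So “way way better than others” and “no discontinuity” are perfectly compatible.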
> “Philip Tetlock discovered that 2% of people are superforecasters.” When I read this sentence, it reads to me like “2% of people are superheroes”
I think the sentence is misleading (as per Scott Alexander). A better sentence should give the impression that, by way of analogy, some basketball players are NBA players. They may seem superhuman in their basketball ability compared to the Average Joe, and a combination of innate traits and honed skills got them there. These would be interesting to study if you wanted to know how to play basketball well, or if you were putting together a team to play against the Monstars.
But there’s no discontinuity. Going down the curve from NBA players, you get to professional players in other leagues, then to Division I college players, then Division II, etc. Somewhere after benchwarmer on their high school basketball team, you get to the Average Joe.
So SSC and Vox are both right. Some people are way way better than others (with a power law-like distribution), but there’s no discontinuity.
> A better sentence should give the impression that, by way of analogy, some basketball players are NBA players.
This analogy seems like a good way of explaining it. Saying (about forecasting ability) that some people are superforecasters is similar to saying (about basketball ability) that some people are NBA players or saying (about chess ability) that some people are Grandmasters. If you understand in detail the meaning of any one of these claims (or a similar claim about another domain besides forecasting/basketball/chess), then most of what you could say about that claim would port over pretty straightforwardly to the other claims.
(I’ll back off the Superman analogy; I think it’s disanalogous b/c of the discontinuity thing you point out.)
Yeah I like the analogue “some basketball players are NBA players.” It makes it sound totally unsurprising, which it is.
I don’t agree that Vox is right, because:
- I can’t find any evidence for the claim that forecasting ability is power-law distributed, and it’s not clear what that would mean with Brier scores (as Unnamed points out).
- Their use of the term “discovered.”
I don’t think I’m just quibbling over semantics. I definitely had the wrong idea about superforecasters before thinking it through; it seems like Vox might have it too; and I’m concerned that others who read the article will get the wrong idea as well.
From participating on Metaculus I certainly don’t get the sense that there are people who make uncannily good predictions. If you compare the community prediction to the Metaculus prediction, it looks like there’s a 0.14 difference in average log score, which I guess means the combination of the best predictors tends to put e^0.14 ≈ 1.15 times as much probability on the correct answer as the time-weighted community median. (The postdiction is better, but I guess subject to overfitting?) That’s substantial, but presumably the combination of the best predictors is better than every individual predictor. The Metaculus prediction also seems to be doing a lot worse than the community prediction on recent questions, so I don’t know what to make of that. I suspect that, while some people are obviously better at forecasting than others, the word “superforecasters” has no content outside of “the best forecasters” and is just there to make the field of research sound more exciting.
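The conversion above from a natural-log-score gap to a probability ratio can be checked directly (the 0.14 figure is the eyeballed number from the paragraph, not an official statistic):

```python
import math

# A gap of 0.14 in average natural log score means the better aggregate puts
# e^0.14 times as much probability on the correct answer, on geometric average.
log_score_gap = 0.14
prob_ratio = math.exp(log_score_gap)
print(round(prob_ratio, 3))  # 1.15

# Concretely: where the community median puts 0.60 on the right answer,
# the better aggregate would put about 0.69.
print(round(0.60 * prob_ratio, 2))  # 0.69
```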
> it reads to me like “2% of people are superheroes” — they have performance that is way better than the rest of the population on these tasks.
As you concluded in other comments, this is wrong. But there doesn’t need to be a sharp cutoff for there to be “way better” performance. If the top 1% consistently have Brier scores of 0.01 on a class of questions, the next 1% have Brier scores of 0.02, and so on, you’d see “way better” performance without a sharp cutoff, and we’d see that the median Brier score of 0.5, exactly as good as flipping a coin, is WAY worse than the people at the top. (Let’s assume everyone else is at least as good as flipping a coin, so the bottom half are all equally useless.)
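A minimal sketch of the Brier-score arithmetic, using the hypothetical numbers from this comment rather than real data:

```python
def brier(forecast_prob, outcome):
    """Brier score for one binary question (0 = perfect, lower is better)."""
    return (forecast_prob - outcome) ** 2

# Guessing 0 or 1 by coin flip: you're confidently wrong half the time,
# so the expected Brier score is 0.5 * 1 + 0.5 * 0 = 0.5.
coin_flip = 0.5 * brier(1.0, 0) + 0.5 * brier(1.0, 1)
print(coin_flip)  # 0.5

# (Note: always answering "50%" instead of guessing scores 0.25, not 0.5.)
print(brier(0.5, 1))  # 0.25

# The hypothetical gradient: 0.01, 0.02, 0.03, ... — the top is 50x better
# than the coin-flip baseline, yet there is no sharp cutoff anywhere.
top_scores = [round(0.01 * k, 2) for k in range(1, 6)]
print(top_scores)  # [0.01, 0.02, 0.03, 0.04, 0.05]
```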
Thanks for your reply!
Agreed. As I said, “it is unlikely that there is a sharp cutoff at 2%, there isn’t a discontinuity, and power law is probably the wrong term.”