I’m unfortunately swamped right now, which is a shame, because I’d love to spend time working on this. However, I want to include a few notes, plus reserve a spot to potentially reply in more depth when I decide to engage in some procrastivity.
First, the need for extremizing forecasts (see Baron, J., Mellers, B. A., Tetlock, P. E., Stone, E., & Ungar, L. H. (2014). Two reasons to make aggregated probability forecasts more extreme. Decision Analysis, 11(2), 133–145. http://dx.doi.org/10.1287/deca.2014.0293) seems like evidence that this isn’t typically the dominant factor in forecasting. However, cf. the usefulness of teaming and sharing as a way to ensure actual reasons get accounted for (Mellers, B., Ungar, L., Baron, J., Ramos, J., Gurcay, B., Fincher, K., … & Murray, T. (2014). Psychological strategies for winning a geopolitical forecasting tournament. Psychological Science, 25(5), 1106–1115).
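(For anyone unfamiliar with the term: “extremizing” means pushing an aggregated probability further from 50%, toward whichever side it already favors. Here is a minimal sketch of one common way to do this, in odds space; the exponent below is arbitrary and chosen only for illustration, not whatever parameter GJP actually used.)

```python
def extremize(p, a=2.0):
    """Push a probability away from 0.5 by raising its odds to the power a > 1."""
    odds = (p / (1 - p)) ** a
    return odds / (1 + odds)

print(extremize(0.70))  # ~0.84
print(extremize(0.90))  # ~0.99
print(extremize(0.50))  # 0.5 is a fixed point: an uncertain aggregate stays uncertain
```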
Second, the solution Pearl proposed for message-passing to eliminate over-reinforcement / double-counting of data seems critical, and is missing from this discussion. See his book Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. I need to think about this more, but if Aumann agreement is done properly, people eventually converge on correct models of the other reasoners, which should also stop info-cascades. Both models, however, assume iterated / repeated communication. I suspect we can model info-cascades as a failure at exactly that point: in the examples given, people publish papers, and there is no dialogue. For forecasting, explicit discussion of forecasting reasons should fix this. (That is, I might say, “My model says 25%, but I’m only giving that 50% credence and allocating the rest to the consensus value of 90%, leading to my final estimate of 57.5%.”)
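To make the arithmetic in that parenthetical explicit, here is a tiny sketch (the function name is my own) of the credence-weighted blend I’m describing:

```python
def blended_forecast(own_p, consensus_p, credence_in_own_model):
    """Linear mix of your own model's probability and the consensus probability."""
    return credence_in_own_model * own_p + (1 - credence_in_own_model) * consensus_p

# The example above: 50% credence in a model saying 25%, remainder on a 90% consensus.
print(blended_forecast(own_p=0.25, consensus_p=0.90, credence_in_own_model=0.5))  # 0.575
```

The point of stating the decomposition out loud (my model, my credence in it, the consensus) is that it lets others avoid double-counting the consensus when they update on my number.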
Third, I’d be really interested in formulating testable experimental setups in Mturk or similar to show/not show this occurring, but on reflection this seems non-trivial, and I haven’t thought much about how to do it other than to note that it’s not as easy as it sounded at first.
Thanks for this.
Re extremizing, the recent (excellent) AI Impacts overview of good forecasting practices notes that “more recent data suggests that the successes of the extremizing algorithm during the forecasting tournament were a fluke.”
That’s a great point. I’m uncertain whether the analyses account for the issue I cited, where we would expect a priori that extremizing slightly would on average hurt accuracy, but in any moderately sized sample (like the forecasting tournament) it is likely to help. It also relates to a point I made about why proper scoring rules are not incentive-compatible in tournaments, in a tweetstorm here: https://twitter.com/davidmanheim/status/1080460223284948994
Interestingly, a similar dynamic may happen in tournaments, and could be part of where info-cascades occur. I can, in expectation, slightly outscore everyone else and minimize my risk of doing very poorly by putting my predictions a bit to the extreme of the current predictions. It’s almost the equivalent of bidding a dollar more than the current high bid on The Price Is Right: you don’t need to be close, you just need to beat the other players’ scores to win. But if I report my best strategic answer instead of my true guess, it seems this could cascade if others are unaware I am doing so.
[meta] Not sure why the link to the overview isn’t working. Here’s how the comment looks before I submit it:
https://imgur.com/MF5Z2X4
(The same problem is affecting this comment.)
In any case, the URL is:
https://aiimpacts.org/evidence-on-good-forecasting-practices-from-the-good-judgment-project-an-accompanying-blog-post/
It’s because I am a bad developer and I broke some formatting stuff (again). Will be fixed within the hour.
Edit: Fixed now
Thanks, Oli!
Do you have a link to this data?
As I replied to Pablo below: “...it’s an argument from first principles. Basically, if you extremize guesses from 90% to 95%, and 90% is a correct estimate, 9/10 times you do better due to extremizing.”
I only read the AI Impacts article that includes that quote, not the data to which the quote alludes. Maybe ask the author?
You don’t need the data; it’s an argument from first principles. Basically, if you extremize guesses from 90% to 95%, and 90% is a correct estimate, 9/10 times you do better due to extremizing (and the remaining 1/10 of the time you do substantially worse, which is why it hurts in expectation).
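A quick numerical check of that claim, using the Brier score as the proper scoring rule (my own sketch, not anything from GJP’s analysis):

```python
def brier(forecast, outcome):
    """Squared-error score for a single binary question; lower is better."""
    return (outcome - forecast) ** 2

p_true, raw, extremized = 0.90, 0.90, 0.95

# Score both forecasts under each possible outcome.
for outcome, prob in [(1, p_true), (0, 1 - p_true)]:
    print(f"outcome={outcome} (prob {prob:.2f}): "
          f"raw={brier(raw, outcome):.4f}, extremized={brier(extremized, outcome):.4f}")

# Expected scores under the true 90% probability.
exp_raw = p_true * brier(raw, 1) + (1 - p_true) * brier(raw, 0)
exp_ext = p_true * brier(extremized, 1) + (1 - p_true) * brier(extremized, 0)
print(f"expected raw={exp_raw:.4f}, expected extremized={exp_ext:.4f}")
```

The extremized forecast scores better in the 90%-likely case and worse in expectation (0.0925 vs. 0.09), which is exactly the “usually helps, hurts on average” pattern.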
One should be able to think quantitatively about that, e.g., how many questions you would need to ask before finding out whether your extremization hurt you. I’m surprised by the suggestion that GJP didn’t have enough questions to tell, unless their extremizations were frequently in the >90% range.
Each season, there were too few questions for this to be obvious rather than a minor effect, and the “misses” were excused as getting an actually unlikely event wrong. It’s hard to say, post hoc, whether the ~1% consensus opinion about a “freak event” was accurate but the event was a genuine surprise (and yes, this happened at least twice), or whether the consensus was simply overconfident.
(I also think that the inability to specify estimates <0.5% or >99.5% reduced the extent to which the scores were hurt by these events.)
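On the “how many questions would you need” point above, here is a Monte Carlo sketch (my own, with invented parameters) that estimates how often slight extremizing of an already-correct 90% forecast ends up with a worse total Brier score over a season of N questions:

```python
import random

def fraction_of_seasons_where_extremizing_hurt(n_questions, p_true=0.90,
                                                extremized=0.95, trials=20_000):
    """Simulate seasons of independent questions, each with true probability p_true."""
    hurt = 0
    for _ in range(trials):
        raw_total, ext_total = 0.0, 0.0
        for _ in range(n_questions):
            outcome = 1 if random.random() < p_true else 0
            raw_total += (outcome - p_true) ** 2        # truthful forecast of 0.90
            ext_total += (outcome - extremized) ** 2    # extremized forecast of 0.95
        if ext_total > raw_total:
            hurt += 1
    return hurt / trials

for n in (1, 10, 50, 100, 500):
    print(f"{n:4d} questions: extremizing hurt in "
          f"{fraction_of_seasons_where_extremizing_hurt(n):.1%} of seasons")
```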
I did, he said a researcher mentioned it in conversation.
There has been work on this. I believe this is a relevant reference, but I can’t tell for sure without paying to access the article:
Protocols Forcing Consensus, Paul Krasucki
The idea is this: Aumann agreement is typically studied with two communicating agents. We can instead study networks of agents, with various protocols (i.e., rules for when agents talk to each other). However, not all such protocols reach consensus the way we see with two agents!
I believe the condition for reaching consensus is directly analogous to the condition for correctness of belief propagation in Bayesian networks, i.e., the graph should be a tree.
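To make the analogy concrete, here is a toy sketch (my own construction, with made-up evidence values; it is not taken from Krasucki’s paper or Pearl’s book) of why cycles cause double counting while tree-structured message passing does not. Each agent holds one independent piece of evidence about a shared hypothesis, expressed as a log-likelihood ratio, so the correct pooled belief is just the sum of the evidence:

```python
evidence = {"A": 1.0, "B": -0.5, "C": 2.0}
correct_pooled_belief = sum(evidence.values())  # 2.5

# Naive pooling on a cycle A-B-C-A: each round, every agent re-adds whatever its
# neighbours currently believe, so its own evidence eventually comes back to it
# and gets counted again and again.
cycle_neighbours = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"]}
beliefs = dict(evidence)
for _ in range(5):
    beliefs = {i: evidence[i] + sum(beliefs[j] for j in cycle_neighbours[i])
               for i in cycle_neighbours}
print("naive pooling on a cycle:", beliefs)  # blows up, far from 2.5

# Pearl-style message passing on a tree A-B-C (a line): the message i sends to j
# excludes everything that came from j, so nothing is counted twice and every
# agent converges to the correct pooled belief of 2.5.
tree_neighbours = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
messages = {(i, j): 0.0 for i in tree_neighbours for j in tree_neighbours[i]}
for _ in range(5):
    messages = {(i, j): evidence[i] + sum(messages[(k, i)]
                                          for k in tree_neighbours[i] if k != j)
                for (i, j) in messages}
beliefs = {i: evidence[i] + sum(messages[(j, i)] for j in tree_neighbours[i])
           for i in tree_neighbours}
print("message passing on a tree:", beliefs)  # {'A': 2.5, 'B': 2.5, 'C': 2.5}
```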
Good find; I need to look into this more. The paper is on Sci-Hub, and it says the communication graph needs to be acyclic, so yes.
“All the examples in which communicating values of a union-consistent function fails to bring about consensus… must contain a cycle; if there are no cycles in the communication graph, consensus on the value of any union consistent function must be reached.”
So, epistemically virtuous social graphs should contain no cycles? ;3
“I can’t be your friend—we already have a mutual friend.”
“I can’t be your friend—Alice is friends with you and Bob; Bob is friends with Carol; Carol is friends with Dennis; Dennis is friends with Elane; and Elane is my friend already.”
“Fine, I could be your friend so long as we never discuss anything important.”
Or, perhaps less unreasonably, we need clear epistemic-superiority hierarchies, likely per subject area. It occurs to me that this could be a super-interesting agent-based / graph-theoretic modeling study of information flow and updating. As a nice bonus, it could show how ignoring epistemic hierarchies causes conspiracy cascades, and perhaps show that it leads to the divergence of beliefs between rational agents that Jaynes discusses in Probability Theory: The Logic of Science.
Another more reasonable solution is to always cite sources. There is an analogous solution in belief propagation, where messages carry a trace of where their information came from. Unfortunately I’ve forgotten what that algorithm is called.
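I don’t know which algorithm is meant either, but here is a purely hypothetical sketch of the “carry a trace” idea: each message carries the per-source contributions it already incorporates, so a receiver adds each source at most once, and even the cyclic network from the earlier sketch stops double-counting:

```python
evidence = {"A": 1.0, "B": -0.5, "C": 2.0}
neighbours = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"]}  # a cycle

# Each agent tracks which sources it has already counted, and their contributions.
known = {i: {i: evidence[i]} for i in neighbours}

for _ in range(3):  # a few rounds of everyone broadcasting everything they know
    snapshot = {i: dict(known[i]) for i in known}
    for i in neighbours:
        for j in neighbours[i]:
            for source, contribution in snapshot[j].items():
                known[i].setdefault(source, contribution)  # skip sources already counted

beliefs = {i: sum(known[i].values()) for i in known}
print(beliefs)  # every agent ends at 2.5, despite the cycle
```

This is the belief-propagation analogue of “always cite sources”: the citation is what lets a reader notice they have already updated on that piece of evidence.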
We (jacobjacob and Benito) decided to award $150 (out of the total bounty of $800) to this answer (and the additional points made in the discussion).
It offers relevant and robust evidence about the role of info-cascades in forecasting environments, together with a discussion of its interpretation.
I’ll PM you about payment details.