Meta: Because we think understanding info cascades is important, we recently spent ~10 hours trying to figure out how to quantitatively model them, and have contributed our thinking as answers below. We don’t currently have the time to continue exploring, but we wanted to experiment with seeing how much the LW community could together build on top of our preliminary search, so we’ve put up a basic prize for more work and tried to structure it around a couple of open questions. This is an experiment! We’re looking forward to reading any of your contributions to the topic, including things like summaries of existing literature and building out new models of the domain.
Background
Consider the following situation:
Bob is wondering whether a certain protein injures the skeletal muscle of patients with a rare disease. He finds a handful of papers with some evidence for the claim (and some with evidence against it), so he simply states the claim in his paper, with some caution, and adds a citation. Later, Alice comes across Bob’s paper and sees the cited claim, and she proceeds to cite Bob, without tracing the citation trail back to the original evidence. This keeps happening, in various shapes and forms, and after a while a literature of hundreds of papers builds up in which it’s common knowledge that β-amyloid injures the skeletal muscle of patients with inclusion body myositis—without the claim having accumulated any more evidence. (This real example is taken from Greenberg, 2009, a case study of this event.)
An information cascade occurs when people update on each other’s beliefs, rather than sharing the causes of those beliefs, and those beliefs end up with an appearance of support that far outstrips the evidence for them. Satvik Beri might describe this as the problem of only sharing the outputs of your thinking process, not your inputs.
The dynamics here are perhaps reminiscent of those underlying various failures of collective rationality such as asset bubbles, bystander effects and stampedes.
Note that this effect is different from other problems of collective rationality like the replication crisis, which involve low standards for evidence (such as unreasonably lax p-value thresholds or coordination problems preventing publication of failed experiments), or the degeneracy of much online discussion, which involves tribal signalling and UIs that encourage problematic selection effects. Rather, information cascades involve people rationally updating without any object-level evidence at all, and they would persist even if the replication crisis and online outrage culture disappeared. Even if nobody lies or tells untruths, you can still be subject to an information cascade.
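To make the dynamic concrete, here is a minimal sketch of the classic sequential-choice model of information cascades (in the spirit of Bikhchandani, Hirshleifer and Welch; it is an illustration, not a model of the myositis case above). The signal accuracy, tie-breaking rule and number of agents are arbitrary choices:

```python
import random

def simulate_cascade(n_agents=30, p_correct=0.7, seed=0):
    """Toy sequential cascade: the true state is 1; each agent gets a private
    binary signal that matches the true state with probability p_correct,
    sees all previous agents' *actions* (not their signals), and picks the
    action its posterior favours, breaking ties with its own signal."""
    rng = random.Random(seed)
    public_count = 0            # (# inferred '+1' signals) - (# inferred '-1' signals)
    actions, cascaded = [], []
    for _ in range(n_agents):
        signal = +1 if rng.random() < p_correct else -1
        if abs(public_count) >= 2:
            # Public evidence outweighs any single signal: the agent ignores
            # its own signal, so its action reveals nothing to later agents.
            action = 1 if public_count > 0 else -1
            cascaded.append(True)
        else:
            total = public_count + signal
            action = signal if total == 0 else (1 if total > 0 else -1)
            cascaded.append(False)
            public_count += action   # here the action does reveal the signal
        actions.append(action)
    return actions, cascaded

actions, cascaded = simulate_cascade()
print("first agent to ignore their own signal:",
      cascaded.index(True) if True in cascaded else None)
print("fraction taking the correct action:", sum(a == 1 for a in actions) / len(actions))
```

Running this repeatedly shows the key feature: once two net signals point the same way, everyone afterwards herds, and with noticeable probability the whole population locks onto the wrong answer even though every agent updates correctly.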
Questions
Ben and I are confused about how to think about the negative effects of this problem. We understand the basic idea, but aren’t sure how to reason quantitatively about the impacts, or how to trade off solving these problems against making other improvements to the overall efficacy and efficiency of a community. We currently only know how to think about these things qualitatively.
We’re posting a couple of related questions that we have some initial thoughts on, which might help clarify the problem.
If you have something you’d like to contribute, but that doesn’t seem to fit into the related questions above, leave it as an answer to this question.
Bounties
We are committing to pay at least either $800 or (No. of answers and comments * $25), whichever is smaller, for work on this problem recorded on LW, done before May 13th. The prize pool will be split across comments in accordance with how valuable we find them, and we might make awards earlier than the deadline (though if you know you’ll put in work in x weeks, it would be good to mention that to one of us via PM).
Ben and Jacob are each responsible for half of the prize money.
Jacob is funding this through Metaculus AI, a new forecasting platform tracking and improving the state of the art in AI forecasting, partly to help avoid info-cascades in the AI safety and policy communities (we’re currently live and inviting beta users—you can sign up here).
Examples of work each of us is especially excited about:
Jacob
- Contributions to our Guesstimate model (linked here), such as reducing uncertainty on the inputs or using better models.
- Extensions of the Guesstimate model beyond biomedicine, especially in ways that make it more directly applicable to the rationality/effective altruism communities.
- Examples and analysis of existing interventions that deal with this problem and what makes them work, and possibly suggestions for novel ones (though avoiding the trap of optimising for good-seeming ideas).
- Discussion of how the problem of info-cascades relates to forecasting.
Ben
- Concise summaries of relevant papers and their key contributions.
- Clear and concise explanations of what other LWers have found (e.g. turning 5 long answers into 1 medium-sized answer that links back to the others while still conveying the key info—here’s a good example of someone distilling an answer section).
I’m unfortunately swamped right now, which is a shame because I’d love to spend time working on this. However, I want to include a few notes, plus reserve a spot to potentially reply in more depth when I decide to engage in some procrastivity.
First, the need for extremizing forecasts (see: Jonathan Baron, Barbara A. Mellers, Philip E. Tetlock, Eric Stone & Lyle H. Ungar (2014), “Two Reasons to Make Aggregated Probability Forecasts More Extreme”, Decision Analysis 11(2):133–145, http://dx.doi.org/10.1287/deca.2014.0293) seems like evidence that this isn’t typically the dominant factor in forecasting. However, cf. the usefulness of teaming and sharing as a way to ensure actual reasons get accounted for (Mellers, B., Ungar, L., Baron, J., Ramos, J., Gurcay, B., Fincher, K., … & Murray, T. (2014), “Psychological strategies for winning a geopolitical forecasting tournament”, Psychological Science, 25(5), 1106–1115).
Second, the solution that Pearl proposed for message-passing to eliminate over-reinforcement / double counting of data seems to be critical and missing from this discussion. See his book: Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. I need to think about this more, but if Aumann agreement is done properly, people eventually converge on correct models of other reasoners, which should also stop info-cascades. The assumption of both models, however, is that there is iterated / repeated communication. I suspect that we can model info-cascades as a failure at exactly that point—in the examples given, people publish papers, and there is no dialogue. For forecasting, explicit discussion of forecasting reasons should fix this. (That is, I might say “My model says 25%, but I’m giving that only 50% credence and allocating the rest to the consensus value of 90%, leading to my final estimate of 57.5%”)
Third, I’d be really interested in formulating testable experimental setups in Mturk or similar to show/not show this occurring, but on reflection this seems non-trivial, and I haven’t thought much about how to do it other than to note that it’s not as easy as it sounded at first.
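As a small aside on the blending example in the second note above: the arithmetic is just a weighted average of one’s own model and the consensus. A minimal sketch (the function name and numbers are purely illustrative):

```python
def blended_forecast(my_model_prob, consensus_prob, weight_on_model):
    """Blend a private model with the public consensus. Reporting the weights
    alongside the forecast lets readers un-mix it and recover the inputs."""
    return weight_on_model * my_model_prob + (1 - weight_on_model) * consensus_prob

# The example from the comment: 0.5 * 0.25 + 0.5 * 0.90 = 0.575
print(blended_forecast(my_model_prob=0.25, consensus_prob=0.90, weight_on_model=0.5))
```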
Thanks for this.
Re extremizing, the recent (excellent) AI Impacts overview of good forecasting practices notes that “more recent data suggests that the successes of the extremizing algorithm during the forecasting tournament were a fluke.”
That’s a great point. I’m uncertain whether the analyses account for the cited issue, where we would expect a priori that extremizing slightly would on average hurt accuracy, but in any moderately sized sample (like the forecasting tournament) it is likely to help. It also relates to a point I made about why proper scoring rules are not incentive-compatible in tournaments, in a tweetstorm here: https://twitter.com/davidmanheim/status/1080460223284948994.
Interestingly, a similar dynamic may happen in tournaments, and could be part of where info-cascades occur. I can, in expectation, outscore everyone else slightly and minimize my risk of doing very poorly by putting my predictions a bit to the extreme of the current predictions. It’s almost the equivalent of bidding a dollar more than the current high bid on The Price Is Right—you don’t need to be close, you just need to beat the other people’s scores to win. But if I report my best-strategy answer instead of my true guess, it seems that it could cascade if others are unaware I am doing this.
[meta] Not sure why the link to the overview isn’t working. Here’s how the comment looks before I submit it:
https://imgur.com/MF5Z2X4
(The same problem is affecting this comment.)
In any case, the URL is:
https://aiimpacts.org/evidence-on-good-forecasting-practices-from-the-good-judgment-project-an-accompanying-blog-post/
It’s because I am a bad developer and I broke some formatting stuff (again). Will be fixed within the hour.
Edit: Fixed now
Thanks, Oli!
Do you have a link to this data?
As I replied to Pablo below, “...it’s an argument from first principles. Basically, if you extremize guesses from 90% to 95%, and 90% is a correct estimate, 9/10 times you do better due to extremizing.”
I only read the AI Impacts article that includes that quote, not the data to which the quote alludes. Maybe ask the author?
You don’t need the data—it’s an argument from first principles. Basically, if you extremize guesses from 90% to 95%, and 90% is a correct estimate, 9/10 times you do better due to extremizing.
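Here is that first-principles argument spelled out in code, using a Brier score as the accuracy measure (a hedged sketch—the scoring rule is my assumption; the 90%/95% numbers are from the comment):

```python
def brier(forecast, outcome):
    """Brier loss for a single binary event; lower is better."""
    return (forecast - outcome) ** 2

p_true = 0.90              # assume the event really does happen 90% of the time
raw, extremized = 0.90, 0.95

# Per-event comparison: extremizing wins whenever the event happens (9/10 of the time)...
print(brier(extremized, 1), "<", brier(raw, 1))    # 0.0025 < 0.01
print(brier(extremized, 0), ">", brier(raw, 0))    # 0.9025 > 0.81

# ...but loses slightly in expectation, because the rare miss is penalised heavily.
expected_raw = p_true * brier(raw, 1) + (1 - p_true) * brier(raw, 0)                  # 0.0900
expected_ext = p_true * brier(extremized, 1) + (1 - p_true) * brier(extremized, 0)    # 0.0925
print(expected_raw, expected_ext)
```

This matches the point made above: over a small number of questions extremizing will usually look like a win, even if it slightly hurts expected accuracy.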
One should be able to think quantitatively about that, e.g. how many questions you would need to ask before finding out whether your extremization hurt you. I’m surprised by the suggestion that GJP didn’t ask enough questions for this, unless their extremizations were frequently in the >90% range.
Each season, there were too few questions for this to be obvious rather than a minor effect, and the “misses” were excused as getting a genuinely unlikely event wrong. It’s hard to say, post hoc, whether the ~1% consensus opinion about a “freak event” was accurate but a huge surprise occurred (and yes, this happened at least twice), or whether the consensus was simply overconfident.
(I also think that the inability to specify estimates <0.5% or >99.5% reduced the extent to which the scores were hurt by these events.)
I did; he said a researcher mentioned it in conversation.
There has been work on this. I believe this is a relevant reference, but I can’t tell for sure without paying to access the article:
Protocols Forcing Consensus, Paul Krasucki
The idea is this: Aumann agreement is typically studied with two communicating agents. We can instead study networks of agents, with various protocols (i.e., rules for when agents talk to each other). However, not all such protocols reach consensus the way we see with two agents!
I believe the condition for reaching consensus is directly analogous to the condition for correctness of belief propagation in Bayesian networks, i.e., the graph should be a tree.
Good find—I need to look into this more. The paper is on scihub, and it says it needs to be non-cyclical, so yes.
“All the examples in which communicating values of a union-consistent function fails to bring about consensus… must contain a cycle; if there are no cycles in the communication graph, consensus on the value of any union consistent function must be reached.”
So, epistemically virtuous social graphs should contain no cycles? ;3
“I can’t be your friend—we already have a mutual friend.”
“I can’t be your friend—Alice is friends with you and Bob; Bob is friends with Carol; Carol is friends with Dennis; Dennis is friends with Elane; and Elane is my friend already.”
“Fine, I could be your friend so long as we never discuss anything important.”
Or perhaps less unreasonably, we need clear epistemic superiority hierarchies, likely per subject area. And it occurs to me that this could be a super-interesting agent-based/graph theoretic modeling study of information flow and updating. As a nice bonus, this can easily show how ignoring epistemic hierarchies will cause conspiracy cascades—and perhaps show that it will lead to the divergence of rational agent beliefs which Jaynes talks about in PT:LoS.
Another more reasonable solution is to always cite sources. There is an analogous solution in belief propagation, where messages carry a trace of where their information came from. Unfortunately I’ve forgotten what that algorithm is called.
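As a minimal sketch of the “messages carry a trace of their sources” idea (an illustrative evidence-tagging scheme of my own, not necessarily the algorithm being recalled): if evidence is keyed by its original source, then merging two messages that both descend from the same observation counts it only once.

```python
from math import exp

def combine(messages):
    """Merge evidence keyed by its original source, so each observation is
    counted once no matter how many paths it arrived through.
    Each message is a dict {source_id: log_likelihood_ratio}."""
    merged = {}
    for msg in messages:
        merged.update(msg)   # duplicate source_ids overwrite rather than add
    return merged

def posterior_odds(prior_odds, evidence):
    return prior_odds * exp(sum(evidence.values()))

# Alice's lab result reaches Carol twice, via Bob and via Dana:
via_bob  = {"alice_lab_result": 1.2}
via_dana = {"alice_lab_result": 1.2}

naive   = posterior_odds(1.0, {"via_bob": 1.2, "via_dana": 1.2})   # double-counted
tracked = posterior_odds(1.0, combine([via_bob, via_dana]))        # counted once
print(naive, tracked)   # ~11.0 vs ~3.3
```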
We (jacobjacob and Benito) decided to award $150 (out of the total bounty of $800) to this answer (and the additional points made in the discussion).
It offers relevant and robust evidence about the role of info-cascades in forecasting environments, together with a discussion of its interpretation.
I’ll PM you about payment details.
Here’s a quick bibliography we threw together.
Background:
Information Cascades and Rational Herding: An Annotated Bibliography and Resource Reference (Bikhchandani et al., 2004). The best resource on the topic; see in particular the initial papers on the subject.
Y2K Bibliography of Experimental Economics and Social Science: Information Cascades and Herd Effects (Holt, 1999). Less thorough, but catches some papers the first one misses.
“Information cascade” from Wikipedia. An excellent introduction.
“Understanding Information Cascades” from Investopedia.
Previous LessWrong posts referring to info cascades:
Information cascades, by Johnicholas, 2009
Information cascades in scientific practice, by RichardKennaway, 2009
Information cascades, LW Wiki
And then here are all the LW posts we could find that used the concept (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11) . Not sure how relevant they are, but might be useful in orienting around the concept.
Two recent articles that review the existing economic literature on information cascades:
Sushil Bikhchandani, David Hirshleifer and Ivo Welch, Information cascades, The New Palgrave Dictionary of Economics (Macmillan, 2018), pp. 6492-6500.
Oksana Doherty, Informational cascades in financial markets: review and synthesis, Review of Behavioral Finance, vol. 10, no. 1 (2018), pp. 53-69.
An earlier review:
Maria Grazia Romano, Informational cascades in financial economics: a review, Giornale degli Economisti e Annali di Economia, vol. 68, no. 1 (2009), pp. 81-109.
Information Cascades in Multi-Agent Models by Arthur De Vany & Cassey Lee has a section with a useful summary of the relevant economic literature up to 1999. (For more recent overviews, see my other comment.) I copy it below, with links to the works cited (with the exception of Chen (1978) and Lee (1999), both unpublished doctoral dissertations, and De Vany and Walls (1999b), an unpublished working paper):
We (jacobjacob and Ben Pace) decided to award $100 (out of the total bounty of $800) to this answer.
It compiles a useful summary of the literature (we learnt a lot from going through one of the papers linked), and it attaches handy links to everything—a task that is on the one hand very helpful to other people, and on the other tedious and without many marginal benefits for the writer, and so likely to be under-incentivised.
I’ll PM you for payment details.
Generally, there is a substantial literature on the topic within the field of network science. The right keywords for Google Scholar are something like “spreading dynamics in complex networks”. “Information cascades” does not seem to be the best choice of keywords.
There are many options for how you can model the state of a node (discrete states, oscillators, continuous variables, vectors of any of the above, ...), multiple options for how you represent the dynamics (something like an Ising model / softmax, versions of the voter model, oscillator coupling, ...), and multiple options for how you model the topology (graphs with weighted or unweighted edges, adaptive wiring or not, topologies based on stochastic block models, scale-free networks, Erdős–Rényi, Watts–Strogatz, or real-world network data, ...). This creates a fairly large space of options, most of which have already been explored somewhere in the literature.
Possibly the single most important thing to know here is that there are universality classes of systems which exhibit similar behaviour, so you can often ignore the details of the dynamics/topology/state representation.
Overall I would suggest approaching this with some intellectual humility and studying the existing research more, rather than trying to reinvent large parts of network science on LessWrong. (My guess is something like >2000 research-years have been spent on the topic, often by quite good people.)
I haven’t looked through your links in much detail, but wanted to reply to this:
I either disagree or am confused. It seems good to use resources to outsource your ability to do literature reviews, distillation or extrapolation, to someone with higher comparative advantage. If the LW question feature can enable that, it will make the market for intellectual progress more efficient; and I wanted to test whether this was so.
I am not trying to reinvent network science, and I’m not that interested in the large amount of theoretical work that has been done. I am trying to 1) apply these insights to very particular problems I face (relating to forecasting and more); and 2) think about this from a cost-effectiveness perspective.
I am very happy to trade money for my time in answering these questions.
(Neither 1) nor 2) seems like something I expect the existing literature to have been very interested in. I believe this for reasons similar to those Holden Karnofsky expresses here.)
I was a bit confused by “we… aren’t sure how to reason quantitatively about the impacts” and “how much the LW community could together build on top of our preliminary search”, which seemed to nudge toward original research. Outsourcing literature reviews, distillation or extrapolation seems great.
Agreed. I realise the OP could be misread; I’ve updated the first paragraph with an extra sentence mentioning that summarising and translating existing work/literature in related domains is also really helpful.
Thanks for the pointers to network science Jan, I don’t know this literature, and if it’s useful here then I’m glad you understand it well enough to guide us (and others) to key parts of it. I don’t see yet how to apply it to thinking quantitatively about scientific and forecasting communities.
If you (or another LWer) think that the theory around universality classes is applicable to ensuring good info propagation in e.g. a scientific community, and you’re right, then I (and Jacob and likely many others) would love to read a summary, posted here as an answer. Might you explain how understanding the linked paper on universality classes has helped you think about info propagation in forecasting communities and related communities? Concrete heuristics would be especially interesting.
(Note that Jacob and I have not taken a math course in topology or graph theory and won’t be able to read answers that assume such knowledge, though we’ve both studied formal fields and could likely pick it up quickly if it seemed practically useful.)
In general we’re not necessarily looking for *novel* contributions. To give an extreme example, if one person translates an existing theoretical literature into a fully fleshed-out theory of info-cascades for scientific and forecasting communities, we’ll give them the entire prize pot.
A short summary of why the linked paper is important: you can think about bias as a sort of perturbation. You are then interested in the “cascade of spreading” of the perturbation, and especially in factors like the distribution of cascade sizes. The universality classes tell you this can be predicted by just a few parameters (Table 1 in the linked paper), depending mainly on the local dynamic (forecaster–forecaster interactions). Now if you have a good model of the local dynamic, you can determine the parameters and determine which universality class the problem belongs to. You can also try to infer the dynamics if you have good data on your interactions.
I’m afraid I don’t know enough about how “forecasting communities” work to be able to give you good guesses about the points of leverage. One quick idea, if you have everybody on the same platform, may be to do some sort of A/B experiment—manipulate the data so that some forecasters see the predictions of others with an artificially introduced perturbation, and see how their output differs from the control group’s. If you have data on “individual dynamics” like that, and some knowledge of the network structure, the theory can help you predict the cascade size distribution.
(I also apologize for not being more helpful, but I really don’t have time to work on this for you.)
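A rough sketch of the kind of cascade-size simulation Jan describes (the independent-cascade spreading rule, the Erdős–Rényi topology, and all parameters are arbitrary stand-ins for illustration; networkx is assumed to be available):

```python
import random
import networkx as nx

def cascade_size(graph, transmit_prob, rng):
    """Perturb one random node and let the perturbation spread along each edge
    independently with probability transmit_prob; return how many nodes are
    eventually affected."""
    seed = rng.choice(list(graph.nodes))
    affected, frontier = {seed}, [seed]
    while frontier:
        node = frontier.pop()
        for neighbour in graph.neighbors(node):
            if neighbour not in affected and rng.random() < transmit_prob:
                affected.add(neighbour)
                frontier.append(neighbour)
    return len(affected)

rng = random.Random(0)
g = nx.erdos_renyi_graph(n=500, p=0.01, seed=0)   # stand-in for a forecaster network
sizes = [cascade_size(g, transmit_prob=0.2, rng=rng) for _ in range(2000)]
print("mean cascade size:", sum(sizes) / len(sizes))
print("largest cascade:", max(sizes))
```

Comparing the empirical size distribution against what the universality-class results predict (or against a perturbed A/B condition, as suggested above) would be the natural next step.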
I wouldn’t say that ‘information cascades’ isn’t the best choice of keywords. What’s happening here is that the same phenomenon is studied by different disciplines in relative isolation from each other. As a consequence, the phenomenon is discussed under different names, depending on the discipline studying it. ‘Information cascades’ (or, as it is sometimes spelled, ‘informational cascades’) is the name used in economics, while network science seems to use a variety of related expressions, such as the one you mention.
We (jacobjacob and Ben Pace) decided to award $200 (out of the total bounty of $800) to this answer (and the additional comment below).
It seems to offer a learnt summary of the relevance of network science (which provides a perspective on the phenomenon complementary to the microeconomic literature linked by other commenters), which not implausibly took Jan at least an order of magnitude less time to compile than it would have taken us. (For example, the seemingly simple fact of using a different Google Scholar keyword than “information cascade” might have taken a non-expert several hours to realise.)
It also attempts to apply these to the case of forecasting (despite Jan’s limited knowledge of the domain), which is a task that would likely have been even harder to do without deep experience of the field.
I’ll PM Jan about payment details.