The Prediction Pyramid: Why Fundamental Work is Needed for Prediction Work
Epistemic state: I feel like this post makes fairly intuitive claims, but I have uncertainty on many of the specifics.
In data science, it is a common mistake for organizations to focus on specific exciting parts like machine learning and data visualizations, while overlooking the infrastructural concerns required to allow for such things. There have been several attempts at drawing pyramids that show the dependencies which must be in place before the most accessible parts become realizable.
The same could be said for predictions. Predictions require foundational work in order to be possible and effective. We can use the prediction pyramid below to show this dependency.
Evaluations
People beginning a prediction practice quickly run into the challenge of having well-specified questions. It’s not enough to ask who will win a sports game; one needs to clarify how every exceptional situation is to be handled.[1]
Question specification is a big part of Metaculus. Often a question draws significant discussion even after it is posed, in order to work through possible edge cases.
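One way to picture what a well-specified question involves is to write the exceptional cases down next to the question itself. The sketch below is purely illustrative: the `Question` class, its fields, and the example resolution rules are all hypothetical, and it is not how Metaculus or any other platform represents questions.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Question:
    """A forecasting question plus explicit rules for exceptional situations."""

    text: str
    # Maps an exceptional situation to how the question resolves in that case.
    edge_case_rules: Dict[str, str] = field(default_factory=dict)

    def resolve(self, situation: str, normal_outcome: str) -> str:
        """Return the resolution, failing loudly if the situation was never specified."""
        if situation == "played":
            return normal_outcome
        if situation not in self.edge_case_rules:
            raise ValueError(f"No resolution rule for situation: {situation!r}")
        return self.edge_case_rules[situation]


# Hypothetical question and rules, made up for illustration.
game = Question(
    text="Will Team A beat Team B on Saturday?",
    edge_case_rules={
        "rained_out": "resolves N/A",
        "postponed": "resolves N/A",
        "tied": "resolves No",
    },
)

print(game.resolve("tied", normal_outcome="resolves Yes"))
```

Asking for a situation that has no rule (say, "disputed") raises an error, which is roughly the position forecasters are in when a question leaves an edge case unaddressed.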
In addition to question specification, evaluations can be costly to perform. Even in simple cases it still requires manual work. In more complex cases evaluations could take a very long time. GiveWell does charity evaluations, and these can be extensive. This document discusses some other kinds of evaluations, in the “Possible Uses” section.
Ontologies
Say one is trying to determine which diseases will be important to worry about in 2025. One would first need a taxonomy of diseases that will not change until after 2025. If they were to somehow use a poor or an unusual taxonomy, resulting information wouldn’t be useful to others.
In the case of diseases, decades of research have been carried out in order to establish pragmatic and popular taxonomies. In other domains, new ontologies would need to be developed. Note that we consider ontologies to be a superset of taxonomies.
Another example: the usefulness of careers. 80,000 Hours is an expert here. They have a system which splits career paths into several distinct domains, and rates each one using six distinct attributes. They then do evaluations for each combination.
If it were assured they would continue to do so in the future, it would be relatively straightforward to forecast their future evaluations. If one wanted to do similar predictions without their work, one would have to come up with their own foundational thinking, ontologies, and evaluations.
Other concrete examples of ontologies:
The “Importance, Neglectedness, Tractability” framework for evaluating charity effectiveness
Nick Bostrom’s typology of information hazards, categorizing them by types and subtypes of information transfer mode and effect
Nick Bostrom’s definition of the vulnerable world hypothesis, including the “semi-anarchic default condition” consisting of limited capacity for preventive policing and global governance, and diverse motivations of actors
“Posts on LessWrong” are already discrete, and would represent a taxonomy
Foundational Understanding
Even before worrying about predictions or ontologies, it’s important to have good foundational understandings of topics in question. An ancient Greek scholar believing in Greek Mythology may spend a lot of time creating ontologies around the gods, but this would be a poor foundation for pragmatic work.
In the case of GiveWell, it took some specific philosophical understanding to decide that charity effectiveness was an important thing to optimize for. Later they came up with the “Importance, Neglectedness, Tractability” framework based on this understanding.
Implications
Predictions are most effective within a cluster of other specific tools.
For predictions to be useful, several other things need to go well, and thus they are also worth paying attention to. Discussions about “doing great predictions” should often include information on these other aspects. The equivalent in data science would be to recognize the importance and challenges of fundamental issues like data warehousing when discussing the eventual goal of data visualization.
Areas with existing substantial fundamental work should be easy to add predictions to.
There are many kinds of data which are already categorized and evaluated; in these cases, the predictions can be quite straightforward. For instance, the “winner of the next presidential election” seems obviously important and will be decided by existing parties, so is a very accessible candidate for forecasting.
It could be good to make lists of metrics and data sources that will be both interesting and reliably provided in the future. For example, it’s very likely that Wikidata will continue to report on the GDP and population of different countries, at least for the next 5-10 years. Setting up predictions on such variables should be very feasible.
There could be useful foundational non-predictive work to help future predictions.
One could imagine many useful projects and organizations that focus on just doing a good job on the foundational work, with the goal of assisting predictions down the line. For example, an organization could be set up just to evaluate important future variables. While this organization wouldn’t do forecasting itself, it would be very easy for other forecasting efforts to amplify this organization by forecasting its future evaluations. Currently, this is one accidental benefit of some organizations, but if it were intentional then evaluations could be better optimized for prediction support.
Possible Pyramid Modifications
The above pyramid was selected to be a simple demonstration to explain the above implications. In data science, several different pyramids have been made for different circumstances. Similarly, we can imagine multiple variations of this pyramid for other use cases.
“Aggregations” may make sense on top of predictions. It could be possible for some sites to list predictions and others to aggregate them. There are already sites that exist to do nothing except aggregation. Predictwise is one example.
The foundational understanding layer at the bottom could be subdivided into many other categories. For instance, research distillation could be a valid layer.
Acknowledgements
Thanks to Jacob Lagerros for contributing many examples and details to this post, and to Ben Goldhaber and Max Daniel for providing feedback on it.
[1] One of the first markets on the prediction market Augur had this exact problem, with no mention of how a sports market would resolve if the game was rained out, disputed, postponed, tied, etc. (Zvi discusses this issue further in his post on prediction markets.)
A lot of Quantified Self tools provide data. It would be great to have a tool that pulls data from QS sources like the Oura API, Beeminder, Fitbit, Google Fit and Apple Healthvault and allows me to make predictions based on the data.
It would be really great if I could make predictions about how last night went before looking at my Oura stats.
I think it’d be quite doable, but would take some infrastructural work of course. Could you be a bit more specific about the predictions you want to make?
One way would be to start every morning by giving an 80% confidence interval for the amount of total sleep, the amount of deep sleep, average nighttime HRV and lowest heart rate in the last night.
For training calibration I would expect it to be very useful to make live predictions on variables that your own gadgets can measure.
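A minimal sketch of what checking such intervals could look like: if the stated 80% intervals are well calibrated, roughly 80% of the measured values should land inside them over many mornings. The function name and all numbers below are made up for illustration, and no real gadget API is used.

```python
from typing import List, Tuple


def interval_hit_rate(
    intervals: List[Tuple[float, float]], actuals: List[float]
) -> float:
    """Fraction of measured values that fell inside the predicted intervals."""
    if len(intervals) != len(actuals) or not intervals:
        raise ValueError("Need one interval per measurement, and at least one.")
    hits = sum(
        low <= actual <= high
        for (low, high), actual in zip(intervals, actuals)
    )
    return hits / len(actuals)


# Made-up data: predicted 80% intervals for total sleep (hours),
# followed by what the tracker reported for the same nights.
predicted = [(6.5, 7.5), (6.0, 7.0), (7.0, 8.0), (5.5, 6.5), (6.5, 7.5)]
measured = [7.1, 7.4, 7.2, 6.0, 6.9]

rate = interval_hit_rate(predicted, measured)
print(f"{rate:.0%} of readings fell inside the stated 80% intervals")
```

A hit rate well below 80% over a long run would suggest the intervals are too narrow, and one well above 80% would suggest they are too wide. Five nights is far too few to conclude anything; the sample here only shows the mechanics.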
I would only enjoy making predictions about the future. Confidence intervals about my weight in 30 days and various other health metrics would be interesting as well.
Interesting, that makes sense, thanks for the examples.
Being an admin on Wikidata myself, I don’t see Wikidata as a solution to this problem given that Wikidata can report values from multiple different sources.
If you want to make predictions about GDP and population the CIA World Factbook is likely a more stable source.
Good point, thanks. Even though Wikidata could have values from multiple sources, that could still be more useful than nothing, but it’s probably better where possible to use specific sources like the CIA World Factbook, if you trust that they will have the information in the future.
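The multiple-sources concern can be sketched in code: a question that resolves on "the reported value" is ambiguous when several sources disagree, whereas a question that names one source is not. The `Claim` record and the figures below are placeholders for illustration; this is not Wikidata's actual data model or API.

```python
from typing import List, NamedTuple, Optional


class Claim(NamedTuple):
    """One reported value, tagged with who reported it (hypothetical record type)."""

    source: str
    year: int
    value: float


def value_from_source(
    claims: List[Claim], source: str, year: int
) -> Optional[float]:
    """Return the value one named source reports for one year, if present."""
    matches = [c.value for c in claims if c.source == source and c.year == year]
    if len(matches) > 1:
        raise ValueError(f"{source} reports more than one value for {year}")
    return matches[0] if matches else None


# Placeholder records, not real figures.
claims = [
    Claim("Source A", 2020, 100.0),
    Claim("Source B", 2020, 103.5),
]

# Resolving against a named source gives a single answer even though
# the combined dataset contains two conflicting values for 2020.
print(value_from_source(claims, "Source B", 2020))
```

The function returns `None` when the named source has no entry for that year, which corresponds to the remaining risk mentioned above: the chosen source has to keep publishing the figure.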
While ontologies have been developed for diseases, that doesn’t mean that the ontologies are good. The DSM-V, for example, was crappy enough that the head of the NIMH called for people to develop alternative ways of measuring mental illness after it was published.
Apart from the actual definition, if you have a question like whether the number of people with Asperger’s increased, you only have data about the number of people who are diagnosed with Asperger’s and not about how many people would qualify for the diagnosis.
The easiest way to improve cancer survival rates is to diagnose more people with cancer and that’s the strategy which the US used to get the best cancer survival rates in the world. Only quite recently, under the Obama administration, did the US government come to the view that it makes sense to reduce the number of people who get diagnosed with cancer.
I didn’t mean that all ontologies around diseases were good, especially around psychology. The challenges are quite hard; even in cases where a lot of good work goes into making ontologies, there could be tons of edge cases and similar problems.
That said, I think medicine is one of the best examples of the importance and occasional effectiveness of large ontologies. If one is doing independent work in the field, I would imagine it is rare that they would be best served by doing their own ontology development, given how much has been done so far.
The DSM-V is an ontology that’s cause neutral. That’s useful when you have many professionals who need a common language but who disagree about the causes of mental illnesses.
A depression that’s due to a head trauma is different from a depression because someone doesn’t deal well with a romantic relationship that’s ended.
When you want to actually treat depression it’s on the other hand very useful to be able to keep different types of depressions apart.
“That’s useful when you have many professionals who need a common language but which disagree about the causes of mental illnesses.”
Using the proposed framework, it means that the field lacks Foundational Understanding. Thus I wouldn’t feel comfortable calling the DSM an ontology, though there is, e.g., the Mental Disease Ontology, which sometimes maps to DSM.
I do approve of the Mental Disease Ontology as a project of fellow bioinformaticians to replace the mess of the psychiatrists but to me the Mental Disease Ontology doesn’t look like it’s a solution.
When it comes to cancerophobia and AIDS phobia both of those terms seem to me like they point to symptoms and there might be different underlying mechanisms in different patients.
It’s also not clear to me why impulse control disorders are not supposed to be cognitive disorders.
Just a quick 2 cents: I think it’s possible to have really poor or not-useful ontologies. One could easily make a decent ontology of Greek Gods, for instance. However, if their Foundational Understanding wasn’t great (for example, if they believed in Greek Gods), then that ontology wouldn’t be that useful.
In this case, I would classify the DSM as mainly a taxonomy (a kind of ontology), but I think many people would agree it could be improved. Much of this improvement would hopefully come through what is here called Foundational Understanding.
This post is so good! I was just thinking about whether this framework could be useful for a prediction business, where the Foundational Understanding is crowd-sourced through e.g. academic literature, open data, manual curation. Ontologies might be created and curated by public consortia and evaluation could be a private-public endeavour.
Thanks! Good point about the division. I agree that the different parts could be done by different groups; I’m not sure what the best way of doing each one is. My guess is that some experts should be incorporated into the foundational understanding process, but that they would want to use many other tools (like the ones you mention). I would imagine all could be done in either the private or public sector.
Thinking of stocks, I find it hard to articulate how this pyramid might correspond to predicting the market value of a company. To give it a try:
Traders predict the value of a stock.
The stock is evaluated at all times by the market buy/sell prices. But that is self-referential and does not encompass “real” data. The value of a stock is “really evaluated” when a company distributes dividends, goes bankrupt, or anything that collapses a stock to actual money.
The ontology is the methods by which stocks get actual money.
Foundational understanding is the economic theory involved.
[After writing this down, it feels more natural to need a pyramid in this case too (even though I initially guessed that I would find the lower layers unnecessary), or more precisely, it is very useful to think about this pyramid to see how we can improve the system.]