Forecasting is Way Overrated, and We Should Stop Funding It
Summary
EA and rationalists got enamoured with forecasting and prediction markets and made them part of the culture. This hasn’t proven very useful, yet forecasting continues to receive substantial EA funding. We should cut it off.
My Experience with Forecasting
For a while, I was the number one forecaster on Manifold. This lasted for about a year until I stopped just over 2 years ago. To this day, despite quitting, I’m still #8 on the platform. Additionally, I have done well on real-money prediction markets (Polymarket), earning mid-5 figures and winning a few AI bets. I say this to suggest that I would gain status from forecasting being seen as useful, but I think, to the contrary, that the EA community should stop funding it.
Over the years, I’ve written a few comments saying that I didn’t think forecasting was worth funding. You can see some of these here and here. Finally, I have gotten around to writing this full post.
Solution Seeking a Problem
When talking about forecasting, people often ask questions like “How can we leverage forecasting into better decisions?” This is the wrong way to go about solving problems. You solve problems by starting with the problem and then seeing which tools are useful for solving it.
The way people talk about forecasting is very similar to how people talk about cryptocurrency/blockchain. People have a tool they want to use, whether that be cryptocurrency or forecasting, and then try to solve problems with it because they really believe in the solution. I think this is misguided: you have to start with the problem you are trying to solve, not the solution you want to apply. A lot of work has been put into building up forecasting, making platforms, hosting tournaments, etc., on the assumption that it was instrumentally useful, but continuing this without concrete gains is pretty dangerous.
We’ve Funded Enough Forecasting that We Should See Tangible Gains
It’s not the case that forecasting/prediction markets are merely in their infancy. A lot of money has gone into forecasting; on the EA side of things, it’s near $100M. If I convince you later in this post that forecasting hasn’t produced fruitful results, note that this isn’t for lack of trying or spending.
The Forecasting Research Institute received grants in the tens of millions of dollars. Metaculus continues to receive millions of dollars per year to maintain a forecasting platform and conduct some forecasting tournaments. The Good Judgment Project and the Swift Centre have received millions of dollars for research and studies on forecasting and for teaching others about it. Sage has received millions of dollars to develop forecasting tools. Many others, like Manifold, have also been given millions by the EA community in grants/investments at high valuations, diverting money away from other EA causes. We have grants for organizations that develop tooling, even entire programming languages like Squiggle, for forecasting.
On the for-profit side of things, the money gets even bigger. Kalshi and Polymarket have each raised billions of dollars, and other forecasting platforms have also raised tens of millions of dollars.
Prediction markets have also taken off. Kalshi and Polymarket are both showing all-time highs and growth in month-over-month volume, with monthly volumes in the tens of billions of dollars. Total prediction market volume is something like $500B/year, but it just isn’t very useful. We get to know the odds on every basketball player prop, and whether BTC will go up or down in the next 5 minutes. While some people suggest that these trivial markets help sharpen skills or identify good forecasters, I don’t see any evidence of this; it seems more like wishful thinking.
If forecasting were really working well and were very useful, you would see the bulk of the money spent not on forecasting platforms but directly on forecasting teams or on subsidizing markets on important questions. We have seen very little of this; instead, the money has gone to platforms, tooling, and the like. We already had a few forecasting platforms, and the market was going to fund them itself, yet we continue to create more.
The EA/rationality community has also spent an incredible amount of (wasted) time on forecasting. Lots of people have been employed full-time doing forecasting or adjacent work, but perhaps even larger is the number of part-time hours that have gone into forecasting on Manifold, among other things. I would estimate that thousands of person-years have gone into this activity.
Hits-based Giving Means Stopping the Bets that Don’t Pay Off
You may be tempted to justify forecasting on the grounds of hits-based giving. That is to say, it made sense to try a few grants in forecasting because the payoff could have been massive. But hits-based giving implies we should be looking for big payoffs, and that we have to stop funding the bets that don’t deliver them.
I want to propose my leading theory for why forecasting continues to receive tens of millions of dollars per year in funding: it has become a feature of EA/rationalist culture. Similar to how EAs seem to live in group houses or be polyamorous, forecasting on prediction markets has become a part of the culture that doesn’t have much to do with impact. This is separate from the parts of EA culture that we practice for impact/value-alignment reasons, like being vegan, donating 10%+ of income, writing on forums, or going to conferences. I submit that forecasting is in the former category.
At this point, if forecasting were useful, you would expect to see tangible results. I can point you to hundreds of millions of egg-laying chickens that are out of cages, and to observable families that are no longer living in poverty. I can show you pieces of legislation on AI that have passed or almost passed. I can show you AMF successes: about 200k lives saved, far lower levels of malaria, and higher incomes and life expectancies for people who would otherwise not be alive because of our actions. I can make the case at the individual level and, more importantly, at the broad statistical level. I don’t think there is very much in the way of “this forecasting happened, and now we have made demonstrably better decisions regarding this terminal goal that we care about”. Despite no tangible results, people continue to have the dream that forecasting will inform better decision-making or lead to better policies. I just don’t see any proof of this happening.
Feels Useful When It Isn’t
Forecasting is a very insidious trap because it makes you think you are being productive when you aren’t. I like to play bughouse and a bunch of different board games, but when I play them, I don’t claim to do so for impact reasons, on effective altruist grounds. If I spend time learning strategy for these board games, I don’t pretend that this is somehow making the world better off. Forecasting is dangerous precisely because it is a fun, game-like activity nearly perfectly designed to attract EA/rationalist types: you get to be right when others are wrong, bet on your beliefs, and partake in the cultural practice. It is almost engineered to be a time waster for these groups because it provides the illusion that you are improving the world’s epistemics when, in reality, it’s mainly just a game, and it’s fun. You get to feel that you are improving the world’s epistemics, and that therefore there must be some flow-through effects, and thus you can justify the time spent correcting a market from 57% to 53% on some AI forecasting question, or on a question about whether the market you are trading on will have an even or odd number of traders, or whether someone will get a girlfriend by the end of the year.
Conclusion
A lot of people still like the idea of doing forecasting. If it becomes an optional, benign activity of the EA community, then it can continue to exist, but it should not continue to be a major target for philanthropic dollars. We are always in triage, and forecasting just isn’t making the cut. I’m worried that we will continue to pour community resources into forecasting, and it will continue to be thought of in vague terms as improving or informing decisions, when I’m skeptical that this is the case.
During the 2016 election, many prominent media outlets presented odds of ~99% for a Hillary Clinton victory, and then defended such claims after the election by saying things like “well, 1-in-a-100 events do still happen.” In my experience, people have mostly stopped doing this and present more reasonable figures, and I attribute this largely to the rise of prediction markets, which many news outlets have started citing directly.
They have clearly “raised the sanity waterline,” one of the famous goals of at least the rationalists, if not EA.
In October, PredictIt and PredictWise had Clinton at 83 and 91 cents respectively.
The night before the 2016 election, WCNC published the next-day forecasts from the NY Times, 538 (when Nate Silver was still running it), and the Huffington Post, which gave Clinton odds of winning of 84%, 68%, and 98% respectively.
The Princeton Election Consortium’s model, developed by data scientist Sam Wang, forecasted 99% for Clinton. Wang ate a cricket on live TV.
Reuters/Ipsos forecasted 90% Clinton.
Overall, it seems like prediction markets were in the same ballpark of wrongness as the media forecasts. Admittedly, the forecast dates here are not identical—a more rigorous breakdown would be welcome.
But in this bunch, the best result was obtained by a professional data scientist, Nate Silver, not by a prediction market.
There’s been a huge amount of discourse around the failed 2016 forecasts, and the postmortems largely attribute the failure to not taking correlated polling errors into account (which Silver did model, explaining his less wrong prediction). There might have been underlying partisan bias or conformity warping modeling decisions, but those biases also exist in prediction markets. Money and reputation are on the line for media outlets publishing high-profile quantitative forecasts.
So overall, I don’t think the 2016 presidential election forecast is a great example of PMs raising the sanity waterline.
Markets can be better or worse depending on, e.g., liquidity. My guess would be that today’s markets are better. (The large difference between 83 and 91 cents failing to disappear through arbitrage is an indication that at least one of those markets wasn’t so great, though I haven’t checked how current markets look on that metric.)
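To spell out the implied arbitrage with the numbers above (treating both prices as tradeable, and ignoring fees and position limits): buy Clinton YES at 83¢ in one venue and Clinton NO at 9¢ (100 − 91) in the other. That costs 92¢ and pays out exactly $1 however the election resolves, a risk-free return of roughly 8.7%. A gap that size persisting suggests that frictions, fees, or caps were keeping informed money out of at least one of the markets.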
Either way, they were in the same ballpark as the other forecasts. Even if both were poor markets, that only strengthens the argument that they shouldn’t receive credit for “raising the sanity waterline” around the 2016 election.
Shankar’s original claim was that the 2016 election was BEFORE functional prediction markets, and that the bit of “raising the sanity waterline” in question happened between then and today.
I really don’t think PredictIt should count as a prediction market at all in this context; I recall that they had crazy rules that made it basically impossible for serious people to make serious money by correcting even blindingly obvious market errors. (Don’t know anything about PredictWise.)
Yes, at the time the limit at PredictIt was $850 per user per market. When the CFTC originally issued its no-action letter to PredictIt, it was on the basis that the platform was for research purposes.
But this argument relies on an alternative history, like “functional modern high-volume prediction markets would have called the election better than Nate Silver, had they existed at that time.” That’s an (implicit) assertion without evidence. The confident calls for Clinton weren’t by bloviating pundits making obviously wrong calls; they were by professional data-driven forecasters with methodological disagreements and oversights.
If people want to argue that prediction markets have “raised the sanity waterline” by preventing disastrously bad forecasts about high-stakes events, they need to point to examples where functional modern prediction markets have directly competed with, e.g., boutique models published by media outlets. Define the comparison!
Now, that said, I can put forth a different reason they’ve raised the sanity waterline in specific areas. Taking hot-button issues, breaking them down into empirically verifiable factors, and getting a money-backed, continuously updating estimate on those factors is genuinely helpful when I’m specifically interested in a relevant issue. Even if I think the estimate is wildly wrong, it’s useful to see where the consensus lies. But I’ve read extensively on them, participated in them, and have a pretty deep knowledge of their mechanics, limitations, etc.
But I also think there are some underappreciated dangers in prediction markets. Of course, we’re now seeing how they are being used for insider trading, providing a mechanism for information leaks, and are genuinely being used as something approaching an assassination market. They’re motivating harassment of journalists whose articles are being used to resolve the markets. That’s directly counterproductive to raising the sanity waterline.
They also take some real sophistication to interpret—particularly understanding how to parse the volume, individual bet sizes (whale activity), participant selection effects, and resolution criteria. They’re optimized for betting, not informing the public, and because of that, there’s a real risk that if you go to them for informational rather than gambling purposes, you’ll misinterpret the number and become confidently wrong. I don’t know what a study on the actual impact of prediction markets on public epistemics would measure, though it would be interesting. But to me, the jury is still out, and it really doesn’t make sense to say they’ve “obviously raised the sanity waterline.”
Can you elaborate on this please?
In general, there’s apparently all kinds of insider trading going on inside the Trump admin on prediction markets whose outcomes are heavily determined by US gov actions.
We also have, of course, the case of Emanuel Fabian, who’s been receiving death threats for accurately reporting on a missile attack in ways that were unfavorable to one faction of Polymarket gamblers.
There’s an investigation into whether an airport thermometer used to resolve a Polymarket bet was tampered with.
Obviously none of these are literal assassination markets, which is why I said “something approaching” rather than “precisely an assassination market.” The general principle is that people are taking destructive actions attached to prediction markets (tampering, indirectly leaking classified information through their betting behavior, threatening journalists) or are betting on violent actions in which they are directly involved (the soldier). And on a deeper level, we have to ask whether the fraud opportunities that prediction markets present are going to become a systemic generator of fraud.
Big picture, the combination of prediction markets, crypto, and gambling becomes an unregulatable, permanent fraud coordination mechanism. It incentivizes a set of bad social roles:
People who can come up with situations that sound outlandish to one segment of the population, who like to gamble, and who believe there’s another segment of the population that’s just dumb.
People who realize that they’re in a position to influence that outlandish situation into being. They’d never have done so if there wasn’t an easy opportunity to make money on it, but there is, so they do.
What these markets point toward is a future in which the world just becomes more chaotic, because for any outlandish situation you might consider and create a market for, you’ve now incentivized people to make that situation happen (or not happen), just because you asked the question and got people’s attention. They become general bounty markets. Bet on “no” enough, and you incentivize somebody to bet against you and make “yes” happen. If you can believe in an AI singularity, you ought to be able to believe in this possibility as well!
And the deeper issue here is that both the people attempting to manifest “no” and “yes” are now spending their effort not on some sort of economically productive task, but on forcing prediction market outcomes to happen.
If the influence they can bring to bear on the situation is substantial, and the outcome is important, then it incentivizes people to invest huge amounts of money into these markets in order to motivate sufficient effort toward the preferred outcome. There are of course serious free-rider problems here. If an outcome is extremely important to a special interest group, then they may be willing to fund prediction-market-based bounties by betting heavily on the opposite outcome from the one they want. This creates a new mechanism for plausible deniability, which is how bribes operate these days—no quid pro quo, just an implied mutual understanding of game theory.
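To put made-up numbers on the bounty mechanics: if an interest group bets heavily enough on NO to push a market down to 10% YES on an outcome it privately wants, then anyone in a position to cause that outcome can buy YES at 10¢ and collect $1 per share, a 9-to-1 payout. The market itself becomes the bounty, and no traceable payment ever passes between the group and whoever acts.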
Insider trading seems to me like a different category than assassination markets. The concept of an assassination market seems to suggest that someone takes harmful real-world actions to make a prediction happen.
A short seller releasing a dossier about fraud in a company for which he holds shorts looks to me more “assassination market” like than a soldier who has no choice about whether or not to capture Maduro because he has to follow the chain of command making an insider trade.
This is a really compelling point.
I don’t think it’s true that the news media is now more rational than it used to be. Outlandish nonsense is still said all the time. It’s also not clear to me that it would matter that much, even if it were true.
Why do you attribute this largely to the rise of prediction markets? My perception is that news outlets started citing prediction markets roughly when they became an effective vehicle for hard-to-regulate sports gambling, I don’t think this has ~anything to do with the 2016 election, and indeed, directly following that election, significantly prior to the rise in cultural salience of prediction markets, data scientists and pollsters were in crisis for several months trying to figure out why the polls were so wrong. They did a much better job in the 2018 midterms and this is certainly not attributable to prediction markets, rather to directly addressing the methodological gaps in polling that had recently become salient.
are you sure this isn’t just the evolution of your own information diet and circle of friends? if you asked a random american “do you know who nate silver is? → do you think he got it mostly right or mostly wrong in the last few elections?”, do you think they’d say “he was mostly right” or “nate silver is always wrong because he’s [too woke]/[not woke enough]”?
prediction markets are allegedly a way to bring empiricism to fields that had none before, and your best defense of them is “the vibes feel better now”
FWIW, I refer to Manifold Markets and prediction-markets every week in my decision-making. My guess is this mundane utility generalizes. I am kind of confused why you think people don’t use these for decision-making, they seem really useful in lots of circumstances.
Some random example markets I referred to recently:
That weird futures market on the Anthropic IPO price (can’t find the link, but saw it referenced on Twitter a bunch)
This market on Anthropic making another big revision to their RSP
This market of mine on Anthropic security commitments
Basically all of the election betting odds markets
Markets on nuclear war
What decisions do you think this has affected and what would you estimate the differences in outcomes to be as a result? Or, say, the most important impacts?
You may be thinking of Ventuals. Best wishes, Less Wrong Reference Desk
Basically just +1 on what Michael said. How are you using markets on nuclear war in your decision making? Very concretely, can you name a decision you made differently due to these markets?
Yes, I used them to set a threshold for evacuation protocols at Lighthaven, together with decisions on emergency supplies, how many bugout bags to have, etc.
(I had also built a small website called “hasRussiaLaunchedNukesYet.com”, which would send everyone who signed up a text message if the probability of a nuke being launched was above 90% according to the markets, which would then be a natural time to get out and escape)
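A minimal sketch of that kind of threshold alert, assuming Manifold’s public REST API (a GET on /v0/slug/{slug} returning a probability field), with the market slug and SMS hook as hypothetical placeholders rather than the site’s actual code:

```python
import time
import requests

MARKET_SLUG = "will-russia-launch-a-nuclear-weapon"  # hypothetical slug
THRESHOLD = 0.90  # alert when the market's YES probability crosses 90%

def market_probability(slug: str) -> float:
    """Fetch the current YES probability for a Manifold market by slug."""
    resp = requests.get(f"https://api.manifold.markets/v0/slug/{slug}", timeout=10)
    resp.raise_for_status()
    return resp.json()["probability"]

def send_sms_to_subscribers(message: str) -> None:
    """Placeholder: wire this up to an SMS provider such as Twilio."""
    print("ALERT:", message)

while True:
    p = market_probability(MARKET_SLUG)
    if p >= THRESHOLD:
        send_sms_to_subscribers(f"Nuke-launch market is at {p:.0%}. Time to get out.")
        break
    time.sleep(600)  # poll every 10 minutes
```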
FWIW, this does not change my mind on my OP, though this is interesting.
What scope do you have in mind when you refer to forecasting? Is it specifically Tetlockian forecasting / prediction market style forecasting, where most of the value is a forecasted number answering a well-defined question, and the methodology often involves aggregating the views of a bunch of people, each of whom didn’t spend much time?
If so, then I agree directionally, and in particular agree the current track record isn’t great. Though I think this sort of forecasting will plausibly be quite useful for AI stuff as we get closer to AGI/ASI: it may become easier to operationalize important questions that don’t require long chains of conceptual thinking, there will be lots of important sub-questions to cover, and some of them may be more answerable by superforecaster-like techniques as we have better trends/base rates to extrapolate from, since we will be closer to the events we care about. And also having a bunch of AI labor might help.
But overall I am at least currently much more excited about stuff like AI 2027 or OP worldview investigations than Tetlockian forecasting, i.e. I’m excited about work involving deep thinking and for which the primary value doesn’t come from specific quantitative predictions but instead things like introducing new frameworks (which is why I switched what I was working on). I’m not sure if AI 2027 or OP worldview investigations work is meant to be included in your post.
I am mostly talking about Tetlockian forecasting. I am talking about other versions of it too, though, including AI 2027.
I didn’t want to argue against AI 2027 type stuff in this post, but on net, I think AI 2027 made some very aggressive predictions that will turn out to be wrong (even if you give double the time for them to occur), and I think that AI safety people will end up looking silly, like the boy who cried wolf.
For two concrete examples:
“By early 2030, the robot economy has filled up the old SEZs, the new SEZs, and large parts of the ocean. The only place left to go is the human-controlled areas.” This one is easy to operationalize. I would bet that by the end of 2032, less than 20% of the current Earth’s oceans will be taken over by the “robot economy”.
“June 2027: Most of the humans at OpenBrain can’t usefully contribute anymore.”
Yup, I’m also quite worried about this. I’m very uncertain though about the magnitude of the issue.
e.g. if most humans at OpenBrain not contributing happens in 2030 (so taking a bit more than 2x longer to happen than predicted), I’d guess that many people will not discredit us / safety people because of AI 2027 and may still give some credit.
Certainly not all people! But I’ve been pleasantly surprised by the discourse thus far on evaluating AI 2027, which (as far as I remember, might be wrong) has often focused on feeling like reality is unfolding in a way that is directionally toward AI 2027 compared to what the person previously thought, or whether AI 2027 is closer to reality than what the person had thought. And many people seemed to understand that it was not a confident prediction of any specific timeline. (I guess there was a blow up about Daniel updating his timelines later / having a median longer than AI 2027, but I’m talking about the reactions relating to how reality has compared to the scenario)
(edit: You might worry that the reception has been good so far only because reality actually has looked pretty similar to the scenario, and that will change soon. That seems very reasonable. Also, to be clear, even if the crying-wolf effect is large, I think there will remain large positive effects, especially if the takeoff looks recognizable relative to the takeoff in AI 2027 in terms of the overall dynamics, even if it is substantially later or slower.)
I’m also less than 50% on this, maybe ~33%? You can generally see my views at https://www.aifuturesmodel.com/forecast/eli-04-02-26; they’re somewhat less aggressive than Daniel’s. (Obviously I can’t fully speak for Daniel, but I think his response to your comment would be further in the direction of sticking by AI 2027’s predictions as likely to be close to right.)
Yeah, I just don’t agree that reality has played out like AI 2027 in any meaningful way that isn’t very obvious. Basically no meaningful predictions are made until the end of 2026, so it’s just too early to claim victory.
I have been meaning to write up my critiques of AI 2027 but I have too many of these kinds of posts to write up and I’m a slow writer.
Makes sense. For what it’s worth, we’ve had people tell us and seen people post on Twitter that they’ve taken scenarios like AI 2027 more seriously because so far reality has played out more like AI 2027 than they thought it would.
[Relevant context/COI: I’m CEO at the Forecasting Research Institute (FRI), an organization which I co-founded with Phil Tetlock and others. Much of the below is my personal perspective, though it is informed by my work. I don’t speak for others on my team. I’m sharing an initial reply now, and our team at FRI will share a larger post in future that offers a more comprehensive reflection on these topics.]
Thanks for the post — I think it’s important to critically question the value of funds going to forecasting, and this post offers a good opportunity for reflection and discussion.
In brief, I share many of your concerns about forecasting and related research, but I’m also more positive on both its impact so far and its future expected impact.
A summary of some key points:
Much of the impact of forecasting research on specific decision-makers is not public. For example, FRI has informed decisions on frontier AI companies’ capability scaling policies, has advised senior US national security decision-makers, and has informed research at key US and UK government agencies. We are not able to share many details of this work publicly. However, there is also public evidence that forecasting research is widely cited and informs discourse and some decision-making (some examples below).
AI timelines, adoption, and risk forecasts play a huge role in both individual career decisions and the broader AI discourse. Forecasting research still seems like one of the best tools available for getting specific and accountable beliefs on these topics. For example, comparing ‘AI safety’ community forecasts to more ‘typical’ experts’ forecasts seems especially important for understanding how much to trust each group’s views. These comparisons will become increasingly relevant for government policymakers over time, especially if there is extremely rapid AI capabilities progress that leads to major societal impacts in the short-run.
When evaluating the impact of FRI-style forecasting research, I think the closest relevant comparison classes are more like broad public goods/measurement-oriented research (e.g., Our World in Data, Epoch) or think-tank research (e.g. GovAI, IAPS). By its nature, the impact of this kind of research tends to be more diffuse and difficult to measure. However, I’d be interested in more intensive comparative evaluation of this type of research and agree that funders should be responsive to evidence about relative impact in these fields.
Forecasting research still has a ton of flaws, and its impact has been far from the dream I’ve long had for it. There are still big challenges around identifying accurate forecasters on questions related to AI, integrating conditional policy forecasts with actual decision-makers’ needs, and combining deep, individual qualitative research with high-quality, group-generated quantitative forecasts.
My extremely simplified narrative is: Tetlock et al. established the modern judgmental forecasting field and created a proof of concept for better forecasts on important topics (“superforecasting”); this work was largely academic. Some forecasting platforms were created to build on that work and apply it to a range of important issues. Targeted efforts to make forecasting more directly useful to decision-makers are relatively nascent (i.e., have largely begun in the last few years), and are accumulating impact over time, but still have room for improvement.
FRI’s research, in particular, aims to close many of the gaps left by prediction markets and historical forecasting approaches: it is particularly focused on conditional policy forecasts, medium-to-long-run forecasts that do not get much detailed engagement on prediction markets/platforms, and systematically eliciting forecasts from experts who would not typically participate in forecasting platforms but whom decision-makers want to rely on (while also eliciting forecasts from generalists with strong forecasting track records).
However, some factors make the future potential impact of this work look more promising:
AI-enhanced forecasting research is a huge factor that will unlock cheaper, faster, high-quality forecasts on any question of one’s choosing.
The next few years of forecasting AI progress/adoption/impact seem critical, and like they’ll deliver a lot of answers on whose forecasts we should trust. It seems good to be ready to support decision-makers during this time.
Leaders in the AI space seem particularly interested in using forecasting in their decision-making; they tend to be both quantitative and open-minded. This creates more potential for forecasting to be useful. More minorly, prediction markets and forecasting are generally becoming more credible within governments.
More detail on some select points below. This comment already got very long (!), so I’ll reserve more elaboration for a future, more comprehensive post.
Examples of impact
Forecasting research has informed some very important decisions. Unfortunately, many of the details of the relevant evidence here cannot be made public. However, there is evidence of substantial public citation of this research, and some public evidence of affecting particular decisions.
A few examples of relevant impact include:
Forecasting has been particularly relevant for decision-making around capability scaling policies. The near-term magnitude of AI-biorisk, how growing AI capabilities may increase it, and what safeguards need to be in place to respond to it, are highly uncertain. Frontier AI companies, the EU AI Code of Practice, and other governments are trying to track and respond to AI impacts on biorisk, cybersecurity, AI R&D, and other domains. We’ve had substantial engagement with the relevant actors, including some focused partnerships, and believe our work in this area has affected important decisions, though we unfortunately cannot share many of the details publicly.
Our work on ForecastBench, a benchmark of AI’s ability to do forecasting, showed that AI-produced forecasts could catch up to top human forecasters in roughly the next year if trends persist. This generated interest among senior decision-makers in U.S. national security. We cannot share details, but this is another example of important decision-makers paying attention to and using forecasts.
We have completed commissioned research to directly inform grantmaking at Coefficient Giving, and also have indirectly affected grantmaking. For an example of the latter, our work on the Existential Risk Persuasion Tournament (XPT) partially inspired Coefficient Giving (formerly Open Philanthropy) to launch an RFP on improved AI benchmarks. The XPT forecasts predicted that most existing benchmarks would likely saturate in the next few years, and showed that progress on these benchmarks was not crux-y for disagreements about AI impact. We were told that this played a role in the launch and conception of the RFP, and the XPT is cited in the public write-up.
Some examples of more diffuse impacts (e.g., impact on public understanding of AI and research for policymakers or philanthropists) include:
FRI has given presentations to, and has ongoing connections and conversations with, important government agencies such as the Congressional Budget Office, US CAISI, the UK Department of Science, Innovation, and Technology, and others. We cannot share many details, but the potential to inform decisions at these organizations is highly important.
Major reports for policymakers, like the International AI Safety Report, the AI Index, and relevant RAND reports, also prominently cite FRI research.
FRI research is cited in places like the New York Times, The Economist, and Bloomberg to inform readers about the economic impacts of AI, AI-biorisk, general catastrophic and existential risk, AI-enhanced forecasting, and the future of AI more generally.
Forecasts are widely cited in cause prioritization research and by experts in relevant domains: as a few examples, see citations from Ethan Mollick on AI progress, 80,000 Hours on biorisk, Dr. Richard Moulange on AI-biorisk, Tyler Cowen on the economic effects of AI, Will MacAskill on AI progress and risk, etc.
For context: FRI has been operating for a little over 3 years, and we’re accumulating substantially more momentum in terms of connections to top decision-makers as time goes on.
(To be clear: I am mostly discussing FRI here since it’s what I’m most familiar with.)
AI timelines, impact, and adoption forecasts drive a huge amount of career decision-making, attention, etc.
Forecasts about AI timelines and risk have had major effects on people’s career decisions and the broader AI discourse. AI 2027 underlies popular YouTube videos, 80,000 Hours advises people on career decisions based on timelines forecasts, Dario Amodei’s “country of geniuses in a datacenter by 2027” forecast informs a lot of Anthropic’s work and policy outreach, the AI Impacts survey on AI researchers’ forecasts of existential risk is highly cited, etc.
A major reason I got into this field is that many people are making very intense claims about the effect that AI will have on the world soon, and I want to bring as much rigor and reflection as possible to those claims. So far, it looks like most forecasters are substantially underestimating AI capabilities progress (with some exceptions, e.g. on uplift studies); the evidence on forecasts about AI adoption, societal impacts, and risk is less clear, but I expect we will have more evidence soon, particularly from the Longitudinal Expert AI Panel (LEAP), especially as some forecasters are predicting transformative change in the next few years.
As the expected impact and timing of AI progress is sharpened and clarified, talent and money can be allocated more efficiently.
Case study: Economic impacts of AI
In some cases, it looks to me like forecasting research is picking relatively low-hanging fruit.
The economic impact of AI is a prominent topic of public discussion right now, and it is likely that governments will spend many billions of dollars to address it in the coming years.
Currently, economists hold major sway in public policy about the economic impacts of AI. Perhaps you think top economists, as a group, are badly mistaken about the likely near-term impacts of AI, as some Epoch researchers and others believe. Perhaps you think they are likely to be fairly accurate, as Tyler Cowen, Séb Krier, or typical economists believe. It seems like a valuable common sense intervention to at least document what various groups believe, so that when we are making economic policy going forward we can rely on that evidence to determine who is trustworthy. I believe that studies like this one (and its follow-ups) will be the clearest evidence on the topic.
Relevant comparison class for forecasting research
When thinking about the impact and cost-effectiveness of forecasting, I think it’s more appropriate to compare this work to public goods-oriented research organizations (e.g., Our World in Data, Epoch, etc.) and policy-oriented think-tank research (e.g. GovAI, IAPS, CSET, etc.).
I’ve been disappointed by most impact evaluation of think-tanks and public goods-oriented research that I’ve seen. I believe this is partly because it is very difficult to quantify the impact of this type of work because it has diffuse benefits. But, I still think it’s possible to do better and I would like FRI to do better on this front going forward.
That said, I still believe there are reasonable heuristics for why this research area could be highly cost-effective. There are many billions of dollars of philanthropic and government capital being spent on AI policy topics. If there is a meaningful indication that forecasting is changing people’s views on these questions (as I believe there is; see discussion above), it seems reasonable to me to spend a very small fraction of that capital on getting more epistemic clarity.
My critiques of forecasting research
Forecasting research, and FRI’s research in particular, still has major areas for improvement.
Examples of a few key issues:
I’ve been underwhelmed by the accuracy of typical experts and superforecasters on questions about AI capabilities progress (as measured by benchmarks); they often underestimate AI progress (with exceptions). I think this underestimation is a useful fact to document, but it would be much more helpful if our research identified experts you should trust. We’re in the process of identifying ‘Top AI forecasters’ through LEAP and aim to share updates on this soon.
I think forecasting research is at its best when combined with in-depth research reports that provide more narratives and key arguments underlying forecasts. For example, Luca Righetti’s work on estimating (certain kinds of) AI-biorisk provides a lot of valuable analysis that usefully complements our expert panel study on the topic. [Note: Luca is an FRI senior advisor and a co-author of our forecasting study.] For decision-makers to build sufficiently detailed models, and for forecasters to test their arguments, we’d ideally have detailed research like Luca’s on most major topics where we collect forecasts — ideally from a few experts who disagree with each other. Unfortunately, this research often doesn’t readily exist, but we are investigating ways to generate it.
I have been somewhat surprised by how few experts in AI industry, AI policy, and other domains predict transformative impacts of AI similar to those commonly discussed by AI lab leaders, people in the AI safety community, and others. This has made it harder to have a true horse race between the ‘transformative AI’ school of thought that seems to drive a lot of discourse and decision-making vs. more gradual views of AI impacts. Though we have some transformative AI forecasters in our studies, in future work we aim to explicitly collect more forecasts from the ‘transformative AI’ school of thought in order to set up clearer comparisons between worldviews and to better anticipate what will happen if the ‘transformative AI’ school makes more accurate forecasts.
I will save other thoughts on how forecasting, and FRI’s research, could be made more useful to decision-makers for a future post.
But, to be clear: I have a lot of genuine uncertainty about whether forecasting research will be sufficiently impactful going forward. There are promising signs, and increasing momentum, but to more fully deliver on its promise, more improvements will be necessary.
Some notes on FRI-style forecasting research vs. other forecasting interventions
On the value of FRI-style forecasting research in particular:
Prediction markets do not have good ways to collect causal policy forecasts, but in our experience, conditional policy forecasts (e.g., how much would various safeguards reduce AI-cyber risk) are often the most helpful forecasts for decision-makers.
Similarly, prediction markets do not create good incentives for longer-run forecasts or low-probability forecasts, and they incentivize against sharing the rationales behind forecasts (see the worked example after this list). Directly paying and incentivizing relevant experts and forecasters to answer questions is often more useful.
Typical forecasting platforms do not get forecasts from the kinds of experts that policymakers typically rely on, and aren’t the kind of evidence that can easily be cited in government reports. (This may be unfortunate, but it is the current state of the world.)
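A worked example of the long-run incentive problem, with made-up numbers: suppose a market on a ten-year question sits at 95¢ when the true probability is ~100%. Buying at 95¢ and waiting a decade for the $1 payout returns about 0.5% per year, far below the risk-free rate, so even a trader certain of the outcome has little reason to correct the price. Low-probability questions are similarly unattractive: pushing a market from 2% down to a ‘correct’ 1% ties up roughly 98¢ per NO share for an expected gain of about a cent.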
Reasons for optimism about future impact
Finally, there are a few factors that have the potential to dramatically change the field going forward:
It looks like AI may soon make it >100x cheaper and faster to get high-quality forecasts on any topic of one’s choosing. Policy researchers will be able to ask the precise question they’re interested in, will be able to upload confidential documents to inform forecasts (something we’ve heard is especially important to decision-makers), and will be able to get detailed explanations for all forecasts. AI-produced forecasts will also be much easier to test for accuracy due to the volume of forecasts they can provide, and it will be easier to generate ‘crux’ questions since AI will not get bored of producing huge numbers of conditional forecasts (which are necessary for identifying cruxes). Building benchmarks and tooling to harness AI-produced forecasts will be a much larger part of our work going forward.
The next few years seem very unusual in human history: very thoughtful researchers are predicting “Superhuman Coders” by 2029, with attendant large impacts. There is a spectrum of views, but the scope for disagreement among reasonable people about what the world will look like in 2030 is huge. This is a particularly important time to make predictions testable, update on what we observe, and make better policy and personal decisions on the basis of this information.
People working in the AI space seem particularly interested in using forecasting, perhaps due to a mix of being quantitatively oriented and because they’re facing unusual degrees of uncertainty. This bodes well for forecasting being useful in the coming years. More minorly, it appears that there is a broader cultural change around forecasting-related topics. Prediction markets are increasingly being cited by government officials, and the public is paying more attention to them than ever before. Much of the impact for prediction markets specifically seems negative (e.g. via incentivizing gambling on low-value topics), but the broader cultural shift suggests there may be an opportunity for better uses of forecasting to enter public consciousness as well.
I’m not sure of the norms here, but I will just copy over my reply from the EA forum.
Hi Josh, thanks for the response.
I hate to do this, especially at the start, but I want to point out for you and others who have jobs related to forecasting that it’s difficult to convince someone of something when their job relies on them not believing it. I think you should assume that you will think forecasting is more useful than it is.
As for your points, I’ll respond to some of them.
If you want to DM me, I can sign an NDA, and I may update my opinion depending on what these non-public uses of forecasting are.
I don’t think this is all that relevant. I’m not sure what forecasting research has really elicited on AI timelines. I agree that talk about timelines creates a lot of “buzz” around AI but depending on your views, this is good or bad.
I agree that the impact of measurement-oriented research is difficult to measure, but importantly, not impossible. OWID, for example, should count how much their work is being cited and looked up. Conversely, I think it would be good for FRI to estimate how much $$ the change in the decision was worth, and by what amount/percentage FRI made that change more likely. I don’t think you really gave a good reason that FRI should be funded over anything else that simply has very diffuse benefits.
When do you think it’s reasonable, if ever, for the EA community to “give up” on funding more forecasting work?
If I’m being cynical, almost every field can say “AI will transform the field” though I’m not sure how much this is worth debating.
I liked and agreed with @Scott Alexander’s recent tweet on the benefits of prediction markets, though I would have a hard time saying how much of a monetary investment into them that justifies:
@JenniferRM also replied with:
The Biden/Kamala example makes me think the value of prediction markets is not the uncovering of little-known info, but making things so blatantly obvious they can’t be denied by those who are deceiving themselves.
Yes, but there are also some examples of uncovering little-known info, e.g. insider trading giving advance warning of stuff.
Consider: Suppose that someone tries to assassinate POTUS by running up with a suicide vest on. Suppose that it appears they were a random ideologue acting alone, but then analysis of the markets reveals a suspicious spike in bets on POTUS death starting a few hours before the attack. This suggests that this person didn’t act alone; it suggests that at least one other person knew about it beforehand. Indeed it suggests that many people knew, because if it was just a few people then the odds that someone would leak are low. This is some interesting info that wouldn’t otherwise have been uncovered! (And if the Secret Service is monitoring the markets, they might actually be able to protect POTUS from such attacks more effectively!)
I’m not disagreeing, but I do have a question -
How would we distinguish new knowledge from random speculative spikes? E.g. some predictors might think that other predictors have a greater degree of knowledge than they actually do, and that creates a bubble.
We probably don’t have a foolproof way to distinguish but there would be various other bits of evidence we could look for, e.g. how capable are the AIs actually getting, what’s the gossip in SF, how long and sustained has the spike been, etc.
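As a toy illustration of the “how long and sustained” test (made-up thresholds, not a claim about how anyone actually monitors these markets), one could flag a move only if it is both large relative to recent volatility and persists for several consecutive observations:

```python
from statistics import mean, stdev

def sustained_spike(prices: list[float], window: int = 24,
                    n_sigma: float = 3.0, min_run: int = 3) -> bool:
    """Return True if the latest `min_run` prices all sit more than `n_sigma`
    standard deviations above the mean of the preceding `window` prices."""
    if len(prices) < window + min_run:
        return False
    baseline = prices[-(window + min_run):-min_run]
    recent = prices[-min_run:]
    mu, sigma = mean(baseline), stdev(baseline)
    if sigma == 0:
        return False
    return all(p > mu + n_sigma * sigma for p in recent)

# Baseline hovers around 5%; a one-tick blip doesn't trigger, a held move does.
flat = [0.05 + 0.001 * (i % 3) for i in range(24)]
print(sustained_spike(flat + [0.40, 0.05, 0.05]))  # False: large but not sustained
print(sustained_spike(flat + [0.40, 0.42, 0.41]))  # True: large and sustained
```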
Since I was cited, and since I updated some from followup conversation, I want to close the loop here…
Buck read an entire book on this, and it sounds like the key movers and shakers were oblivious to the prediction markets but very interested in polls? But he didn’t mention the authors of the book specifically ruling out influence.
In general, I tend to think that staffers have a lot of collective cultural sway, and they care a lot about “who can pull donor money to pay for staffers”, and George Clooney was famous, a big donor, and the author of this, and I would expect their shared Inner Ring’s Overton Window to track things (1) like Clooney and also (2) prediction markets. Also, that channel of information would plausibly not end up in a book (which is likely to focus on the narrative swoop of second-order famous people, rather than the third- and fourth-tier staffers and their watercooler gossip and whether that gossip mentioned prediction markets or polls)?
Anyway. Hearing from @Buck about the book lowered my credence on “Prediction Markets changed history there” from maybe 70% to maybe 42%?
I was surprised, but it was surprise from hearing confident well-informed posteriors, so all I can do is reason via aumancy, from thin summaries of thick data.
This explanation is plausible to me, and has the added benefit of explaining why @mabramov thinks prediction markets are less valuable than they are (if indeed he undervalues them) - many people are prone to believing things that are obviously wrong, the main skill of good forecasters (beyond being generally well-informed) is that they are immune to this particular insanity, and so people who are not good forecasters benefit from access to insanity-immune opinions. This comment is the closest among existing comments to convincing me that prediction markets can have broad social utility.
Still, I don’t think this explanation really says that 40% chances are 40% chances, it says you can safely dismiss claims that 40% markets represent probabilities below like 10% or higher than like 80%. It’s still possible that these markets are not particularly-good information aggregators and that superforecasters are not particularly good at producing actionable insights across domains—calibration is not the optimization target. Thus I still update, based on the original post, towards prediction markets being worse, and perhaps significantly worse, than advertised, in their current incarnations.
The most recent Tetlockian forecasting style thing I’ve spent substantial time on is the 2025 and 2026 AI forecasting surveys, in which hundreds of people each year have made predictions a year out on benchmarks, and other indicators such as revenue.
The theory of change is to (a) establish common knowledge about how fast things are going relative to people’s expectations (and we collect data on people’s overall views on when AGI will be reached so we can sort of see if we’re “on track” for that), and (b) identify which people seem to be making the most accurate predictions. Importantly, it is not to elicit predictions that are directly useful for important decisions.
I’ve observed some evidence of this working, e.g. re: (a) establishing common knowledge, Anson of Epoch wrote an analysis that I’ve seen referenced a few times. I’m glad to have a data point against the common refrain of “people underpredict benchmark scores and overpredict real-world impact”, given that revenue outpaced people’s predictions (though it is a narrow, single data point).
Re: (b) identifying who is making the most accurate predictions, I found it informative that in Anson’s analysis (footnote 1), forecasters with pre and post-2030 timelines performed similarly. I’ve seen some people cite Ryan G and Ajeya’s #2 and #3 performance as evidence that we should listen to them, which is maybe good but I think people might be over-updating on the results with so few questions (I certainly pay attention to Ryan and Ajeya’s forecasts, but almost entirely for other reasons).
Overall, it’s unclear to me how impactful this has been. I decided to run the 2026 survey because it seems at least a bit impactful and doesn’t take that much time (I logged 18 hours setting up the 2026 version; I’d guess that the others who helped spent a total of 20–60 hours). But the decision was borderline.
3 unrelated points:
I’d love to see this argument expanded further but also appreciate what you’ve written here.
You sort of mention this, but it strikes me that the argument doesn’t need to be “are prediction markets useful for doing good” but just “do the improvements to prediction markets and infrastructure made by EA money and resources actually meaningfully increase the amount of good prediction markets do?”
Lastly, may I suggest cross-posting this to the EA forum?
Anything in particular you want expanded upon? I think this is most of what I have to say on the matter. I’ve been saying some form of this opinion for about 3 years now and I’m happy this is finally out there.
Yea, my point is that the bar for EA money needs to be very, very high.
It’s on the EA forum. Was posted at the same time!
I find it hard to tell how much impact widespread forecasting has. You go into the lack of tangible impact in the post but it’s hard to prove a negative (forecasting has had no impact). It’s also hard to prove real but intangible impact. I trust your opinion as an expert forecaster more than mine but I’m confused here.
One personal forecasting impact I’ve observed is the ability to point to an existing prediction market. For example, someone writes a post about China definitely invading Taiwan in 2026. It’s hard for me to tell how good their argument is. With prediction markets, I can find that market and get a more objective take. I can ask the post author how much money they put into the market, since they stand to make a lot of money by correcting it.
This isn’t a knock-down argument against your post. I’m just giving a specific example of less tangible impact. Multiplied over many people making slightly better decisions, that might be high impact. It can both be true that this impact is real and that EA effort should go elsewhere.
I’ve long had some sense like this, though not the expertise to make a claim like this.
My impression is that a lot of the conceit of forecasting and prediction markets boils down to
People can learn to think better via the real-world feedback of predictions;
You can find people or groups that are good at forecasting, e.g. through prediction markets;
Then you can somewhat rely on their predictions
And you can learn how they made their predictions and thereby get better world models in general
Have things like this happened? E.g.
Are there important strategic X-derisking decisions that have been made based on finding someone who’s good at forecasting in general and then asking them to make predictions? (And do those seem like good decisions / for the right reasons?)
Have some people discovered through markets gone on to have a bunch of useful thoughts due to their forecasting skills / world models? Were they signal boosted in advance because of their known prediction market success?
I think a primary question I want an answer to here was what went so wrong with OpenPhil’s attempt to fund superforecasters on AI questions—why they were eg so much wronger than either of myself or Paul about the probability of a 2025 IMO gold medal win, as wrong (wronger?) than Holden Karnofsky on AGI timelines, etc. Do we know what went wrong? Is it fixable? Has it been fixed? If people with biases can get “superforecasts” that match their biases, and attempts to read the market entrails divine that markets in 2023 don’t think AGI is on the way, and we can’t get extinction-related prediction markets for settlement reasons, then there may not be much for AI people to do with prediction markets.
The rest of humanity should keep trying to get good at prediction markets in order to someday get a little closer to dath ilan, and I think non-real-money markets like Manifold are important for experimenting with that. (Manifold’s brief ill-fated attempt to become a real-money market was unfortunate.)
Personally I had updated they were substantially doing reference class forecasting, and that most people (forecasters included) have historically sucked at picking a reference class for AGI.
I currently have no better historical account, but it’s a sharp lesson about how all the bowing and scraping about deferring to superforecasters was just Modesty and “Outside View!”ing in disguise all over again, in the sense that the “superforecasters” who got hired somehow managed to end up being those without any inside view of AI.
fwiw, I think on AI related questions, Metaculus has chosen Pros who are much more bullish than what I’ve seen superforecasters do. I don’t want to say too much about private projects but some Pros were directly involved in the AI ecosystem. It’s not a head-to-head but there is also a significant vibe difference between FRI’s report on work and Metaculus Labor Hub where Pros have taken a much more dramatic position.
When I’m wrong on AI forecasts it’s more often due to things going a bit slower than I expected rather than too fast (although that has happened too) and I know I’m not the only one, there’s at least Haiku among the regular Pros who’s very “bullish”. (as in, very worried)
The most well-funded prediction markets in human history did not seem to think either Nvidia or OpenAI was worth particularly much before 2022. Might these just be rare things to anticipate if you are not literally at the eye of the storm?
I wrote a whole response to this :-)
Here’s an excerpt:
As I understand Marcus’s argument, his central thesis is that we haven’t seen the benefits of this past forecasting funding, but I think the opposite is true! Here are just a few examples:
It’s hard to measure the value of “epistemic infrastructure,” not just for forecasting sites but also for things like Wikipedia and OurWorldInData. That doesn’t mean that value isn’t there. Has Wikipedia been a good return on investment? Obviously! Manifold is far less impactful than Wikipedia, but Wikipedia gets about $200 million per year between returns on its endowment and donations. The return on investment in Manifold is probably still way higher than Marcus seems to believe. Hundreds of thousands of people have made incrementally better decisions; hundreds of thousands of people have learned to think about the world a little more concretely and quantitatively. I’m one of thousands of active users on Manifold, and I’d personally value its impact on my life quite highly, as I’d wager Marcus might too.
Giant companies like Kalshi and Polymarket have grown in part because of research around how best to leverage crowdsourced forecasting. Inasmuch as they themselves have been funded, which Marcus claimed but I’m not sure is true, that has probably provided an incredible ROI, as these companies are now valued in the billions. On the other hand, there’s a pretty clear through-line between early forecasting research and the rise in popularity of these sites. You can have your own opinion on whether these companies are net-good for the world (the jury’s definitely still out), but this is a very significant impact you have to reckon with.
A lot of people get into the world of rationality and EA through forecasting. This was my entrance into the community. I found competitive forecasting fun, and only later did it expose me to many of the other things this community cares about—which I now care about as well! Again, it’s hard to quantify the impact of growing the EA/rationality community by ~5%. I’d guess that a few dozen people have taken the Giving Pledge who counterfactually wouldn’t have (I know of at least one). Just this alone is an ROI of many millions of dollars.
AI… not gonna get into this too much, but it’s pretty clear that different politicians, policymakers, and influential figures find different arguments appealing. High-level forecasting work is one of several ways of convincing people that AI is something they should take seriously or worry about. Again, quite hard to quantify how much less influence the various sectors of the AI safety lobby would wield right now without the backing of evidence-from-forecasting. Would the AI safety community be worse off without the support of research titans like Hinton and Bengio? Probably. Would they be worse off without a recent popular NYT bestseller? Probably. Would they be worse off without dozens of expert surveys forecasting high chances of negative outcomes? Also, yes, probably they would be.
Forecasting has been a really good way of getting people who are good at thinking clearly about the future noticed and into good roles! Recruiting is useful.
Better forecasting infrastructure will help the Dems allocate resources in the 2028 election. In fact, it likely helped the Dems keep the House close in 2024, which has provided an important check on Republican power over the last year or two. Betting markets have outperformed polling aggregators like 538 or the NYT since they took off in popularity and will continue to do so. This will help Democrats allocate funding to tipping point congressional races and is probably worth millions of dollars alone, if not far more (see recent EA focus on democracy).
Forecasting platforms provide a check on bullshit. It’s hard to continually lie when crowdsourced forecasts or prediction markets show a very different story. I think the epistemic environment these days would be even worse than it already is without this. This is similar to the value proposition of Pangram and other AI-detection software in pointing out AI slop. Hard to quantify but certainly valuable.
I’m not sure I understand the point. A lot of people have donated money to Wikipedia, and now it has a big war chest. I agree Wikipedia has been valuable, but I’m not sure how you are computing its value. I don’t see any proof that hundreds of thousands of people are making incrementally better decisions, though. My point is that this was the hope, but it is just asserted without evidence. I’m happy you’ve enjoyed Manifold; I don’t think that means EA money should go to it. There is a very, very high bar to clear for EA money.
I’m not sure I understand your point. I agree that Kalshi and Polymarket are big and have attracted users and investment. My point is that they haven’t produced the outpouring of social returns that people claim they have. They are just good businesses (charging fees on gambling).
I don’t think you can claim millions of dollars of value from community growth, but if you did that accounting, I’m sure the ROI would come out negative.
I agree that forecasting has been somewhat useful for identifying talent/nerd-sniping. I don’t think very much of this has happened, though. I think fewer than 25 people (to be conservative) have roles due to their forecasting prowess who were unknown before.
This has been discussed elsewhere.
There is no proof here. Would you say today’s epistemic environment is much better than it was 20 years ago?
This is like if LeBron James wrote a post about how basketball is stupid and should receive less attention.
Thanks for writing this! Some reasons I would steelman continued funding towards Tetlockian or PM-style forecasting:
Source and screen for talent. There sure is some correlation between forecasting well and doing important things in EA. Just picking some people I know: Joel Becker, another former #1 Manifold trader, went on to join METR and then do their famous uplift studies. Eli Lifland went on to help make AI 2027. Peter Wildeford started Rethink Priorities, now IAPS. And some of your own track record in making good early-stage EA grants is here.
Beyond that, a bunch of smart and interesting people have expressed a lot of interest in forecasting, from banner bearers like Scott Alexander and Vitalik Buterin and Robin Hanson, to surprising cases like Anthony Giovanetti (of Slay the Spire) to <anon famous AI researcher who DM’d me> to Sam Altman. I do think there’s some amount of intellectual fashion-ism going on here, but also, you should fish where the fish are.
When funding is abundant, one bottleneck becomes finding (and building consensus around) talent; if the only thing that a bunch of money spent on forecasting does is to identify good people, that may be worth it.
Fast, accurate info in times of chaos. Prediction markets are actually quite good at distilling signal from noise during times of high uncertainty, e.g. recently around the Russia/Ukraine war and the Iran war. Manifold’s usage numbers spike every time there’s a crisis. Because PMs pay a high premium for being speedy, they’re often the fastest trustworthy source of data. If the world becomes more chaotic due to faster tech growth, it may be quite valuable to have this place to stay up to date.
New unlocks from growth in AI capabilities. Historically, one very expensive input into forecasting is forecaster time. As LLMs catch up to top human forecasters, it’ll soon be cheap to get a calibrated answer to any question one might ask. On priors, this should help us with making better decisions or making futarchy possible. I agree this is somewhat speculative still, and wish more people were trying things in this space.
Do you have any reason to think those people succeeded in other areas as a result of being screened via forecasting success? Did someone give them opportunities they wouldn’t have had without being a top forecaster?
My guess is this is just positive selection.
Sorry for the late reply, but I have an anecdotal example: myself. I’m a lawyer by trade, and there’s no way I would have been able to get involved with Epoch AI had it not been for my being one of the best Metaculus forecasters. The most notable thing I did was writing How Fast Could Robotics Production Scale Up? with JS Denain.
I know some of the top forecasters have worked for hedge funds, although their backgrounds may have helped there. I’m not very knowledgeable about this, and it’s possible that you’re right overall and this is just anecdotal, but there’s at least one example.
No offense, but your example is using a track record in forecasting to get a job where you post forecasts. I asked about success in other areas.
Did you look at the report? I know the title is very forecasting-related, but most of it is not forecasting, unless you count all of the research and data collection as forecasting. I don’t think that’s the same thing, even if there’s overlap in the skills; I wouldn’t consider what OWID does with their data collection (as opposed to the opinion articles) to be forecasting. Here we did mix both, but the report has value in itself without the forecasting part, imo.
Potential other example: the Bridgewater Contest on Metaculus. They’re supposedly reaching out to top contestants. A lot of finance students are invited, but apparently some come from schools they wouldn’t usually recruit interns from. I know a few of the top students have gotten internships, but I have no idea whether forecasting ends up mattering in the recruitment process. They’re going through the trouble of organizing it (and paying for it) for the third year now, so I guess it must matter at least a little?
To be clear, I’m not weighing in on whether this was a good argument or not but bringing examples that might fit the bill seems relevant.
Does forecasting asymmetrically favor people trying to do good things? My impression is that it’s a general fuel, and what it ends up being useful for is highly contingent on who knows what.
I agree that forecasting is an OK way to find talent, but not much of this has been done.
I agree that in chaos they are useful, but I don’t think they hit the bar for EA funding.
Sure, it’s speculative. If AIs will use them, then AIs will also build them, so we can relax on forecasting for now and let them do it in the future.
I liked this post. Strong upvote. I’m neutral on the funding/not-funding issue, but the point you make, that you should start with the problem and then select the tool(s) rather than starting with the tool, is very important. I frequently see that failure in so much public debate and policymaking.
(Copied my comment from the EA Forum and related to my post)
I don’t disagree with some of the fundamentals of this post. Before diving into that, I want to correct a factual error:
“the Swift Centre have received millions of dollars for doing research and studies on forecasting and teaching others about forecasting”
The Swift Centre for Applied Forecasting has not received millions in funding. The majority of our earnings have been through direct projects with organisations who want to use forecasting to inform their decisions.
On your wider argument: I think forecasting has probably received too much funding, and the vast majority of that has been misallocated on platforms and research. I believe some funding (hundreds of thousands) is warranted to maintain core platforms like Metaculus as a public good of information. Services like Polymarket can probably fill most of this need in the future, though many useful, informative markets would never reach the necessary volume to be reliable.
Where I think we disagree most is on the application of forecasting and some of its achievements. We’ve worked with frontier AI labs to inform their decisions, are currently advising a U.K. Minister’s team on a central piece of their policy, and are about to start a secondment where I will be advising one of the most influential decision-making committees in the country to help improve their scenario analysis and forecasting. Forecasting, and specifically the science of decision making it is built on, has the ability to structurally improve decisions in institutions, significantly better than asking two or three of your smartest friends. That was just never funded, so instead we conclude forecasting is not useful.
I think part of the lack of use is institutional inertia. To use a concrete example, politicians (generally) believe that the things they are doing will be successful, popular, and beneficial to their reelection odds. In the past, the only immediate counter-evidence was polling, and this could be dismissed, often rightly, as push-polling intended to apply pressure rather than accurately predict results. Some of that skepticism has carried over to mechanisms that don’t have that problem.
The age of our current political class, rather set in its ways, means that this will take a while to take effect, but among younger politics-interested people, being able to say unambiguously that e.g. the Iran War is bad politics because the administration’s election odds fell off a cliff right after joining it is valuable. There’s no argument to be made that any electoral damage was because of tax policy, or “culture war stuff”, or “decorum”. The blame for a defeat can be laid precisely on the offending policy. Every 20-something intern in Washington, including the ones who will staff and advise future administrations, has seen this graph, in which the 2028 election odds remained steady through every news cycle right up until the end of February, in contrast with legacy media headlines and polls. They know exactly which policy decision the public truly hated, and their job security will depend on their employers not replicating it.
I think politicians do plenty of things that they believe are mostly ignored by most voters and don’t directly affect reelection odds. Dominic Cummings had a lot of trouble getting politicians in the UK to do the kinds of things that would likely be good for reelection odds, instead of campaigning on those politicians’ pet issues.
When it comes to efforts of politicians that are targeted at convincing voters, focus groups are a key tool that’s used in addition to polling.
Focus groups have their own issues. Easy to subvert through any number of underhanded means, and also accidentally through errors in sampling. They also don’t meaningfully indicate issue pertinence. Suppose I really like Trump’s policy on road sign renaming, and I really hate his policy on highway median resizing. I might talk passionately about either in a focus group where I’m being paid for my opinion on those issues, but neither is likely to influence my voting behavior.
Prediction markets are valuable in that they can cleanly separate issues that are popular/unpopular but electorally insignificant from ones that make or break a campaign.
I think you aren’t factoring in that focus groups are led by people who are not stupid. You can ask questions in a way that gets answers about what people care about.
“Regime Change #2: A plea to Silicon Valley—start a project NOW to write the plan for the next GOP candidate” by Dominic Cummings is a good read on what focus groups can do.
At best, a focus group is directed by well-intentioned people who may not know every pertinent correlation value needed to ensure a representative sample of the public. The conventional wisdom is that focus groups are typically used and designed by people who are somewhat out of touch, and this isn’t entirely false.
At worst, a focus group is directed by people with an interest in showing a certain outcome to a certain set of people. Using the earlier example, AIPAC lobbyists might want to downplay the political damage caused by the Iran War, and can alter candidate selection and discussion framing in order to do so when given authority over study design.
Moreover, participants often want to please the experimenters and will, consciously or otherwise, speak differently depending on what they think the experimenters want.
Put simply, you end up with all the same issues as polling. Potential for bias, design error, and unreliable subjects.
It is true that this (almost) never works, and has resulted in many, many wasted investment dollars.
Unfortunately it is also true that when a problem comes along, it is very convenient if someone else has already partially developed the underlying technology that can be repurposed to solve it. Cuts years off the timeline to a solution. Use of blockchain for critical mineral passports, for example, is starting to happen because the tool was already there.
It is also also true that complex technologies do often see decades or generations of failed implementations before they clear the last hurdle and make a big impact. Light bulbs and carbon fiber and steam engines are examples of that.
I’m not saying this is a good reason to fund any given project/company/technology that you think is failing. I’m definitely not saying that current implementations of prediction markets are fulfilling the potential anyone hoped for, or even that they’re net-positive in their impact today. And efficiently killing off failures and studying their remains is a core part of the innovation process. But I do think ways to aggregate hard-to-share information from many minds are the kind of thing our future selves might be glad we experimented with early and repeatedly.
The list of problems with mineral traceability has never really included someone tampering with the data already in an external database. What it does include is the data entered into the database being false to begin with, while many project participants have economic incentives to cover that up. There are indeed geochemical-fingerprinting attempts to fix that problem, but they are entirely orthogonal to the issue of data storage and access.
From what I know, I see no significant advantages of a blockchain over a public, well-audited relational database maintained by an independent NGO (like a special body under a UN mandate), besides maybe a geopolitical one (the NGO has to be based somewhere, after all). But there are quite a few disadvantages, like not being able to correct fraud once it has been discovered (very convenient for the fraudsters, and no one to blame for it!), trickiness from an engineering/interoperability point of view, etc.
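To make the comparison concrete, here’s a minimal sketch of the relational-database alternative I’m describing (a hypothetical schema, not any real passport system): ordinary mutable rows plus an append-only audit log, so discovered fraud can be corrected in place while the correction itself stays on record for auditors.

```python
# Hypothetical mineral-passport registry as a plain relational database.
# Unlike an append-only blockchain, fraudulent entries can be corrected,
# and every change is itself logged for auditors.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE passport (
    batch_id TEXT PRIMARY KEY,
    origin   TEXT NOT NULL,
    status   TEXT NOT NULL DEFAULT 'valid'     -- 'valid' or 'revoked'
);
CREATE TABLE audit_log (                       -- append-only by convention
    id       INTEGER PRIMARY KEY AUTOINCREMENT,
    batch_id TEXT NOT NULL,
    action   TEXT NOT NULL,                    -- 'create', 'correct', 'revoke'
    detail   TEXT,
    actor    TEXT NOT NULL,
    at       TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
""")

def correct_origin(batch_id, new_origin, actor, reason):
    # Correct a fraudulent entry in place, recording the correction itself.
    conn.execute("UPDATE passport SET origin = ? WHERE batch_id = ?",
                 (new_origin, batch_id))
    conn.execute("INSERT INTO audit_log (batch_id, action, detail, actor) "
                 "VALUES (?, 'correct', ?, ?)",
                 (batch_id, f"origin -> {new_origin}: {reason}", actor))
    conn.commit()

conn.execute("INSERT INTO passport (batch_id, origin) VALUES ('B-001', 'Mine A')")
conn.execute("INSERT INTO audit_log (batch_id, action, actor) "
             "VALUES ('B-001', 'create', 'registrar')")
correct_origin("B-001", "Mine B", "auditor-ngo", "geochemical fingerprint mismatch")

for row in conn.execute("SELECT batch_id, action, detail, actor FROM audit_log"):
    print(row)
```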
This is a good point and that probably wasn’t the best or cleanest example I could have picked.
Last year, as part of a project at work, I spoke with several blockchain company founders who had pivoted to that application. For the most part, they agreed that data entry is the biggest security hole in any such system. And the sense I got was that so many companies were dragging their heels on complying with passport mandates with upcoming deadlines that it would basically have to be a software product, because there would not be time to implement the other options.
I think this is a good point. I don’t think it makes up for the ~$125M that EA has put into forecasting, though.
That’s fair. Totally reasonable to say EA should stop.
I express no position on the object level, but you may be arguing for something that has already largely happened, see here:
They will still be funding lots of forecasting, just not through a dedicated fund.
Related comment I made 2 years ago and ensuing discussion: https://forum.effectivealtruism.org/posts/ziSEnEg4j8nFvhcni/new-open-philanthropy-grantmaking-program-forecasting?commentId=7cDWRrv57kivL5sCQ
Once, Robin Hanson wrote a post on his blog about what he’d do with $1M: try to get prediction markets used inside companies. The theory of impact is that companies are more likely to actually use the markets, and that this is more impactful than political discourse or prop bets. He suggested creating a market of the form “If CEO X steps down and is replaced by Y, the stock price will go up by Z”, getting a company to take its advice and end up proving it right, and then trying to make this something shareholders demand of any company.
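To make the mechanism concrete, here’s a minimal sketch of how such a conditional “decision market” could settle (Python; the names and payout rule are my illustrative assumptions, not any real platform’s implementation). Trades count only if the condition (the CEO actually being replaced) occurs; otherwise stakes are refunded, so the price approximates P(stock up | CEO replaced).

```python
# Sketch of conditional-market settlement: void-and-refund if the
# condition never triggers, otherwise pay out winners at their odds.

from dataclasses import dataclass

@dataclass
class Trade:
    trader: str
    side: str      # "YES" or "NO"
    stake: float   # amount risked
    price: float   # probability paid for the chosen side, in (0, 1)

def settle(trades, condition_occurred, outcome_yes):
    """Return total payout per trader for a conditional market."""
    payouts = {}
    for t in trades:
        if not condition_occurred:
            # Condition never triggered: void the market and refund stakes.
            payouts[t.trader] = payouts.get(t.trader, 0.0) + t.stake
            continue
        won = (t.side == "YES") == outcome_yes
        # A winning bet at probability p returns stake / p (stake plus profit).
        payouts[t.trader] = payouts.get(t.trader, 0.0) + (t.stake / t.price if won else 0.0)
    return payouts

# Example: traders disagree about the stock conditional on the CEO leaving.
trades = [
    Trade("alice", "YES", stake=60.0, price=0.6),
    Trade("bob",   "NO",  stake=40.0, price=0.4),
]
print(settle(trades, condition_occurred=True, outcome_yes=True))
# -> {'alice': 100.0, 'bob': 0.0}
print(settle(trades, condition_occurred=False, outcome_yes=False))
# -> {'alice': 60.0, 'bob': 40.0}
```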
Current markets have landed in the niche of generalized-sports-gambling. But this doesn’t mean other niches are impossible. I agree that it means funding should be directed to breaking into a new niche.
Prediction markets, or at least judgemental forecasting, have been used in companies before, though? At least Google had a pretty substantive internal forecasting platform at some point.
And they tended to be quite successful, yet either didn’t get used or got shut down, which Hanson attributes to cognitive biases and their upsetting of social hierarchies. But it’s possible for such things to get adopted—e.g. hospitals do hand washing now, and even checklists have spread. Thus the idea is to get some more high-profile showcase of their usefulness, and for decisions that are worth much more.
How do you rate the educational benefit of participating in prediction markets for about a year? You mention that trivial/gambling markets on short-term BTC movements don’t sharpen skills; what about non-trivial markets? How does it compare to other educational/community activities like commenting on LessWrong or attending meetups?
I think from the time I started to the time I stopped, I didn’t get any better. I was just as reasonable at both points in time.
I think you’re measuring the right thing (decisions changed) but blaming the wrong cause. I think the field underperformed because:
The questions are at the wrong altitude. “P(AGI by 2027)” is fun to trade but hard to act on. The decision-relevant questions (e.g., will this research direction work, will this eval saturate first, will this intervention move its metric) rarely get asked, because they’re too narrow and too poorly funded to attract pro forecasters. Moreover, such narrow questions usually rely on internal information that is difficult for forecasters to obtain.
Good forecasts aren’t reaching decision-makers. There’s no apparent pipeline from forecasting platforms to decisions at a scale large enough to be noticeable. I’d argue forecasting is still “niche” among the general population.
AI forecasters fix (1) directly. Calibrated answers to arbitrary narrow questions mean you can finally ask the questions that bind to actual decisions, with the internal information and predictive power needed to forecast these questions correctly.
If you had an expert AI forecasting on your daily decisions would you not listen?
I don’t think the right update from the last decade is “stop funding”. I think it’s “stop funding platforms and tournaments, start funding question design, decision integration, and automated forecasting.”
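For what it’s worth, the “calibrated answers” part is at least cheap to check: collect an automated forecaster’s resolved (probability, outcome) pairs, compute a Brier score, and bin forecasts to compare stated probabilities against realized frequencies. Here is a minimal sketch with toy, made-up data (illustrative only, not tied to any real platform or model):

```python
# Toy calibration check for an automated forecaster: Brier score plus a
# coarse reliability table. All data below is invented for illustration.

def brier_score(forecasts):
    """Mean squared error of probabilistic forecasts; lower is better."""
    return sum((p - o) ** 2 for p, o in forecasts) / len(forecasts)

def calibration_table(forecasts, n_bins=5):
    """Bin forecasts by stated probability; compare to realized frequency."""
    bins = [[] for _ in range(n_bins)]
    for p, o in forecasts:
        bins[min(int(p * n_bins), n_bins - 1)].append((p, o))
    return [
        (i, sum(p for p, _ in b) / len(b), sum(o for _, o in b) / len(b), len(b))
        for i, b in enumerate(bins) if b
    ]  # (bin index, mean stated probability, realized frequency, count)

# (forecast probability, binary outcome) pairs from resolved questions.
history = [(0.9, 1), (0.8, 1), (0.7, 0), (0.3, 0), (0.2, 0), (0.6, 1)]
print(brier_score(history))          # ~0.14; always guessing 0.5 scores 0.25
for row in calibration_table(history):
    print(row)
```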
“People have a tool they want to use, whether that be cryptocurrency or forecasting, and then try to solve problems with it because they really believe in the solution”
There’s one cryptocurrency that avoids this trap: Monero.