The track record of survey-based macroeconomic forecasting

I’m interested in forecasting, and one of the areas where plenty of forecasting has been done is macroeconomic indicators. This post looks at what’s known about macroeconomic forecasting.

Macroeconomic indicators such as total GDP, GDP per capita, inflation, unemployment, etc. are reported through direct measurement every so often (on a yearly, quarterly, or monthly basis). A number of organizations publish forecasts of these values, and the forecasts can eventually be compared against the actual values. Some of these forecasts are consensus forecasts: they involve polling a number of experts on the subject and aggregating the responses (for instance, by taking an arithmetic mean or geometric mean or appropriate weighted variant of either). We can therefore try to measure the usefulness of the forecasts and the rationality of the forecasters.

Why might we want to measure this usefulness and rationality? There could be two main motivations:

A better understanding of macroeconomic indicators and whether and how we can forecast them well.
A better understanding of forecasting as a domain as well as the rationality of forecasters and the inherent difficulties in forecasting.

My interest in the subject stems largely via (2) rather than (1): I’m trying to understand just how valuable forecasting is. However, the research I cite has motivations that involve some mix of (1) and (2).

Within (2), our interest might be in studying:

The usefulness and rationality of individual forecasts (that are part of the consensus) in absolute terms.
The usefulness and rationality of the consensus forecast.
The usefulness and rationality of individual forecasts relative to the consensus forecasts (treating the consensus forecast as a benchmark for how easy the forecasting task is).

The macroeconomic forecasting discussed here generally falls in the near but not very near future category in the framework I outlined in a recent post.

Here is a list of regularly published macroeconomic consensus forecasts. The table is taken from Wikipedia (I added the table to Wikipedia).

Organization name	Forecast name	Number of individuals surveyed	Number of countries covered	List of countries/regions covered	Frequency	How far ahead the forecasts are made for	Start date
Consensus Economics^[2]^[3]	Consensus Forecasts^TM	More than 700^[2]^[3]	85^[2]^[3]	Member countries of the G-7 industralized nations, Asia Pacific, Eastern Europe, and Latin America.^[2]^[3]	Monthly^[2]^[3]	24 months	October 1989^[4]
FocusEconomics^[5]	FocusEconomics Consensus Forecast^[6]	Several hundred^[6]	More than 70^[6]	Asia, Eastern Europe, Euro Area, Latin America, Nordic economies^[6]	Monthly^[6]	?	1998^[7]
Blue Chip Publications division of Aspen Publishers^[8]	Blue Chip Economic Indicators^[8]	50+^[8]	1	United States	Monthly^[8]	?	1976^[8]
Federal Reserve Bank of Philadelphia	Survey of Professional Forecasters^[9]^[10]	a few hundred	1	United States	Quarterly^[9]	6 quarters, plus a few more long-range forecasts	1968^[9]^[10]
European Central Bank	ECB Survey of Professional Forecasters^[11]^[12]	?	?	Europe	Quarterly^[11]	Two quarters and six quarters from now, plus the current and next two years	1999^[11]^[12]
Federal Reserve Bank of Philadelphia	Livingston Survey^[13]	?	1	United States^[13]	Bi-annually (June and December every year)^[13]	Two bi-annual periods (6 months and 12 months from now), plus some forecasts for two years	1946^[13]

Strengths and weaknesses of the different surveys

Time series available: The surveys that have been around longer, such as the Livingston Survey (started 1946), Survey of Professional Forecasters (started 1968) and the Blue Chip Economic Indicators (started 1976) have accumulated a larger time series of data. This allows for more interesting analysis.
Number of regions for which macroeconomic indicators are forecast: The surveys that cover a larger number of countries, such as the Consensus Forecasts^TM (85 countries) and the FocusEconomics Consensus Forecast (over 70 countries) can be used to study hypotheses about differences in the accuracy and bias in forecasts based on country.
Time that people are asked to forecast ahead, frequency of forecast, and number of different forecasts (at different points in time) for the same indicator: Surveys differ in how far ahead people have to forecast, how frequently the forecasts are published, and the number of different times a particular quantity is forecast. For instance, the Consensus Forecasts^TM includes forecasts for the next 24 months, and is published monthly. So we have 24 different forecasts of any given quantity, with the forecasts made at time points separated by a month each. This is at the upper end. The Survey of Professional Forecasters publishes at a quarterly frequency and includes macroeconomic indicator forecasts for the next 6 quarters. This is a similar time interval to the Consensus Forecasts^TM but a smaller number of forecasts for the same quantity because of a lower frequency of publication.
Evaluation of individual versus consensus forecasts: For some forecasts (such as those published by the Survey of Professional Forecasters), the published information includes individual forecasts, so we can measure the usefulness and rationality of individual forecasts rather than that of the consensus forecast. For others, such as Consensus Forecasts^TM, only the consensus is available, so only more limited tests are possible. Note that the question of the value of individual forecasts and the question of the value of the consensus forecast are both important questions.

The history of research based on consensus forecast sources

There has been a gradual shift in what consensus forecasts are used in research studying forecasts:

Early research on macroeconomic forecasting, in the 1970s, began with a few people collecting their own data by polling experts.
In the 1980s, the Livingston bi-annual survey was used as a major data source by researchers.
In the late 1980s and through the 1990s, researchers switched to the Survey of Professional Forecasters and the Blue Chip Economic Indicators Survey, with the focus shifting to the latter more over time. Note that the Blue Chip Economic Indicators had been started only in 1976, so it’s natural that it took some time for people to have enough data from it to publish research.
In the 2000s, research based on Consensus Forecasts^TM was added to the mix. Note that Consensus Economics started out in 1989, so it’s understandable that research based on it took a while to start getting published.

There has also been a gradual shift in views about forecast accuracy:

Early literature in the 1970s and early 1980s found evidence of inaccuracy and bias in forecasts.
In the 1990s, as the literature started looking at forecasts that polled more people and had higher frequency, the view shifted in the direction of consensus forecasts having very little inaccuracy and bias, whereas the topic of bias in individual forecasts is more hotly contested.

Tabulated bibliography (not comprehensive, but intended to cover a reasonably representative sample)

Paper	Forecast used	Conclusion about efficiency and bias of individual and consensus forecast
McNees (1978)	Own data (3 people, 4 quarterly forecasts)	Some forecasts are biased, and forecasters are not rational
Figlewski and Wachtel (1981)	Livingston Survey	Inflationary expectations are more consistent with the adaptive expectations hypothesis than the rational expectations hypothesis. The paper was critiqued by Dietrich and Joines (1983), and the authors responded in Figlewski and Wachtel (1983).
Keane and Runkle (1990)	Survey of Professional Forecasters (called the ASA-NBER survey at the time)	Individual forecasters appear rational, although rationality is not established conclusively. Methodological problems are noted with past literature arguing for irrationality and bias in individual forecasts.
Swidler and Ketchler (February 1990)	Blue Chip Economic Indicators	Consensus forecasts are unbiased and efficient. Does not appear to look at individual forecasts.
Batchelor and Dua (November 1991)	Blue Chip Economic Indicators	Consensus forecasts are unbiased, but some individual forecasts are biased.
Ehrbeck and Waldmann (1996)	North-Holland Economic Forecasts	The abstract: “Professional forecasters may not simply aim to minimize expected squared forecast errors. In models with repeated forecasts the pattern of forecasts reveals valuable information about the forecasters even before the outcome is realized. Rational forecasters will compromise between minimizing errors and mimicking prediction patterns typical of able forecasters. Simple models based on this argument imply that forecasts are biased in the direction of forecasts typical of able forecasters. Our models of strategic bias are rejected empirically as forecasts are biased in directions typical of forecasters with large mean squared forecast errors. This observation is consistent with behavioral explanations of forecast bias.”
Stark (1997)	Survey of Professional Forecasters	Attempts to replicate, for the Survey of Professional Forecasters, the results of Lamont (1995) for the Business Week survey that forecasters get more radical as they gain experience. Finds that the results do not replicate, and posits an explanation for this.
Laster, Bennett, and Geoum (1999)	Blue Chip Economic Indicators	Individual forecasters are biased. The paper describes a theory for how such bias might be rational given the incentives facing forecasters. The empirical data is a sanity check rather than the focus of the paper.
Batchelor (2001) (ungated early draft here)	Consensus Forecasts^TM	Does not discuss bias in Consensus Forecasts^TM per se, but notes that it is better than the IMF and OECD forecasts and that incorporating information from those forecasts does not improve upon Consensus Forecasts^TM.
Ottaviani and Sorensen (2006)	(none, discusses general theoretical model)	Abstract: “We develop and compare two theories of professional forecasters’ strategic behavior. The first theory, reputational cheap talk, posits that forecasters endeavor to convince the market that they are well informed. The market evaluates their forecasting talent on the basis of the forecasts and the realized state. If the market expects forecasters to report their posterior expectations honestly, then forecasts are shaded toward the prior mean. With correct market expectations, equilibrium forecasts are imprecise but not shaded. The second theory posits that forecasters compete in a forecasting contest with pre-specified rules. In a winner-take-all contest, equilibrium forecasts are excessively differentiated.”
Batchelor (2007)	Consensus Forecasts^TM	Consensus forecasts are unbiased, some individual forecasts are biased. But the persistent optimism and pessimism of some forecasters seems inconsistent with existing models of rational bias.
Ager, Kappler, and Osterloh (2009) (ungated version)	Consensus Forecasts^TM	There are consistently biased forecasts for some countries, but not for all. A lack of information efficiency is more severe for GDP forecasts than for inflation forecasts.

The following overall conclusions seem to emerge from the literature:

For mature and well-understood economics such as that of the United States, consensus forecasts are not notably biased or inefficient. In cases where they miss the mark, this can usually be attributed to issues of insufficient information or shocks to the economy.
There may however be some countries. particularly those whose economies are not sufficiently well-understood, where the consensus forecasts are more biased.
The evidence on whether individual forecasts are biased or inefficient is more murky, but the research generally points in the direction of some individual forecasts being biased. Some people have posited a “rational bias” theory where forecasters have incentives to choose a value that is plausible but not the most likely in order to maximize their chances of getting a successful unexpected prediction. We can think of this as an example of product differentiation. Other sources and theories of rational bias have also been posited, but there is no consensus in the literature on whether and how these are sufficient to explain observed individual bias.

Some addenda

A Forbes article recommends using the standard sources for forecasts to business people who need economic forecasts for their business plan, rather than aiming for something more fancy.
There are some other forecasts I didn’t list here, such as the Greenbook forecasts, IMF’s World Economic Outlook, and OECD Economic Outlook. As far as I could make out, these are not generated through a consensus forecast procedure. They involve some combination of models and human judgment and discussion. The bibliography I tabulated above includes Batchelor (2001), that found that the Consensus Forecasts^TM outperformed the OECD and IMF forecasts. Some research on the Greenbook forecasts can be found in the footnotes on the Wikipedia page about Greenbook. I didn’t think these were sufficiently germane to be included in the main bibliography.