Suppose we have a set of (forecasted value, actual value) pairs for a given quantity, with the actual value measured at different times. An example would be GDP growth rates in different years: for each year, we have a forecasted value and an actual value, giving one pair per year. How do we judge the usefulness of the forecasts at predicting the actual values? Here, we discuss a few related measures: accuracy, bias, and dependency (specifically, correlation).
Accuracy
The accuracy of a forecast refers to how far, on average, the forecast is from the actual value. Two typical ways of measuring the accuracy are:
Compute the mean absolute error: Take the arithmetic mean (average) of the absolute values of the errors for each forecast.
Compute the root mean square error: Take the square root of the arithmetic mean of the squares of the errors.
The size of the error, measured in either of these ways, is a rough estimate of how accurate the forecasts are in general (the larger the error, the less accurate the forecast). Note that an error of zero represents a perfectly accurate forecast.
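The two measures above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names and the sample numbers are made up, and the error is taken (by assumption) to be the forecasted value minus the actual value.

```python
import math

def mean_absolute_error(forecasts, actuals):
    """Arithmetic mean of the absolute errors."""
    errors = [f - a for f, a in zip(forecasts, actuals)]
    return sum(abs(e) for e in errors) / len(errors)

def root_mean_square_error(forecasts, actuals):
    """Square root of the arithmetic mean of the squared errors."""
    errors = [f - a for f, a in zip(forecasts, actuals)]
    return math.sqrt(sum(e * e for e in errors) / len(errors))

# Hypothetical growth-rate forecasts and outcomes (percent), one pair per year.
forecasts = [2.0, 3.0, 1.0, 4.0]
actuals = [1.0, 3.0, 3.0, 2.0]

mae = mean_absolute_error(forecasts, actuals)      # errors are 1, 0, -2, 2
rmse = root_mean_square_error(forecasts, actuals)
print(mae, rmse)  # 1.25 1.5
```

The root mean square error is never smaller than the mean absolute error, because squaring gives large errors more weight.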
Note that this is a global measure of accuracy. But it may be the case that forecasts are more accurate when the actual values are at a particular level, and less accurate when they are at a different level. There are mathematical models to test for this.
Bias
When we ask whether the forecast is biased, we’re interested in knowing whether errors in the positive direction systematically outweigh errors in the negative direction. One method for estimating this is to compute the mean signed difference (i.e., take the arithmetic mean of the errors for individual forecasts without taking absolute values). The sign depends on the convention chosen for the error; here, take the error to be the forecasted value minus the actual value. If the mean comes out as zero, the forecasts are unbiased. If it comes out as positive, the forecasts are biased in the positive direction (they run too high), whereas if it comes out as negative, the forecasts are biased in the negative direction (they run too low).
The above is a start, but it’s not good enough. In particular, the error could come out nonzero simply because of random fluctuations rather than bias. We’d need to complicate the model somewhat in order to make probabilistic or quantitative assessments to get a sense of whether or how the forecasts are really biased.
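One simple way to make such a probabilistic assessment is a one-sample t-statistic on the signed errors. The sketch below is only one option, and it assumes (beyond what the text states) that the errors are independent and roughly normally distributed; the function name and the sample errors are made up for illustration.

```python
import math
import statistics

def bias_t_statistic(errors):
    """t-statistic for the hypothesis that the mean signed error is zero."""
    n = len(errors)
    mean_error = statistics.mean(errors)
    # Sample standard deviation (n - 1 in the denominator), scaled to a standard error.
    standard_error = statistics.stdev(errors) / math.sqrt(n)
    return mean_error / standard_error

# Hypothetical signed errors (forecast minus actual), one per year.
errors = [0.5, -0.2, 0.3, 0.4, -0.1, 0.6, 0.2, 0.1]
t = bias_t_statistic(errors)
print(t)
```

A value of t far from zero, compared against a t distribution with n - 1 degrees of freedom, suggests that the nonzero mean is unlikely to be a random fluctuation alone.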
Again, the above is a global measure of bias. But it may be the case that there are different biases for different values. There are mathematical models to test for this.
Are accuracy and bias related? Yes, in the obvious sense that the degree of inaccuracy gives an upper bound on the degree of bias. In particular, the mean absolute error gives an upper bound on the absolute value of the mean signed difference. So a perfectly accurate forecast is also unbiased. However, we can have fairly inaccurate forecasts that are unbiased. For instance, a forecast that always guesses the mean of the distribution of actual values will generally be inaccurate but have zero bias.
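A small Python sketch of the relationship between the two measures. The function names and data are illustrative, and the error is assumed to be forecast minus actual. The first forecaster always guesses the mean of the actual values; the second is always off by the same amount.

```python
def mean_signed_error(forecasts, actuals):
    """Arithmetic mean of the errors, keeping their signs (a bias estimate)."""
    return sum(f - a for f, a in zip(forecasts, actuals)) / len(forecasts)

def mean_absolute_error(forecasts, actuals):
    """Arithmetic mean of the absolute errors (an accuracy measure)."""
    return sum(abs(f - a) for f, a in zip(forecasts, actuals)) / len(forecasts)

actuals = [1.0, 3.0, 3.0, 5.0]                # hypothetical outcomes, mean 3.0

always_mean = [3.0, 3.0, 3.0, 3.0]            # inaccurate but unbiased
shifted = [a + 0.5 for a in actuals]          # fairly accurate but biased upward

for forecasts in (always_mean, shifted):
    bias = mean_signed_error(forecasts, actuals)
    mae = mean_absolute_error(forecasts, actuals)
    # The absolute value of the bias can never exceed the mean absolute error.
    assert abs(bias) <= mae
    print(bias, mae)
# 0.0 1.0
# 0.5 0.5
```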
The above discusses additive bias. There may also be multiplicative bias. For instance, the forecasted value may be reliably half the actual value. In this case, doubling the forecasted value allows us to obtain the actual value. There could also be forms of bias that are not captured in either way.
Dependency and correlation
Ideally, what we want to know is not so much whether the forecasts themselves are accurate or biased, but whether we can use them to generate new forecasts that are good. So what we want to know is: once we correct for bias (of all sorts, not just additive or multiplicative), how accurate is the new forecast? Another way of framing this is: what exactly is the nature of dependency between the variable representing the forecasted value and the variable representing the actual value?
Testing for the nature of the dependency between variables is a hard problem, particularly if we don’t have a prior hypothesis for the nature of the dependency. If we do have a hypothesis, and the relation is linear in unknown parameters, we can use the method of ordinary least squares regression (or another suitable regression) to find the best fit. And we can measure the goodness of that fit through various statistical indicators.
In the case of linear regression (i.e., trying to fit using a linear functional dependency between the variables), the square of the correlation between the variables is the R² of the regression, and offers a decent measure of how close the variables are to being linearly related. A correlation of 1 implies an R² of 1, and means that the variables are perfectly correlated, or equivalently, that a linear function with positive slope is a perfect fit. A correlation of −1 also implies an R² of 1, and means that a linear function with negative slope is a perfect fit. A correlation of zero means that the variables are completely uncorrelated.
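The claim that the squared correlation equals the R² of the regression can be checked numerically. The sketch below computes both quantities independently for a simple one-variable least squares fit; the helper names and data are made up for illustration, and it assumes neither variable is constant.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def r_squared_of_fit(x, y):
    """R^2 of the least squares line y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    intercept = my - slope * mx
    ss_res = sum((b - (slope * a + intercept)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return 1 - ss_res / ss_tot

# Hypothetical forecasts and actual values.
forecasts = [1.0, 2.0, 3.0, 4.0, 5.0]
actuals = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson(forecasts, actuals)
r2 = r_squared_of_fit(forecasts, actuals)
print(r * r, r2)  # the two values agree up to rounding

# A perfect fit with negative slope: correlation -1, R^2 of 1.
print(pearson([1.0, 2.0, 3.0], [3.0, 2.0, 1.0]), r_squared_of_fit([1.0, 2.0, 3.0], [3.0, 2.0, 1.0]))
```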
Note also that linear regression covers both additive and multiplicative bias (and combinations thereof) and is often good enough to capture the most basic dependencies.
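To illustrate how a linear fit handles additive and multiplicative bias together, here is a minimal sketch in Python. The data are constructed (not real) so that the actual value is exactly twice the forecast plus one; `fit_line` is a hypothetical helper implementing one-variable ordinary least squares, and it assumes the forecasts are not all equal.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept

# Made-up data with both kinds of bias: actual = 2 * forecast + 1.
forecasts = [1.0, 2.0, 3.0, 4.0]
actuals = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(forecasts, actuals)
corrected = [slope * f + intercept for f in forecasts]

print(slope, intercept)  # 2.0 1.0
print(corrected)         # [3.0, 5.0, 7.0, 9.0]
```

The slope picks up the multiplicative part of the bias and the intercept the additive part. With real data the fit would not be exact, and the correction should be judged on years that were not used to estimate it.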
If the value of R² for the linear regression is zero, that means the variables are uncorrelated. Although independent implies uncorrelated, uncorrelated does not imply independent, because there may be nonlinear dependencies that happen to give zero correlation. In fact, uncorrelated does not imply independent even if the variables are both normally distributed (it does if they are jointly normally distributed). As a practical matter, a correlation of zero is often taken as strong evidence that neither variable tells us much about the other. This is because even if the relationship isn’t linear, the existence of some relationship makes a nonzero correlation more plausible than an exact zero correlation. For instance, if the variables are positively related (higher forecasted values predict higher actual values) we expect a positive correlation and a positive R². If the variables are negatively related (higher forecasted values predict lower actual values) we expect a negative correlation, but still a positive R².
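A standard toy example of "uncorrelated but not independent" can be written out in Python. The data are artificial: the second variable is the square of the first, so it is completely determined by it, yet the correlation is zero because the first variable is symmetric around zero.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Artificial data: y is an exact (nonlinear) function of x.
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [v * v for v in x]

r = pearson(x, y)
print(r)  # 0.0
```

Here knowing x tells us y exactly, so a zero correlation only rules out a linear relationship.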
For the trigonometrically inclined: The Pearson correlation coefficient, simply called the correlation here, measures the cosine of the angle between a vector based on the forecasted values and a vector based on the actual values. The vector based on the forecasted values is obtained by starting with the vector of the forecasted values and subtracting from each coordinate the mean forecasted value. Similarly, the vector based on the actual values is obtained by starting with the vector of the actual values and subtracting from each coordinate the mean actual value. The R² value is the square of the correlation, and measures the proportion of variance in one variable that is explained by the other (this is sometimes referred to as the coefficient of determination). 1 − R² is the square of the sine of the angle between the vectors, and represents how alienated the vectors are from each other. A correlation of 1 means the vectors are collinear and point in the same direction, a positive correlation less than 1 means they form an acute angle, a zero correlation means they are at right angles, a negative correlation greater than −1 means they form an obtuse angle, and a correlation of −1 means the vectors are collinear and point in opposite directions.
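The geometric picture can be reproduced directly: center each vector, then take the cosine of the angle between them. The sketch below uses made-up data and hypothetical function names, and assumes neither vector is constant (otherwise the centered vector has zero length).

```python
import math

def centered(values):
    """Subtract the mean from each coordinate."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def correlation_as_cosine(forecasts, actuals):
    """Cosine of the angle between the two centered vectors."""
    u, v = centered(forecasts), centered(actuals)
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def angle_degrees(forecasts, actuals):
    """Angle between the centered vectors, in degrees."""
    r = correlation_as_cosine(forecasts, actuals)
    r = max(-1.0, min(1.0, r))  # guard against rounding slightly outside [-1, 1]
    return math.degrees(math.acos(r))

forecasts = [1.0, 2.0, 3.0, 4.0]
same_direction = angle_degrees(forecasts, [2.0, 4.0, 6.0, 8.0])    # approximately 0
right_angle = angle_degrees(forecasts, [1.0, -1.0, -1.0, 1.0])     # 90
opposite = angle_degrees(forecasts, [4.0, 3.0, 2.0, 1.0])          # approximately 180
print(same_direction, right_angle, opposite)
```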
Usefulness versus rationality
The simplest situation is where the forecasts are completely accurate. That’s perfect. We don’t need to worry about doing better.
In the case that the forecasts are not accurate, and if we have had the luxury of crunching the numbers and figuring out the nature of dependency between the forecasted and actual values, we’d want a situation where the actual value can be reliably predicted from the forecasted value, i.e., the actual value is a (known) function of the forecasted value. A simple case of this is where the actual value and forecasted value have a correlation of 1. This means that the actual value is a known linear function of the forecasted value. (UPDATE: This process of using a known linear function to correct for systematic additive and multiplicative bias is known as Theil’s correction). So the forecasted value itself is not good, but it allows us to come up with a good forecast.
What would it mean for a forecast to be unimprovable? Essentially, it means that the best value we can forecast based on the forecasted value is the forecasted value. Wait, what? What we mean is that the forecasters aren’t leaving any money on the table: if they could improve the forecast simply by correcting for a known bias, they have already done so. Note that a forecast being unimprovable does not say anything directly about the R² value. Rather, unimprovability suggests that the best functional fit between the forecasted and the actual value is the identity function (actual value = forecasted value). For the linear regression case, it suggests that the slope of the regression is 1 and the intercept is 0, or at any rate that they are close enough. Note that an unimprovable forecast can still be completely useless: a forecast that just guesses the mean of the distribution of actual values cannot be improved by correcting it, yet tells us nothing new.
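For the linear case, the check described above (slope near 1, intercept near 0) can be sketched as follows. The helper names, the tolerance, and the data are all made up for illustration; with real data one would want a statistical test on the fitted coefficients rather than a fixed tolerance.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def looks_linearly_unimprovable(forecasts, actuals, tol=0.1):
    """True if the best linear correction is close to the identity function."""
    slope, intercept = fit_line(forecasts, actuals)
    return abs(slope - 1.0) <= tol and abs(intercept) <= tol

forecasts = [1.0, 2.0, 3.0, 4.0]

# Noisy forecasts: every forecast is off by 1, yet no linear correction helps.
noisy_actuals = [2.0, 1.0, 2.0, 5.0]
# Forecasts that are reliably half the actual value: clearly improvable.
doubled_actuals = [2.0, 4.0, 6.0, 8.0]

print(looks_linearly_unimprovable(forecasts, noisy_actuals))    # True
print(looks_linearly_unimprovable(forecasts, doubled_actuals))  # False
```

The first case shows that unimprovable does not mean accurate: the fit is the identity line, but the forecasts still miss by 1 each year.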
The following table captures the logic (note that the two rows just describe the extreme cases, rather than the logical space of all possibilities).
| | The forecast cannot be improved upon | The forecast can be improved upon |
| --- | --- | --- |
| The forecast, once improved upon, is perfect | The forecasted value equals the actual value. | The forecasted value predicts the actual value perfectly, but is not itself perfect. For instance, they could have a correlation of 1, in which case the prediction would be via a linear function. |
| The forecast, even after improvement, is useless at the margin (i.e., it does not give us information we didn’t already have from knowledge of the existing distribution of actual values) | The forecast just involves perfectly guessing the mean of the distribution of actual values (assuming that the distribution is known in advance; if it’s not, then things become even more murky). | The actual value is independent of the forecast, and it does not involve simply guessing the mean. |
Note that if forecasters are rational, then we should be in the column “The forecast cannot be improved upon”, somewhere between the extreme case where the forecast is already perfect and the extreme case where the forecast just involves guessing the mean of the distribution (assuming that the distribution is known in advance).
So there are two real and somewhat distinct questions about the value of forecasts:
(The question whose extreme answers give the rows): How useful are the forecasts, in the sense of how accurate the new forecasts are once we extract all the information from them by correcting for bias and applying the appropriate functional form?
(The question whose answers give the columns): How rational are the forecasters, in the sense of how close are their forecasts to the most useful forecasts that can be extracted from those forecasts? (Note that even if the forecasts cannot be improved upon, that doesn’t mean the forecasts are rational in the broader sense of making the best guess in terms of all available information, but it is in any case consistent with rationality in this broader sense).
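The two questions can be turned into two numbers. The sketch below is one possible operationalization, not the only one: it assumes the "appropriate functional form" is linear, measures usefulness by the error left after the linear correction, and measures (lack of) rationality by how far the raw forecasts are from their corrected versions. All names and data are made up for illustration.

```python
import math

def fit_line(x, y):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def rmse(a, b):
    """Root mean square difference between two equal-length sequences."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)) / len(a))

def usefulness_and_rationality(forecasts, actuals):
    slope, intercept = fit_line(forecasts, actuals)
    corrected = [slope * f + intercept for f in forecasts]
    residual_error = rmse(corrected, actuals)     # rows: small means useful
    correction_size = rmse(forecasts, corrected)  # columns: small means little left to correct
    return residual_error, correction_size

forecasts = [1.0, 2.0, 3.0, 4.0]

# Biased but perfectly informative: actual = 2 * forecast + 1.
print(usefulness_and_rationality(forecasts, [3.0, 5.0, 7.0, 9.0]))
# Noisy but with nothing to correct: the fitted line is the identity.
print(usefulness_and_rationality(forecasts, [2.0, 1.0, 2.0, 5.0]))
```

In the first case the residual error is zero and the correction is large (useful forecasts, improvable). In the second the residual error is 1 and the correction is zero (less useful forecasts, but nothing left on the table). Because the line is fitted on the same data it is evaluated on, both numbers are optimistic for small samples.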
The usefulness of forecasts and the rationality of forecasters
Background reading
For more background, see the Wikipedia pages on forecast bias and bias of an estimator and the content linked therein.