Board of Governors of the Federal Reserve System
International Finance Discussion Papers Number 779
September 2003
Bayesian Model Averaging and Exchange Rate Forecasts
Jonathan H. Wright
NOTE: International Finance Discussion Papers are preliminary materials circulated to stimulate discussion and critical comment. References in publications to International Finance Discussion Papers (other than an acknowledgment that the writer has had access to unpublished material) should be cleared with the author or authors. Recent IFDPs are available on the Web at www.federalreserve.gov/pubs/ifdp/.
Jonathan H. Wright*
Abstract: Exchange rate forecasting is hard and the seminal result of Meese and Rogoff (1983) that the exchange rate is well approximated by a driftless random walk, at least for prediction purposes, has never really been overturned despite much effort at constructing other forecasting models. However, in several other macro and financial forecasting applications, researchers in recent years have considered methods for forecasting that combine the information in a large number of time series. One method that has been found to be remarkably useful for out-of-sample prediction is simple averaging of the forecasts of different models. This often seems to work better than the forecasts from any one model. Bayesian Model Averaging is a closely related method that has also been found to be useful for out-of-sample prediction. This starts out with many possible models and prior beliefs about the probability that each model is the true one. It then involves computing the posterior probability that each model is the true one, and averages the forecasts from the different models, weighting them by these posterior probabilities. This is effectively a shrinkage methodology, but with shrinkage over models not just over parameters. I apply this Bayesian Model Averaging approach to pseudo-out-of-sample exchange rate forecasting over the last ten years. I find that it compares quite favorably to a driftless random walk forecast. Depending on the currency-horizon pair, the Bayesian Model Averaging forecasts sometimes do quite a bit better than the random walk benchmark (in terms of mean square prediction error), while they never do much worse. The forecasts generated by this model averaging methodology are however very close to (but not identical to) those from the random walk forecast.
Keywords: Shrinkage, model uncertainty, forecasting, exchange rates, bootstrap.
JEL Classification: C32, C53, F31.
* International Finance Division, Board of Governors of the Federal Reserve System, Washington DC 20551. I am grateful to Ben Bernanke, David Bowman, Jon Faust, Matt Pritsker and Pietro Veronesi for helpful comments, and to Sergey Chernenko for excellent research assistance. The views in this paper are solely the responsibility of the author and should not be interpreted as reflecting the views of the Board of Governors of the Federal Reserve System or of any person associated with the Federal Reserve System.
1. Introduction.
Out-of-sample forecasting of exchange rates is hard. Meese and Rogoff (1983) argued
that all exchange rate models do less well in out-of-sample forecasting exercises than a
simple driftless random walk. Although this finding was heresy to many at the time that
Meese and Rogoff wrote their seminal paper, it has now become the conventional
wisdom. Mark (1995) claimed that a monetary fundamentals model can generate better
out-of-sample forecasting performance at long horizons, but that result has been found to
be very sensitive to the sample period (Groen (1999), Faust, Rogers and Wright (2003)).
Claims that a particular variable has predictive power for exchange rates crop up
frequently, but these results typically apply just to a particular exchange rate and a
particular subsample. As such, they are by now met with justifiable skepticism and are
thought of by many as the result of data-mining exercises.
Exchange rates are not the only data that are hard to predict. Atkeson and
Ohanian (2001) showed that Phillips-curve based forecasts of inflation give larger out-of-
sample prediction errors than a simple random walk forecast of inflation. Stock and
Watson (2001, 2002a) consider prediction of inflation and output growth in each of the
G7 countries using a large number of possible models. They find that most of the models
they consider give larger out-of-sample root mean square prediction error than a simple
naive time series forecast based on fitting an autoregression to inflation or output growth.
When a model does have predictive power relative to the naive time series forecast, this
tends to be unstable. That is, the model that has good predictive power in one subperiod
has little or no propensity to have good predictive power in another subperiod. The
models that Stock and Watson consider are simple: each model consists of a regression of
inflation/output growth on a single leading indicator and a lagged dependent variable.
Heavily parameterized models, with large numbers of variables or capricious nonlinear
specifications can provide extraordinarily good fits in sample, but generally make matters
worse in terms of out-of-sample prediction.
However, in the context of inflation and output growth prediction, researchers
have recently made substantial progress in forecasting using large datasets (i.e. a large
number of predictive variables), but where the information in these different variables is
combined in a judicious way that avoids the estimation of a large number of unrestricted
parameters. Bayesian VARs have been found to be useful in forecasting: these often use
many time series, but impose a prior that many of the coefficients in the VAR are close to
zero. Approaches in which the researcher estimates a small number of factors from a
large dataset and forecasts using these estimated factors have also been shown to be
capable of superior predictive performance (Stock and Watson (2002b) and Bernanke and
Boivin (2003) are among the many possible citations). Stock and Watson (2001, 2002a)
however argue that the best predictive performance is obtained by constructing forecasts
from a very large number of models and simply averaging these forecasts. Stock and
Watson report that this gives the best predictive performance of international output
growth and inflation, and that this is remarkably consistent across subperiods and across
countries. Although the basic idea that forecast combination outperforms any individual
forecast is part of the folklore of economic forecasting, going back to Bates and Granger
(1969), Stock and Watson underscore how consistent this is across time periods and
variables being forecast. It is of course crucial to the result that the researcher just
average the forecasts (or take a median or trimmed mean). It is in particular tempting to
run a forecast evaluation regression in which the weights on the different forecasts are
estimated as free parameters. While this leads to a better in-sample fit, it gives worse
out-of-sample prediction.
Stock and Watson (2001, 2002a) do not offer a definitive explanation for why
simple averaging of forecasts does so well, but the finding is sufficiently strong and
general that forecasters ought to pay attention to this result, even without necessarily
understanding exactly what is so effective about this particular form of shrinkage.
In this paper, I plan to use forecast combination methods that have been found to
be useful in other contexts, but to apply them to the problem of out-of-sample exchange
rate prediction. I shall pool forecasts from a large number of different models, to see
whether this idea that has been so successful in the context of output growth and inflation
forecasting makes any dent in the context of exchange rate forecasting. But I shall also
try to apply the closely related idea of Bayesian Model Averaging (which was not
considered by Stock and Watson (2001, 2002a)). Bayesian Model Averaging has been
developed mainly, but not exclusively, by statisticians as opposed to econometricians.
The idea is to consider prediction when the researcher does not know the true model, but
has several candidate models. A forecast can be constructed putting weights on the
predictions from each model. If these weights are all equal, then this is simple forecast
averaging. The researcher can however start from the prior that all the models are
equally good, but then estimate the posterior probabilities of the models, which can be
used as weights instead.
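The weighting scheme just described can be sketched as follows, using only Bayes' rule: the posterior probability of each model is proportional to its prior probability times the marginal likelihood of the data under that model. This is a minimal illustration under stated assumptions, not the implementation used in this paper: the function names are hypothetical, and the log marginal likelihoods are taken as given inputs (how they would be computed depends on the priors placed on each model's parameters, which the sketch does not address).

```python
import numpy as np

def posterior_model_weights(log_marginal_likelihoods, prior_probs=None):
    """Posterior model probabilities from Bayes' rule.

    log_marginal_likelihoods: log P(D | M_i) for each model (assumed given).
    prior_probs:              P(M_i); defaults to equal prior probabilities.
    """
    log_ml = np.asarray(log_marginal_likelihoods, dtype=float)
    n = log_ml.size
    if prior_probs is None:
        prior = np.full(n, 1.0 / n)  # prior that all models are equally likely
    else:
        prior = np.asarray(prior_probs, dtype=float)
    log_post = log_ml + np.log(prior)
    log_post -= log_post.max()  # subtract the max before exponentiating, for stability
    w = np.exp(log_post)
    return w / w.sum()  # normalize so the weights sum to one

def averaged_forecast(forecasts, weights):
    """Weighted average of the individual model forecasts."""
    return float(np.dot(weights, forecasts))

# Hypothetical forecasts from three models.
forecasts = np.array([0.00, 0.02, -0.01])

# Identical marginal likelihoods: the weights stay equal (simple averaging).
equal_w = posterior_model_weights([-100.0, -100.0, -100.0])

# The second model fits the data twice as well: its weight doubles relative to the others.
tilted_w = posterior_model_weights([-100.0, -100.0 + np.log(2.0), -100.0])

equal_fc = averaged_forecast(forecasts, equal_w)
tilted_fc = averaged_forecast(forecasts, tilted_w)
```

When the data do not discriminate between the models, the posterior weights equal the prior weights and the procedure reduces to simple forecast averaging; otherwise the combined forecast tilts toward the models that the data favor.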
The contribution of this paper is to argue that Bayesian Model Averaging may be
useful for out-of-sample forecasting of exchange rates in the 1990s. It seems to work
better than simple equal-weighted model averaging, in this particular context at least.
One does not have to be a subjectivist Bayesian to believe in the usefulness of
Bayesian Model Averaging, or of Bayesian shrinkage techniques more generally. A
frequentist econometrician can interpret these methods as pragmatic smoothing devices
that can be useful for out-of-sample forecasting.
The plan for the remainder of the paper is as follows. In section 2, I shall describe
the idea of Bayesian Model Averaging. The out-of-sample exchange rate prediction
exercise is described in section 3. Using a large number of models, combined using
Bayesian Model Averaging methods, gives promising results for out-of-sample exchange
rate forecasting. Section 4 concludes.
2. Bayesian Model Averaging
The idea of Bayesian Model Averaging was set out by Leamer (1978), but has recently
received a lot of attention in the statistics literature, including in particular Raftery,
Madigan and Hoeting (1997), Hoeting, Madigan, Raftery and Volinsky (1999) and
Chipman, George and McCulloch (2001). It has also been used in a number of
econometric applications, including output growth forecasting (Min and Zellner (1993),
Koop and Potter (2003)), cross-country growth regressions (Doppelhofer, Miller and
Sala-i-Martin (2000) and Fernandez, Ley and Steel (2001)) and stock return prediction
(Avramov (2002) and Cremers (2002)). Avramov and Cremers both report improved
pseudo-out-of-sample predictive performance from Bayesian Model Averaging.
Consider a set of n models $M_1, \ldots, M_n$. The ith model is indexed by a parameter
vector θ – this is a different parameter vector for each model, but for compactness of
notation I do not explicitly subscript θ by i. The researcher knows that one of these
models is the true model, but does not know which one.¹ The researcher has prior beliefs
about the probability that the ith model is the true model which we write as $P(M_i)$,
observes data D, and updates her beliefs to compute the posterior probability that the ith
model is the true model:
1