import logging
import warnings
warnings.filterwarnings("ignore")
logging.getLogger("pytensor").setLevel(logging.ERROR)
Fitting Your First Bayesian VAR¶
A vector autoregression (VAR) models multiple time series jointly, capturing how each variable depends on its own past values and the past values of all other variables in the system. This makes it a natural tool for macroeconomic analysis, where GDP, inflation, and interest rates evolve together. Bayesian estimation adds regularisation through prior distributions -- critical when the number of parameters grows quickly with lags and variables -- and provides full posterior uncertainty over every coefficient and forecast. For more background, see the Bayesian VAR explanation.
import arviz as az
import numpy as np
import pandas as pd
from impulso import VAR, VARData, select_lag_order
from impulso.samplers import NUTSSampler
Simulate a small macro economy¶
We simulate a VAR(1) with three variables -- quarterly GDP growth, inflation, and a short-term interest rate -- so that we know the true coefficients and can check whether the model recovers them.
The true coefficient matrix A_true embeds a simple macroeconomic story:
| Variable | Own lag | Cross-variable effects |
|---|---|---|
| GDP growth | Persistent at 0.6 | Reacts negatively to last quarter's interest rate (-0.1) |
| Inflation | Persistent at 0.5 | Follows GDP with a one-quarter lag (0.2) |
| Interest rate | Persistent at 0.4 | Reacts to inflation (0.15) -- a simplified Taylor-rule channel |
rng = np.random.default_rng(42)
T = 200
n_vars = 3
# True VAR(1) coefficient matrix
# Rows: [gdp_growth, inflation, rate]
# Columns: [gdp_growth_lag1, inflation_lag1, rate_lag1]
A_true = np.array([
    [0.6, 0.0, -0.1],   # GDP: persistent, negatively affected by rate
    [0.2, 0.5, 0.0],    # Inflation: follows GDP, persistent
    [0.0, 0.15, 0.4],   # Rate: reacts to inflation (Taylor rule), persistent
])

y = np.zeros((T, n_vars))
for t in range(1, T):
    y[t] = A_true @ y[t - 1] + rng.standard_normal(n_vars) * 0.1
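As a quick sanity check on the simulated system (not part of the impulso workflow): a VAR(1) is stable, and hence generates a stationary series, exactly when every eigenvalue of the coefficient matrix lies strictly inside the unit circle. We can verify that for A_true directly:

```python
import numpy as np

# A VAR(1) y_t = A y_{t-1} + e_t is stable iff all eigenvalues of A
# have modulus strictly less than 1.
A_true = np.array([
    [0.6, 0.0, -0.1],
    [0.2, 0.5, 0.0],
    [0.0, 0.15, 0.4],
])
moduli = np.abs(np.linalg.eigvals(A_true))
print("largest root:", moduli.max())  # < 1 means the simulation is stationary
```

Because the largest root is well below one, the series settles into a stationary distribution rather than drifting, which is what makes the OLS and Bayesian estimates below well behaved.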
Load data into VARData¶
VARData is the entry point for all data in Impulso. It validates the shape of
your arrays, checks for NaN and Inf values, and makes the underlying arrays read-only
so that data cannot be accidentally mutated after construction. If you already have a
pandas DataFrame, you can use VARData.from_df() instead.
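The read-only guarantee is easy to picture with plain numpy, independent of impulso's internals: an array can be frozen via its `writeable` flag, after which in-place assignment raises. This is a sketch of the kind of checks described above, not VARData's actual implementation:

```python
import numpy as np

y = np.random.default_rng(0).standard_normal((10, 3))
assert np.isfinite(y).all()     # the NaN/Inf screen described above
y.setflags(write=False)         # freeze the array after validation

try:
    y[0, 0] = 1.0
    mutated = True
except ValueError:
    mutated = False             # numpy rejects writes to read-only arrays
print("write blocked:", not mutated)
```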
index = pd.date_range("2000-01-01", periods=T, freq="QS")
data = VARData(endog=y, endog_names=["gdp_growth", "inflation", "rate"], index=index)
Select lag order¶
Before estimating the Bayesian VAR, we need to choose the number of lags.
select_lag_order fits OLS VARs at each candidate lag length and computes three
information criteria:
- AIC (Akaike): the lightest penalty; tends to overfit by selecting too many lags.
- BIC (Bayesian / Schwarz): penalises complexity more heavily and is conservative.
- HQ (Hannan-Quinn): sits between AIC and BIC in penalisation strength.
Since the true DGP is VAR(1), BIC should recover lag 1.
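The criteria have a simple closed form. The following numpy sketch (independent of impulso; it follows the standard Lütkepohl definitions, which may differ in constants from what select_lag_order reports) fits OLS VARs on a common sample and evaluates all three criteria at each candidate lag:

```python
import numpy as np

rng = np.random.default_rng(42)
A = np.array([[0.6, 0.0, -0.1],
              [0.2, 0.5, 0.0],
              [0.0, 0.15, 0.4]])
T, k = 200, 3
y = np.zeros((T, k))
for t in range(1, T):
    y[t] = A @ y[t - 1] + rng.standard_normal(k) * 0.1

def info_criteria(y, p, p_max):
    """OLS VAR(p) on a common effective sample; returns (AIC, BIC, HQ)."""
    T, k = y.shape
    Y = y[p_max:]                               # same LHS sample for every p
    X = np.hstack([np.ones((T - p_max, 1))]     # intercept + p lags
                  + [y[p_max - l : T - l] for l in range(1, p + 1)])
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ B
    T_eff = T - p_max
    logdet = np.linalg.slogdet(resid.T @ resid / T_eff)[1]  # log |Sigma_hat|
    n_par = k * (k * p + 1)                     # free coefficients in the system
    return (logdet + 2 * n_par / T_eff,
            logdet + np.log(T_eff) * n_par / T_eff,
            logdet + 2 * np.log(np.log(T_eff)) * n_par / T_eff)

crit = np.array([info_criteria(y, p, p_max=8) for p in range(1, 9)])
print("selected lags (AIC, BIC, HQ):", crit.argmin(axis=0) + 1)
```

Note the use of a common effective sample (`p_max` rows dropped for every candidate) so the criteria are comparable across lag lengths.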
ic = select_lag_order(data, max_lags=8)
print(f"AIC selects {ic.aic} lag(s), BIC selects {ic.bic} lag(s), HQ selects {ic.hq} lag(s)")
ic.summary()
BIC selects 1 lag -- correctly recovering the true DGP order. The criteria table above shows the penalty-adjusted fit at each lag. BIC's stronger complexity penalty makes it a sensible default when the goal is parsimony.
Specify the model¶
VAR(lags=1, prior="minnesota") creates a model specification. The Minnesota prior,
introduced by Doan, Litterman, and Sims (1984), shrinks each variable's own lags toward
a random walk and cross-variable lags toward zero. This regularisation is especially
valuable when the number of parameters grows quadratically with the number of variables
and linearly with lags. For more detail, see the
Minnesota prior explanation.
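In its textbook form, the Minnesota prior sets the standard deviation of each coefficient from three hyperparameters: overall tightness, extra shrinkage on cross-variable lags, and a decay rate in the lag length. The sketch below uses the common λ1/λ2/λ3 naming convention and equal residual scales; these are illustrative defaults, not necessarily impulso's parameterisation:

```python
import numpy as np

def minnesota_sd(sigma, p, lam1=0.2, lam2=0.5, lam3=1.0):
    """Prior sds for VAR coefficients, textbook Minnesota form.

    sigma : per-variable residual scales (often from univariate AR fits)
    Entry [i, j, l] is the sd of variable j's lag-(l+1) coefficient
    in equation i. Own lags get lam1 / l**lam3; cross lags are shrunk
    further by lam2 and rescaled by the ratio of residual scales.
    """
    k = len(sigma)
    sd = np.empty((k, k, p))
    for l in range(1, p + 1):
        for i in range(k):
            for j in range(k):
                if i == j:
                    sd[i, j, l - 1] = lam1 / l**lam3
                else:
                    sd[i, j, l - 1] = lam1 * lam2 * sigma[i] / (l**lam3 * sigma[j])
    return sd

sd = minnesota_sd(np.array([1.0, 1.0, 1.0]), p=2)
print(sd[:, :, 0])  # lag 1: 0.2 on the diagonal, 0.1 off-diagonal
```

Deeper lags are shrunk harder (here the lag-2 sds are half the lag-1 sds), encoding the belief that recent history matters most.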
spec = VAR(lags=1, prior="minnesota")
spec
Estimate the model¶
Calling .fit() builds a PyMC model under the hood, places the Minnesota prior on the
coefficient matrix, and samples the posterior using NUTS (the No-U-Turn Sampler). We
use cores=1 to avoid multiprocessing issues and a fixed random_seed for
reproducibility. More draws improve the posterior approximation but take longer.
sampler = NUTSSampler(draws=500, tune=500, chains=2, cores=1, random_seed=42)
fitted = spec.fit(data, sampler=sampler)
fitted
Inspect the posterior¶
The fitted model stores the full posterior in ArviZ InferenceData format.
az.summary() shows posterior means, standard deviations, and highest density
intervals for each parameter. We can check whether the posterior recovers the true
DGP coefficients.
az.summary(fitted.idata, var_names=["B", "intercept"])
The posterior means for the B coefficients should be close to the true values in
A_true. For example, the GDP-on-GDP-lag coefficient should be near 0.6 and the
GDP-on-rate-lag coefficient near -0.1. The 94% HDIs give you a credible range -- if
they contain the true value, the model is well calibrated. The intercepts should be
near zero since the DGP has no intercept.
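As a frequentist cross-check outside the impulso API: with 200 observations and a diffuse-enough prior, the posterior means should sit close to the OLS estimate, which we can compute in a few lines and compare against A_true:

```python
import numpy as np

rng = np.random.default_rng(42)
A_true = np.array([[0.6, 0.0, -0.1],
                   [0.2, 0.5, 0.0],
                   [0.0, 0.15, 0.4]])
T, k = 200, 3
y = np.zeros((T, k))
for t in range(1, T):
    y[t] = A_true @ y[t - 1] + rng.standard_normal(k) * 0.1

X = np.hstack([np.ones((T - 1, 1)), y[:-1]])   # intercept + one lag
B, *_ = np.linalg.lstsq(X, y[1:], rcond=None)  # equation-by-equation OLS
A_ols = B[1:].T                                # rows: equations, cols: lagged vars
print(np.round(A_ols, 2))
print("max abs error vs A_true:", np.abs(A_ols - A_true).max())
```

If the Minnesota prior is doing its job, the posterior means should land near these OLS values for well-identified coefficients while pulling weakly-identified cross terms toward zero.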
What's next¶
With a fitted model in hand, you can:
- Forecast: produce probabilistic multi-step-ahead forecasts -- see the Forecasting tutorial
- Identify structural shocks: apply Cholesky or sign-restriction identification to study causal impulse responses -- see the Structural Analysis tutorial