import logging
import warnings
warnings.filterwarnings("ignore")
logging.getLogger("pytensor").setLevel(logging.ERROR)
Fitting Your First Bayesian VAR¶
A vector autoregression (VAR) models multiple time series jointly, capturing how each variable depends on its own past values and the past values of all other variables in the system. This makes it a natural tool for macroeconomic analysis, where GDP, inflation, and interest rates evolve together. Bayesian estimation adds regularisation through prior distributions -- critical when the number of parameters grows quickly with lags and variables -- and provides full posterior uncertainty over every coefficient and forecast. For more background, see the Bayesian VAR explanation.
import arviz as az
import numpy as np
import pandas as pd
from impulso import VAR, VARData, select_lag_order
from impulso.samplers import NUTSSampler
Simulate a small macro economy¶
We simulate a VAR(1) with three variables -- quarterly GDP growth, inflation, and a short-term interest rate -- so that we know the true coefficients and can check whether the model recovers them.
The true coefficient matrix A_true embeds a simple macroeconomic story:
| Variable | Own lag | Cross-variable effects |
|---|---|---|
| GDP growth | Persistent at 0.6 | Reacts negatively to last quarter's interest rate (-0.1) |
| Inflation | Persistent at 0.5 | Follows GDP with a one-quarter lag (0.2) |
| Interest rate | Persistent at 0.4 | Reacts to inflation (0.15) -- a simplified Taylor-rule channel |
rng = np.random.default_rng(42)
T = 200
n_vars = 3
# True VAR(1) coefficient matrix
# Rows: [gdp_growth, inflation, rate]
# Columns: [gdp_growth_lag1, inflation_lag1, rate_lag1]
A_true = np.array([
    [0.6, 0.0, -0.1],   # GDP: persistent, negatively affected by rate
    [0.2, 0.5, 0.0],    # Inflation: follows GDP, persistent
    [0.0, 0.15, 0.4],   # Rate: reacts to inflation (Taylor rule), persistent
])

y = np.zeros((T, n_vars))
for t in range(1, T):
    y[t] = A_true @ y[t - 1] + rng.standard_normal(n_vars) * 0.1
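As a quick sanity check on the simulated system (not part of the impulso workflow): a VAR(1) is stable, and hence generates a stationary series, exactly when every eigenvalue of the coefficient matrix lies strictly inside the unit circle. We can verify that for A_true directly:

```python
import numpy as np

# A VAR(1) y_t = A y_{t-1} + e_t is stable iff all eigenvalues of A
# have modulus strictly less than 1.
A_true = np.array([
    [0.6, 0.0, -0.1],
    [0.2, 0.5, 0.0],
    [0.0, 0.15, 0.4],
])
moduli = np.abs(np.linalg.eigvals(A_true))
print("largest root:", moduli.max())  # < 1 means the simulation is stationary
```

Because the largest root is well below one, the series settles into a stationary distribution rather than drifting, which is what makes the OLS and Bayesian estimates below well behaved.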
Load data into VARData¶
VARData is the entry point for all data in Impulso. It validates the shape of
your arrays, checks for NaN and Inf values, and makes the underlying arrays read-only
so that data cannot be accidentally mutated after construction. If you already have a
pandas DataFrame, you can use VARData.from_df() instead.
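The read-only guarantee is easy to picture with plain numpy, independent of impulso's internals: an array can be frozen via its `writeable` flag, after which in-place assignment raises. This is a sketch of the kind of checks described above, not VARData's actual implementation:

```python
import numpy as np

y = np.random.default_rng(0).standard_normal((10, 3))
assert np.isfinite(y).all()     # the NaN/Inf screen described above
y.setflags(write=False)         # freeze the array after validation

try:
    y[0, 0] = 1.0
    mutated = True
except ValueError:
    mutated = False             # numpy rejects writes to read-only arrays
print("write blocked:", not mutated)
```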
index = pd.date_range("2000-01-01", periods=T, freq="QS")
data = VARData(endog=y, endog_names=["gdp_growth", "inflation", "rate"], index=index)
Select lag order¶
Before estimating the Bayesian VAR, we need to choose the number of lags.
select_lag_order fits OLS VARs at each candidate lag length and computes three
information criteria:
- AIC (Akaike): the lightest penalty; tends to overfit by selecting too many lags.
- BIC (Bayesian / Schwarz): penalises complexity more heavily and is conservative.
- HQ (Hannan-Quinn): sits between AIC and BIC in penalisation strength.
Since the true DGP is VAR(1), BIC should recover lag 1.
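The criteria have a simple closed form. The following numpy sketch (independent of impulso; it follows the standard Lütkepohl definitions, which may differ in constants from what select_lag_order reports) fits OLS VARs on a common sample and evaluates all three criteria at each candidate lag:

```python
import numpy as np

rng = np.random.default_rng(42)
A = np.array([[0.6, 0.0, -0.1],
              [0.2, 0.5, 0.0],
              [0.0, 0.15, 0.4]])
T, k = 200, 3
y = np.zeros((T, k))
for t in range(1, T):
    y[t] = A @ y[t - 1] + rng.standard_normal(k) * 0.1

def info_criteria(y, p, p_max):
    """OLS VAR(p) on a common effective sample; returns (AIC, BIC, HQ)."""
    T, k = y.shape
    Y = y[p_max:]                               # same LHS sample for every p
    X = np.hstack([np.ones((T - p_max, 1))]     # intercept + p lags
                  + [y[p_max - l : T - l] for l in range(1, p + 1)])
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ B
    T_eff = T - p_max
    logdet = np.linalg.slogdet(resid.T @ resid / T_eff)[1]  # log |Sigma_hat|
    n_par = k * (k * p + 1)                     # free coefficients in the system
    return (logdet + 2 * n_par / T_eff,
            logdet + np.log(T_eff) * n_par / T_eff,
            logdet + 2 * np.log(np.log(T_eff)) * n_par / T_eff)

crit = np.array([info_criteria(y, p, p_max=8) for p in range(1, 9)])
print("selected lags (AIC, BIC, HQ):", crit.argmin(axis=0) + 1)
```

Note the use of a common effective sample (`p_max` rows dropped for every candidate) so the criteria are comparable across lag lengths.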
ic = select_lag_order(data, max_lags=8)
print(f"AIC selects {ic.aic} lag(s), BIC selects {ic.bic} lag(s), HQ selects {ic.hq} lag(s)")
ic.summary()
BIC selects 1 lag -- correctly recovering the true DGP order. The criteria table above shows the penalty-adjusted fit at each lag. BIC's stronger complexity penalty makes it a sensible default when the goal is parsimony.
Specify the model¶
VAR(lags=1, prior="minnesota") creates a model specification. The Minnesota prior,
introduced by Doan, Litterman, and Sims (1984), shrinks each variable's own lags toward
a random walk and cross-variable lags toward zero. This regularisation is especially
valuable when the number of parameters grows quadratically with the number of variables
and linearly with lags. For more detail, see the
Minnesota prior explanation.
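In its textbook form, the Minnesota prior sets the standard deviation of each coefficient from three hyperparameters: overall tightness, extra shrinkage on cross-variable lags, and a decay rate in the lag length. The sketch below uses the common λ1/λ2/λ3 naming convention and equal residual scales; these are illustrative defaults, not necessarily impulso's parameterisation:

```python
import numpy as np

def minnesota_sd(sigma, p, lam1=0.2, lam2=0.5, lam3=1.0):
    """Prior sds for VAR coefficients, textbook Minnesota form.

    sigma : per-variable residual scales (often from univariate AR fits)
    Entry [i, j, l] is the sd of variable j's lag-(l+1) coefficient
    in equation i. Own lags get lam1 / l**lam3; cross lags are shrunk
    further by lam2 and rescaled by the ratio of residual scales.
    """
    k = len(sigma)
    sd = np.empty((k, k, p))
    for l in range(1, p + 1):
        for i in range(k):
            for j in range(k):
                if i == j:
                    sd[i, j, l - 1] = lam1 / l**lam3
                else:
                    sd[i, j, l - 1] = lam1 * lam2 * sigma[i] / (l**lam3 * sigma[j])
    return sd

sd = minnesota_sd(np.array([1.0, 1.0, 1.0]), p=2)
print(sd[:, :, 0])  # lag 1: 0.2 on the diagonal, 0.1 off-diagonal
```

Deeper lags are shrunk harder (here the lag-2 sds are half the lag-1 sds), encoding the belief that recent history matters most.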
spec = VAR(lags=1, prior="minnesota")
spec
Estimate the model¶
Calling .fit() builds a PyMC model under the hood, places the Minnesota prior on the
coefficient matrix, and samples the posterior using NUTS (the No-U-Turn Sampler). We
use cores=1 to avoid multiprocessing issues and a fixed random_seed for
reproducibility. More draws improve the posterior approximation but take longer.
sampler = NUTSSampler(draws=500, tune=500, chains=2, cores=1, random_seed=42)
fitted = spec.fit(data, sampler=sampler)
fitted
Inspect the posterior¶
The fitted model stores the full posterior in ArviZ InferenceData format.
az.summary() shows posterior means, standard deviations, and highest density
intervals for each parameter. We can check whether the posterior recovers the true
DGP coefficients.
az.summary(fitted.idata, var_names=["B", "intercept"])
The posterior means for the B coefficients should be close to the true values in
A_true. For example, the GDP-on-GDP-lag coefficient should be near 0.6 and the
GDP-on-rate-lag coefficient near -0.1. The 94% HDIs give you a credible range -- if
they contain the true value, the model is well calibrated. The intercepts should be
near zero since the DGP has no intercept.
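As a frequentist cross-check outside the impulso API: with 200 observations and a diffuse-enough prior, the posterior means should sit close to the OLS estimate, which we can compute in a few lines and compare against A_true:

```python
import numpy as np

rng = np.random.default_rng(42)
A_true = np.array([[0.6, 0.0, -0.1],
                   [0.2, 0.5, 0.0],
                   [0.0, 0.15, 0.4]])
T, k = 200, 3
y = np.zeros((T, k))
for t in range(1, T):
    y[t] = A_true @ y[t - 1] + rng.standard_normal(k) * 0.1

X = np.hstack([np.ones((T - 1, 1)), y[:-1]])   # intercept + one lag
B, *_ = np.linalg.lstsq(X, y[1:], rcond=None)  # equation-by-equation OLS
A_ols = B[1:].T                                # rows: equations, cols: lagged vars
print(np.round(A_ols, 2))
print("max abs error vs A_true:", np.abs(A_ols - A_true).max())
```

If the Minnesota prior is doing its job, the posterior means should land near these OLS values for well-identified coefficients while pulling weakly-identified cross terms toward zero.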
What's next¶
With a fitted model in hand, you can:
- Forecast: produce probabilistic multi-step-ahead forecasts -- see the Forecasting tutorial
- Identify structural shocks: apply Cholesky or sign-restriction identification to study causal impulse responses -- see the Structural Analysis tutorial