Perform the matrix multiplications and write out the equations for \(y_{1.t}\).
Three Useful Distributions
Matrix-Variate Normal Distribution
A \(K\times N\) matrix \(\mathbf{A}\) is said to follow a matrix-variate normal distribution: \[ \mathbf{A} \sim MN_{K\times N}\left( M, Q, P \right), \] where
\(M\) - a \(K\times N\) mean matrix
\(Q\) - an \(N\times N\) row-specific covariance matrix
\(P\) - a \(K\times K\) column-specific covariance matrix
if \(\text{vec}(\mathbf{A})\) is multivariate normal: \[ \text{vec}(\mathbf{A}) \sim N_{KN}\left( \text{vec}(M), Q\otimes P \right) \]
Density function.
\[\begin{align*}
MN_{K\times N}\left( M, Q, P \right) &\propto \exp\left\{ -\frac{1}{2}\text{tr}\left[ Q^{-1}(\mathbf{A}-M)'P^{-1}(\mathbf{A}-M) \right] \right\}
\end{align*}\]
\(\text{tr}()\) denotes the trace of a matrix - the sum of its diagonal elements
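The equivalence with the vectorised form suggests a direct way to sample from this distribution. Below is a minimal R sketch (the function name and inputs are illustrative, not from the source) that draws a \(K\times N\) matrix whose vectorisation has covariance \(Q\otimes P\).

```r
# minimal sketch: one draw from MN(M, Q, P), i.e. vec(A) ~ N(vec(M), Q %x% P)
# (illustrative function name; not part of the lecture material)
rmatnorm = function(M, Q, P) {
  K = nrow(M); N = ncol(M)
  Z = matrix(rnorm(K * N), K, N)       # K x N standard normal draws
  # pre- and post-multiplying by Cholesky factors gives covariance Q %x% P
  M + t(chol(P)) %*% Z %*% chol(Q)
}

# example: a 3 x 2 draw with identity row and column covariances
A.draw = rmatnorm(M = matrix(0, 3, 2), Q = diag(2), P = diag(3))
```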
Inverse Wishart Distribution
An \(N\times N\) square symmetric and positive definite matrix \(\mathbf\Sigma\) follows an inverse Wishart distribution: \[ \mathbf\Sigma \sim IW_{N}\left( S, \nu \right) \] where
\(S\) is an \(N\times N\) positive definite symmetric matrix called the scale matrix
\(\nu \geq N\) denotes the degrees of freedom, if its density is proportional to: \[\begin{align*}
\text{det}(\mathbf\Sigma)^{-\frac{\nu+N+1}{2}}\exp\left\{ -\frac{1}{2}\text{tr}\left[ \mathbf\Sigma^{-1} S \right] \right\}
\end{align*}\]
Normal-Inverse Wishart Distribution
If \[\begin{align*}
\mathbf{A}|\mathbf\Sigma &\sim MN_{K\times N}\left( M, \mathbf\Sigma, P \right)\\
\mathbf\Sigma &\sim IW_{N}\left( S, \nu \right)
\end{align*}\]
then the joint distribution of \((\mathbf{A},\mathbf\Sigma)\) is normal-inverse Wishart\[
p(\mathbf{A},\mathbf\Sigma) = NIW_{K\times N}\left( M,P,S,\nu\right)
\]
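Because the normal-inverse Wishart is defined hierarchically, a draw from it is obtained by first drawing \(\mathbf\Sigma\) from the inverse Wishart and then \(\mathbf{A}|\mathbf\Sigma\) from the matrix-variate normal. A minimal R sketch, reusing the illustrative rmatnorm() function from above:

```r
# minimal sketch: one draw of (A, Sigma) from NIW(M, P, S, nu);
# uses that Sigma ~ IW(S, nu) if and only if Sigma^{-1} ~ Wishart(S^{-1}, nu)
rniw = function(M, P, S, nu) {
  Sigma = solve(rWishart(1, df = nu, Sigma = solve(S))[, , 1])
  A     = rmatnorm(M, Q = Sigma, P = P)   # A | Sigma ~ MN(M, Sigma, P)
  list(A = A, Sigma = Sigma)
}
```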
The model assumptions state: \[\begin{align*}
\epsilon_t|Y_{t-1} &\sim iidN_N\left(\mathbf{0}_N,\mathbf\Sigma\right)
\end{align*}\]
Collect error term vectors in a \(T\times N\) matrix: \[\underset{(T\times N)}{E}= \begin{bmatrix}\epsilon_1 & \epsilon_2 & \dots & \epsilon_{T}\end{bmatrix}'\]
The error term matrix is matrix-variate normal distributed: \[\begin{align*}
E|X &\sim MN_{T\times N}\left(\mathbf{0}_{T\times N},\mathbf\Sigma, I_T\right)
\end{align*}\]
Tasks: what is
the covariance of \(\text{vec}(E)\)
the distribution of the first equation error terms \(\begin{bmatrix}\epsilon_{1.1} &\dots&\epsilon_{1.T}\end{bmatrix}'\)
Example: Univariate Inverse Wishart Distribution
The inverse Wishart density function is proportional to: \[\begin{align*}
\text{det}(\mathbf\Sigma)^{-\frac{\nu+N+1}{2}}\exp\left\{ -\frac{1}{2}\text{tr}\left[ \mathbf\Sigma^{-1} S \right] \right\}
\end{align*}\]
Consider a case where:
\(N=1\)
the matrix \(\mathbf\Sigma\) is replaced by a scalar \(\boldsymbol\sigma^2\)
Task:
write out the kernel of the density function for \(\boldsymbol\sigma^2\)
A natural-conjugate prior leads to a joint posterior distribution for \((\mathbf{A},\mathbf{\Sigma})\) of the same form \[\begin{align*}
p\left( \mathbf{A}, \mathbf{\Sigma} \right) &= p\left( \mathbf{A}| \mathbf{\Sigma} \right)p\left( \mathbf{\Sigma} \right)\\
\mathbf{A}|\mathbf{\Sigma} &\sim MN_{K\times N}\left( \underline{A},\mathbf{\Sigma},\underline{V} \right)\\
\mathbf{\Sigma} &\sim IW_N\left( \underline{S}, \underline{\nu} \right)
\end{align*}\]
Derive the full conditional posterior distribution of \(\mathbf{A}\) given \(\mathbf{\Sigma}\) and data \(Y\) and \(X\), denoted by \(p(\mathbf{A}|Y,X,\mathbf{\Sigma})\)
Hint: Proceed by:
writing out the kernel of this distribution as a product of the likelihood and the prior,
collecting all the terms within the exponential function and under the trace operator,
completing the squares,
applying the conditioning whenever convenient.
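For reference, completing the squares leads to the standard natural-conjugate result (stated here without derivation): \[\begin{align*}
\mathbf{A}|Y,X,\mathbf\Sigma &\sim MN_{K\times N}\left( \overline{A}, \mathbf\Sigma, \overline{V} \right)\\
\overline{V} &= \left( X'X + \underline{V}^{-1} \right)^{-1}\\
\overline{A} &= \overline{V}\left( X'Y + \underline{V}^{-1}\underline{A} \right)
\end{align*}\]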
Posterior Mean of \(\mathbf{A}\)
Posterior mean of matrix \(\mathbf{A}\) is: \[\begin{align*}
\overline{A} &= \overline{V}\left( X'Y + \underline{V}^{-1}\underline{A} \right)\\[2ex]
&= \overline{V}\left( X'X\widehat{A} + \underline{V}^{-1}\underline{A} \right)\\[2ex]
&= \overline{V} X'X\widehat{A} + \overline{V}\underline{V}^{-1}\underline{A}
\end{align*}\] a linear combination of the MLE \(\widehat{A} = (X'X)^{-1}X'Y\) and the prior mean \(\underline{A}\)
For Bayesian VARs the posterior is known \[
p\left( \mathbf{A}, \mathbf{\Sigma}| data \right) = MNIW\left(\overline{A},\overline{V}, \overline{S}, \overline{\nu} \right)
\]
and so is the analytical formula for the marginal data density (MDD): \[p(data)\]
This can be used to our advantage!
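A minimal R sketch of computing these posterior parameters, assuming the data matrices Y (\(T\times N\)) and X (\(T\times K\)) and prior values A.prior, V.prior, S.prior, nu.prior are already defined (variable names are illustrative; the \(\overline{S}\) and \(\overline{\nu}\) expressions are the standard natural-conjugate formulas):

```r
# posterior parameters of the MNIW posterior under the natural-conjugate prior
V.bar.inv = t(X) %*% X + solve(V.prior)
V.bar     = solve(V.bar.inv)
A.bar     = V.bar %*% (t(X) %*% Y + solve(V.prior) %*% A.prior)
nu.bar    = nrow(Y) + nu.prior
S.bar     = S.prior + t(Y) %*% Y + t(A.prior) %*% solve(V.prior) %*% A.prior -
              t(A.bar) %*% V.bar.inv %*% A.bar
```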
Minnesota and Dummy Observations Prior
Minnesota Prior
Doan, Litterman, and Sims (1984) proposed an interpretable way of setting the hyper-parameters of the NIW prior \(\underline{A}\), \(\underline{V}\), \(\underline{S}\), and \(\underline{\nu}\) for macroeconomic data.
The prior reflects the following stylised facts about macro time series:
the data are unit-root non-stationary
the effect of more distant lags should be smaller and smaller
the effect of other variables' lags should be smaller than that of own lags
For \(\quad l = 1+\text{floor}((i-1)/N) \quad\text{and }\quad k = i - (l-1)N\), set: \[\begin{align*}
\underline{A} &= \begin{bmatrix} I_N \\ \mathbf{0}_{((p-1)N +1)\times N}\end{bmatrix}&
\underline{V}_{ij} &= \left\{\begin{array}{ll} \lambda^2 / (\psi_k l^2) &\text{ for }i=j,\text{ and } i\neq pN+1 \\
\lambda^2 &\text{ for }i=j,\text{ and } i= pN+1 \\
0&\text{ for } i\neq j
\end{array}\right.
\end{align*}\]
Hyper-parameters.
\(\lambda^2\) and \(\psi_k\) have to be chosen (or estimated)
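A minimal R sketch of this construction for a VAR(\(p\)) with \(N\) variables (the function name and arguments are illustrative, not from the source):

```r
# minimal sketch: Minnesota prior mean and covariance for a VAR(p) with N variables,
# overall shrinkage lambda2 = lambda^2 and variable-specific scales psi (length N)
minnesota_prior = function(N, p, lambda2, psi) {
  K = p * N + 1                                             # p lags plus a constant term
  A.prior = rbind(diag(N), matrix(0, (p - 1) * N + 1, N))   # random-walk prior mean
  V.prior = matrix(0, K, K)
  for (i in 1:(p * N)) {
    l = 1 + floor((i - 1) / N)                              # lag of coefficient i
    k = i - (l - 1) * N                                     # variable of coefficient i
    V.prior[i, i] = lambda2 / (psi[k] * l^2)
  }
  V.prior[K, K] = lambda2                                   # prior variance for the constant
  list(A = A.prior, V = V.prior)
}

# example call with illustrative hyper-parameter values
prior = minnesota_prior(N = 2, p = 2, lambda2 = 0.04, psi = c(1, 1))
```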
Minnesota Prior
Task.
Consider a simple case of a model for \(N=2\) variables and with \(p=1\) lag.
write out the \(3\times2\) matrix \(\underline{A}\) and the \(3\times3\) matrix \(\underline{V}\) for the Minnesota prior, determining each of its elements.
Dummy Observations Prior
Idea.
Generate artificial data matrices with \(T_d\) rows \(Y^*\) and \(X^*\)
Append them to the original data matrices \(Y\) and \(X\) respectively.
Implied prior distribution.
Use Bayes Rule to derive the joint prior of \((\mathbf{A},\mathbf\Sigma)\) given \(Y^*\) and \(X^*\).
Sum-of-coefficients prior. Generate additional rows by \[
Y^{+} = \frac{\text{diag}\left(\bar{Y}_0\right)}{\mu} \quad\text{ and }\quad X^{+} = \begin{bmatrix} Y^{+} & \dots & Y^{+} & \mathbf{0}_{N\times 1} \end{bmatrix}
\] where \(\bar{Y}_0\) contains the means of the initial observations
\(\mu\) is a hyper-parameter to be chosen (or estimated)
if \(\mu \rightarrow 0\) the prior implies the presence of a unit root in each equation and rules out cointegration
if \(\mu \rightarrow\infty\) the prior becomes uninformative
Dummy Observations Prior
Dummy-initial-observation prior.
Generate an additional row by \[
Y^{++} = \frac{\bar{Y}_0'}{\delta} \quad\text{ and }\quad X^{++} = \begin{bmatrix} Y^{++} & \dots & Y^{++} & \frac{1}{\delta} \end{bmatrix}
\]
hyper-parameter \(\delta\) is to be chosen (or estimated)
if \(\delta \rightarrow 0\) all the variables of the VAR are forced to be at their unconditional mean, or the system is characterized by the presence of an unspecified number of unit roots without drift, which is consistent with cointegration
if \(\delta \rightarrow\infty\) the prior becomes uninformative
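A minimal R sketch of constructing and appending these dummy observations, assuming ybar0 holds the \(N\)-vector of means of the initial observations and that mu, delta, p, N, Y and X are already defined (all names are illustrative; the sum-of-coefficients rows follow the standard form):

```r
# sum-of-coefficients rows: Y+ = diag(ybar0) / mu, X+ = [Y+ ... Y+ 0]
Y.soc = diag(ybar0) / mu
X.soc = cbind(matrix(rep(Y.soc, p), N, p * N), 0)

# dummy-initial-observation row: Y++ = ybar0' / delta, X++ = [Y++ ... Y++ 1/delta]
Y.dio = matrix(ybar0 / delta, 1, N)
X.dio = cbind(matrix(rep(Y.dio, p), 1, p * N), 1 / delta)

# append the artificial rows to the original data matrices
Y.star = rbind(Y, Y.soc, Y.dio)
X.star = rbind(X, X.soc, X.dio)
```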
Bayesian Estimation for Hierarchical Prior
Step 1: Estimate \((\psi,\lambda,\mu,\delta)\) using a random-walk Metropolis-Hastings sampler
Sample these hyper-parameters marginally of \((\mathbf{A},\mathbf\Sigma)\), that is, with \((\mathbf{A},\mathbf\Sigma)\) integrated out
extend the conditioning of the marginal data density: \[ p(data|\psi,\lambda,\mu,\delta)\]
apply Bayes Rule to obtain the kernel of the posterior:
\[ p(\psi,\lambda,\mu,\delta|data) \propto p(\psi,\lambda,\mu,\delta)p(data|\psi,\lambda,\mu,\delta)\]
Use an \((N+3)\)-variate Student-t distribution as the candidate generating density
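A minimal R sketch of such a sampler, assuming a function log_posterior() returning the log of the kernel above is available (the function and all tuning values are illustrative; in practice the positive hyper-parameters are typically sampled on the log scale):

```r
# random-walk Metropolis-Hastings for theta = (psi, lambda, mu, delta)
S.mh    = 5000
theta   = rep(1, N + 3)                  # starting value; psi contributes N elements
draws   = matrix(NA, S.mh, N + 3)
Sigma.c = 0.01 * diag(N + 3)             # proposal scale matrix (to be tuned)
df.c    = 5                              # Student-t degrees of freedom of the proposal
L.c     = t(chol(Sigma.c))
for (s in 1:S.mh) {
  # multivariate Student-t increment: scaled normal divided by a chi-square factor
  incr      = as.vector(L.c %*% rnorm(N + 3)) / sqrt(rchisq(1, df.c) / df.c)
  theta.new = theta + incr
  # Metropolis acceptance step (the proposal is symmetric)
  if (log(runif(1)) < log_posterior(theta.new) - log_posterior(theta)) theta = theta.new
  draws[s, ] = theta
}
```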
Bayesian Estimation for Hierarchical Prior
Step 2: For each draw of \((\psi,\lambda,\mu,\delta)\) sample the corresponding draw of \((\mathbf{A},\mathbf{\Sigma})\)
Use the MNIW posterior derived for the implied prior.
… is implied by the model formulation: \[\begin{align*}
y_{t+h}|Y_{t+h-1},\mathbf{A},\mathbf\Sigma &\sim N_N\left(\mathbf{A}_1 y_{t+h-1} + \dots + \mathbf{A}_p y_{t+h-p} + \boldsymbol\mu_0,\mathbf\Sigma\right)
\end{align*}\]
One-Period Ahead Predictive Density
Bayesian forecasting takes into account the uncertainty w.r.t. parameter estimation by integrating it out from the predictive density.
\(p(y_{T+1}|Y_{T},\mathbf{A},\mathbf\Sigma)\) - one-period-ahead conditional predictive density
\(p(\mathbf{A},\mathbf\Sigma|Y,X)\) - marginal posterior distribution
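Combining these two ingredients, the predictive density integrates the parameters out: \[\begin{align*}
p(y_{T+1}|Y,X) &= \int p(y_{T+1}|Y_{T},\mathbf{A},\mathbf\Sigma)\, p(\mathbf{A},\mathbf\Sigma|Y,X)\, d(\mathbf{A},\mathbf\Sigma)
\end{align*}\]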
Sampling from One-Period Ahead Predictive Density
Step 1: Sample from the posterior
… and obtain \(S\) draws \(\left\{ \mathbf{A}^{(s)},\mathbf\Sigma^{(s)} \right\}_{s=1}^{S}\)
Step 2: Sample from the predictive density
In order to obtain draws from \(p(y_{T+1}|Y,X)\), for each of the \(S\) draws of \((\mathbf{A},\mathbf\Sigma)\) sample the corresponding draw of \(y_{T+1}\):
Sample \(y_{T+1}^{(s)}\) from \[
N_N\left(\mathbf{A}_1^{(s)} y_{T} + \dots + \mathbf{A}_p^{(s)} y_{T-p+1} + \boldsymbol\mu_0^{(s)},\mathbf\Sigma^{(s)}\right)
\] and obtain \(\left\{y_{T+1}^{(s)}\right\}_{s=1}^{S}\)
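A minimal R sketch of Step 2, assuming the \(S\) posterior draws are stored in arrays A.draws (\(K\times N\times S\)) and Sigma.draws (\(N\times N\times S\)), and that x.T collects the most recent regressors \((y_T', \dots, y_{T-p+1}', 1)\) (all object names are illustrative):

```r
# one-period-ahead predictive draws
y.T1 = matrix(NA, S, N)
for (s in 1:S) {
  mu.s      = as.vector(x.T %*% A.draws[, , s])      # conditional mean given draw s
  L.s       = t(chol(Sigma.draws[, , s]))
  y.T1[s, ] = mu.s + as.vector(L.s %*% rnorm(N))     # draw from N(mu.s, Sigma.s)
}
```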
\(h\)-Period Ahead Predictive Density
This procedure can be generalised to any forecasting horizon.
Step 1: Sample from the posterior
… and obtain \(S\) draws \(\left\{ \mathbf{A}^{(s)},\mathbf\Sigma^{(s)} \right\}_{s=1}^{S}\)
Step 2: Sample from 1-period ahead predictive density
For each of the \(S\) draws, sample \(y_{T+1}^{(s)}\) from \[
N_N\left(\mathbf{A}_1^{(s)} y_{T} + \dots + \mathbf{A}_p^{(s)} y_{T-p+1} + \boldsymbol\mu_0^{(s)},\mathbf\Sigma^{(s)}\right)
\]
Step 3: Sample from 2-period ahead predictive density
For each of the \(S\) draws, sample \(y_{T+2}^{(s)}\) from \[
N_N\left(\mathbf{A}_1^{(s)} y_{T+1}^{(s)} + \mathbf{A}_2^{(s)} y_{T} + \dots + \mathbf{A}_p^{(s)} y_{T-p+2} + \boldsymbol\mu_0^{(s)},\mathbf\Sigma^{(s)}\right)
\]
and obtain \(\left\{y_{T+2}^{(s)},y_{T+1}^{(s)}\right\}_{s=1}^{S}\)
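A minimal R sketch of this recursion, reusing the illustrative objects from the one-period-ahead sketch above:

```r
# h-period-ahead predictive draws: feed each sampled y back into the regressors
h      = 8
y.fore = array(NA, c(h, N, S))
for (s in 1:S) {
  x.s = x.T                                            # regressors (y_T', ..., y_{T-p+1}', 1)
  for (i in 1:h) {
    mu.s           = as.vector(x.s %*% A.draws[, , s])
    y.fore[i, , s] = mu.s + as.vector(t(chol(Sigma.draws[, , s])) %*% rnorm(N))
    x.s            = c(y.fore[i, , s], x.s[seq_len((p - 1) * N)], 1)  # shift lags forward
  }
}
```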
# estimate the model
post = estimate(spec, S = S, show_progress = FALSE)
# forecast
fore = forecast(post, horizon = 8)
# plot the forecasts
plot(fore, data_in_plot = 0.5, probability = 0.68, col = "#F500BD")