stochvolTMB: Likelihood estimation of stochastic volatility

Jens Christian Wahl

The stochastic volatility model

The stochastic volatility model, introduced by Taylor (1982), is defined by \[\begin{equation} \begin{aligned} y_t &= \sigma_y e^{h_t/2} \epsilon_t, \quad t = 1, \dots, T, \\ h_{t+1} &= \phi h_{t} + \sigma_h \eta_t, \quad t = 1, \dots, T-1, \\ \eta_t &\stackrel{\text{iid}}{\sim} \mathcal{N}(0,1), \\ \epsilon_t &\stackrel{\text{iid}} {\sim} F \end{aligned} \end{equation}\] where \(y_t\) is the observed log returns, \(h_t\) is the logarithm of the variance on day \(t\). The distribution of the innovations \(\epsilon_t\) is specified below. To ensure stationarity for \(h_t\), we assume \(|\phi| < 1\). It can be shown that the unconditional distribution of \(h_t\) is \(\mathcal{N}(0,\sigma_h^2/(1 - \phi^2))\), and we assume \(h_1 \sim \mathcal{N}(0,\sigma_h^2/(1-\phi^2))\). An interpretation of the latent process \(\{h_t\}\) is that is represents the random and uneven flow of new information into the marked. For different time points, the variance will be dependent of this unobserved ``flow’’ of information, i.e. conditioning on \(h_t\), \(\mathrm{Var}(y_t | h_t) = \sigma_x^2 e^{h_t}\).

The original SV model from Taylor (1982) assumed normally distributed innovations, but this is usually to strong of an assumption. Financial returns are usually heavy-tailed, might be asymmetric and the volatility can be correlated with the returns. The latter is called the leverage effect, where there is a negative correlation between a change in price and the volatility, meaning that a drop in price tend to lead to an increase in volatility To take these features of financial time series into account stochvolTMB has implemented the following four distribution for the innovations:

stochvolTMB is inspired by the package stochvol (Kastner (2016)), but stochvolTMB obtain parameter estimates through maximum likelihood estimation and not Markov Chain Monte Carlo.

Usage

The main functions of stochvolTMB are:

Function Name Description
estimate_parameters Estimate parameters of a stochastic volatility model.
sim_sv Simulate data from a stochastic volatility model.
plot.stochvolTMB Plot estimated volatility (with confidence intervals).
summary.stochvolTMB Extract parameter estimates and volatility with uncertainty.

estimate_parameters returns an object of class stochvolTMB. The summary function returns a data.table with estimated parameters, log-volatility and transformed parameters along with uncertainties, p-values and z-values. The argument report = "fixed" returns the parameters on the scale they were estimated. This means that the standard deviations \(\sigma_y, \sigma_h\) are given on the log scale. This is done to ensure a positive estimate and \(\phi\) is given on logit scale to ensure \(|\phi| < 1\). If report = c("fixed", "transformed"), parameter estimates transformed back to their original scale is also returned. If you want to extract the estimated log-volatility you can use report = "random".

Example

Estimating parameters in a SV model is easy with stochvolTMB. As an example we investigate the daily log-returns of the S&P500 from 2005 to 2018. A quick look at the data:

data(spy)
plot(spy$date, spy$log_return, type = "l", xlab = "", ylab = "", main = "Log-returns of S&P500")

plot(spy$date, spy$price, type = "l", xlab = "", ylab = "", main = "Price of S&P500")

We fit all four distributions:

gaussian = estimate_parameters(spy$log_return, model = "gaussian", silent = TRUE)
t_dist = estimate_parameters(spy$log_return, model = "t", silent = TRUE)
skew_gaussian = estimate_parameters(spy$log_return, model = "skew_gaussian", silent = TRUE)
leverage = estimate_parameters(spy$log_return, model = "leverage", silent = TRUE)

We can investigate the estimate for the degrees of freedom (df) to see if the returns are heavy-tailed

summary(t_dist, report = "fixed")
#>      parameter  estimate  std_error    z_value      p_value  type
#> 1: log_sigma_y -4.780371 0.10353121 -46.173236 0.000000e+00 fixed
#> 2: log_sigma_h -1.683263 0.09806688 -17.164443 4.901588e-66 fixed
#> 3:   phi_logit  4.880275 0.26254873  18.588072 4.013461e-77 fixed
#> 4:          df 10.086369 2.10224617   4.797901 1.603371e-06 fixed

Clearly the returns are more heavy tailed than Gaussian, even when controlling for the stochastic volatility. We can also check for asymmetric returns

summary(skew_gaussian, report = "fixed")
#>      parameter  estimate  std_error    z_value      p_value  type
#> 1: log_sigma_y -4.797828 0.09127159 -52.566502 0.000000e+00 fixed
#> 2: log_sigma_h -1.556355 0.08899553 -17.488015 1.768097e-68 fixed
#> 3:   phi_logit  4.623512 0.23363739  19.789263 3.683862e-87 fixed
#> 4:       alpha -1.088827 0.14275943  -7.627006 2.402689e-14 fixed

and leverage (rho)

summary(leverage, report = "transformed")
#>    parameter     estimate    std_error   z_value       p_value        type
#> 1:   sigma_y  0.008338425 0.0004163323  20.02829  3.121840e-89 transformed
#> 2:   sigma_h  0.273445746 0.0182642070  14.97167  1.124552e-50 transformed
#> 3:       phi  0.967720808 0.0043682334 221.53597  0.000000e+00 transformed
#> 4:       rho -0.748692587 0.0322489934 -23.21600 3.138829e-119 transformed

There is clear evidence for both asymmetric returns and a negative correlation (of -0.74!) between log-returns and the volatility. To find the model that fits the data best, we can compare the AIC of out models and pick the smallest.


AIC(gaussian, 
    t_dist, 
    skew_gaussian, 
    leverage)
#>               df       AIC
#> gaussian       3 -23430.57
#> t_dist         4 -23451.69
#> skew_gaussian  4 -23440.87
#> leverage       4 -23608.85

Clearly the leverage model outperforms the others and is our preferred model for this dataset. Lastly, we can also plot the estimated log-volatility and percentage volatility

plot(leverage, include_ci = TRUE, plot_log = TRUE, dates = spy$date)

plot(leverage, include_ci = TRUE, plot_log = FALSE, dates = spy$date)

Comparison to stochvol

The R-package stochvol (Kastner (2016)) provides a Bayesian framework for inference using Markov Chain Monte Carlo. An advantage of stochvolTMB is that optimization can be a lot faster than MCMC. We here compare the leverage and the gaussian model. Depending on your machine you can expect a speed up of 20-200x.

library(stochvol)

stochvol_gauss <- svsample(spy$log_return, quiet = T)
stochvolTMB_gauss  <- estimate_parameters(spy$log_return, "gaussian", silent = TRUE)

stochvol_lev <- svlsample(spy$log_return, quiet = T)
stochvolTMB_lev  <- estimate_parameters(spy$log_return, "leverage", silent = TRUE)

We can compare the parameter estimates of the two methods. Note that the parameter exp(mu/2) and sigma from stochvol is the same as sigma_y and sigma_h from stochvolTMB. Both methods give almost identical results.


stochvol_gauss$para
#>                   mean           sd          5%          50%          95%
#> mu        -9.611533785 0.1854455626 -9.91175182 -9.611774497 -9.304678419
#> phi        0.978189757 0.0049148914  0.96968612  0.978414831  0.985836137
#> sigma      0.229775055 0.0203786934  0.19733459  0.228942708  0.264098047
#> exp(mu/2)  0.008217761 0.0007671236  0.00704191  0.008181439  0.009539261
#> sigma^2    0.053211825 0.0094502446  0.03894094  0.052414764  0.069747778
#>                 ESS
#> mu        6230.1953
#> phi        327.9347
#> sigma      152.4001
#> exp(mu/2) 6230.1953
#> sigma^2    152.4001
summary(stochvolTMB_gauss, report = "transformed")
#>    parameter    estimate    std_error   z_value      p_value        type
#> 1:   sigma_y 0.008185065 0.0007314678  11.18992 4.568554e-29 transformed
#> 2:   sigma_h 0.222436260 0.0190054225  11.70383 1.218261e-31 transformed
#> 3:       phi 0.979034580 0.0046594721 210.11706 0.000000e+00 transformed
stochvol_lev$para
#>                   mean           sd           5%          50%          95%
#> mu        -9.569375399 0.1053851888 -9.739817718 -9.571446103 -9.400002030
#> phi        0.966952934 0.0041187166  0.960195391  0.966853189  0.973557574
#> sigma      0.276351629 0.0172514582  0.248100974  0.277044222  0.304057250
#> rho       -0.724949416 0.0328632332 -0.775144574 -0.726905055 -0.667578807
#> exp(mu/2)  0.008368349 0.0004417329  0.007674065  0.008348085  0.009095268
#> sigma^2    0.076667806 0.0095068932  0.061554093  0.076753501  0.092450811
#>                 ESS
#> mu        368.69311
#> phi       140.52279
#> sigma      42.20112
#> rho        26.56514
#> exp(mu/2) 368.69311
#> sigma^2    42.20112
summary(stochvolTMB_lev, report = "transformed")
#>    parameter     estimate    std_error   z_value       p_value        type
#> 1:   sigma_y  0.008338425 0.0004163323  20.02829  3.121840e-89 transformed
#> 2:   sigma_h  0.273445746 0.0182642070  14.97167  1.124552e-50 transformed
#> 3:       phi  0.967720808 0.0043682334 221.53597  0.000000e+00 transformed
#> 4:       rho -0.748692587 0.0322489934 -23.21600 3.138829e-119 transformed

Estimation

The R-package TMB (Kristensen et al. (2016)) is used to implement our models for maximum likelihood estimation, since TMB lets us estimate parameters in models with a high number of latent variables.

Parameter estimation of stochastic volatility models is hard due to the fact the likelihood function is expressed as a high dimensional integral over the latent variables that cannot be evaluated analytically. If \(\boldsymbol{y} = (y_1, \ldots, y_T)\) denotes our observations, \(\boldsymbol{h} = (h_1, \ldots, h_T)\) our latent variables and \(\boldsymbol{\theta}\) the parameters of interest, the likelihood of \(\boldsymbol{\theta}\) is given by

\[\begin{equation} \mathcal{L}(\boldsymbol{\theta}) = \int f_{\boldsymbol{y}}(\boldsymbol{y}|\boldsymbol{h})f_{\boldsymbol{h}}(\boldsymbol{h}) \, d\boldsymbol{h}, \end{equation}\]

The conditional density of our observations given \(\boldsymbol{h}\) is denoted by \(f_{\boldsymbol{y}}(\boldsymbol{y|u})\), and \(f_{\boldsymbol{h}}(\boldsymbol{h})\) denotes the marginal density of \(\boldsymbol{h}\). To approximate this integral we apply the Laplace approximation.

Laplace Approximation

Let \(\boldsymbol{y}\) be a vector of observations, \(\boldsymbol{\theta}\) our parameters of interest and \(\boldsymbol{h}\) be a random vector of latent variables. Let \(g(\boldsymbol{h},\boldsymbol{\theta})\) denote the negative joint log-likelihood. The likelihood of \(\boldsymbol{\theta}\) is given by

\[\begin{equation} \mathcal{L}(\boldsymbol{\theta}) = \int f(\boldsymbol{y},\boldsymbol{h}) \, d\boldsymbol{h} = \int f_{\boldsymbol{y}}(\boldsymbol{y|u}) f_{\boldsymbol{h}}(\boldsymbol{h}) \, d\boldsymbol{h} = \int \exp \{-g(\boldsymbol{h},\boldsymbol{\theta})\} \, d\boldsymbol{h}. \end{equation}\]

We assume that \(g\) has a global minimum at \(\boldsymbol{\hat{h}}\) for a given \(\boldsymbol{\theta}\), i.e. \(\boldsymbol{\hat{h}} = \text{argmin}_{\boldsymbol{h}} g(\boldsymbol{h},\boldsymbol{\theta})\), and that \(g\) is twice differentiable. The solution \(\hat{\boldsymbol{h}}\) is known as the Empirical Bayes (EB) estimate. A second order Taylor expansion around \(\boldsymbol{\hat{h}}\) yields

\[\begin{equation} g(\boldsymbol{h},\boldsymbol{\theta}) \approx g(\boldsymbol{\hat{h}},\boldsymbol{\theta}) + \nabla g(\boldsymbol{\hat{h}},\boldsymbol{\theta})(\boldsymbol{h} - \boldsymbol{\hat{h}}) + \frac{1}{2}(\boldsymbol{h} - \boldsymbol{\hat{h}})^T\mathbb{H}(\boldsymbol{h} - \boldsymbol{\hat{h}}) \end{equation}\]

Since \(\boldsymbol{\hat{h}}\) is a minimum, \(\nabla g(\boldsymbol{\hat{h}},\boldsymbol{\theta}) = 0\). Therefore

\[\begin{equation} \mathcal{L}(\boldsymbol{\theta}) \approx \exp \{-g(\boldsymbol{\hat{h}},\boldsymbol{\theta})\} \int \exp \bigg \{-\frac{1}{2}(\boldsymbol{h} - \boldsymbol{\hat{h}})^T\mathbb{H}(\boldsymbol{h} - \boldsymbol{\hat{h}}) \bigg \} d\boldsymbol{h} \end{equation}\]

We can observe that the integrand is the kernel of a multivariate normal density with covariance matrix \(\mathbb{H}^{-1}\). The approximation is therefore given by

\[\begin{equation} \mathcal{L}(\boldsymbol{\theta}) \approx \exp \{-g(\boldsymbol{\hat{h}},\boldsymbol{\theta})\} (2\pi)^{\text{dim}(\boldsymbol{h})/2} \text{det}(\mathbb{H})^{-1/2}, \end{equation}\]

where we have used the fact that \(\text{det}(\mathbb{H}^{-1}) = \text{det}(\mathbb{H})^{-1}\). The corresponding negative log-likelihood is

\[\begin{equation} -l(\boldsymbol{\theta}) = -\frac{\text{dim}(\boldsymbol{h})}{2} \log (2\pi) + \frac{1}{2} \log \text{det}(\mathbb{H}) + g(\boldsymbol{\hat{h}},\boldsymbol{\theta}). \end{equation}\]

Finding the optimal value of \(\boldsymbol{\theta}\) can be viewed as a nested optimization problem. To find \(\boldsymbol{h} (\boldsymbol{\theta})\) and \(\mathbb{H}(\boldsymbol{\theta})\) we fix \(\boldsymbol{\theta}\) and optimize using a quasi-Newton algorithm or a limited memory Newton method. The Laplace approximation is then optimized w.r.t. \(\boldsymbol{\theta}\) using the quasi-Newton algorithm.

References

Kastner, Gregor. 2016. “Dealing with Stochastic Volatility in Time Series Using the R Package stochvol.” Journal of Statistical Software 69 (5): 1–30. https://doi.org/10.18637/jss.v069.i05.

Kristensen, Kasper, Anders Nielsen, Casper Berg, Hans Skaug, and Bradley Bell. 2016. “TMB: Automatic Differentiation and Laplace Approximation.” Journal of Statistical Software, Articles 70 (5): 1–21. https://doi.org/10.18637/jss.v070.i05.

Taylor, S. J. 1982. “Financial Returns Modelled by the Product of Two Stochastic Processes-a Study of the Daily Sugar Prices 1961-75.” Time Series Analysis : Theory and Practice 1: 203–26. https://ci.nii.ac.jp/naid/10018822959/en/.