Project author: loelschlaeger

Project description:
Fitting (hierarchical) hidden Markov models to financial data.
Main language: R
Project URL: git://github.com/loelschlaeger/fHMM.git
Created: 2019-11-04T14:41:52Z
Project community: https://github.com/loelschlaeger/fHMM

HMMs for Finance

The {fHMM} R package allows for the detection and characterization of
financial market regimes in time series data by applying hidden Markov
models (HMMs). The package vignettes outline the package functionality
and the model formulation.

For a reference on the method, see:

Oelschläger, L., and Adam, T. 2021. “Detecting Bearish and Bullish
Markets in Financial Time Series Using Hierarchical Hidden Markov
Models.” Statistical Modelling.
https://doi.org/10.1177/1471082X211034048

A user guide is provided by the accompanying software paper:

Oelschläger, L., Adam, T., and Michels, R. 2024. “fHMM: Hidden Markov
Models for Financial Time Series in R.” Journal of Statistical
Software. https://doi.org/10.18637/jss.v109.i09

Below, we illustrate an application to the German stock index
DAX. We also show how to use the
package to simulate HMM data, compute the model likelihood, and decode
the hidden states using the Viterbi algorithm.

Installation

You can install the released package version from CRAN with:

```r
install.packages("fHMM")
```

Contributing

We are open to contributions and would appreciate your input:

  • If you encounter any issues, please submit bug reports as GitHub
    issues.

  • If you have any ideas for new features, please submit them as feature
    requests.

  • If you would like to add extensions to the package, please fork the
    master branch and submit a pull request.

Example: Fitting an HMM to the DAX

We fit a 3-state HMM with state-dependent t-distributions to the DAX
log-returns from 2000 to 2022. The states can be interpreted as proxies
for bearish markets (green in the plots below), bullish markets (red),
and an “in-between” market state (yellow).
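
To make the state-dependent t-distributions concrete: in each state, the log-return is assumed to follow a t-distribution with location mu, scale sigma and df degrees of freedom, the three parameter types reported per state in the model summary. The following base-R sketch assumes this location-scale parameterization; the helper dt_scaled() and the parameter values are made up for illustration and are not part of {fHMM}.

```r
# Density of a location-scale t-distribution with location mu, scale sigma
# and df degrees of freedom (assumed parameterization, base R only)
dt_scaled <- function(x, mu, sigma, df) {
  dt((x - mu) / sigma, df = df) / sigma
}

# Made-up parameters for a calm and a turbulent market state
x <- seq(-0.1, 0.1, length.out = 201)
calm      <- dt_scaled(x, mu = 0.001, sigma = 0.006, df = 5)
turbulent <- dt_scaled(x, mu = -0.002, sigma = 0.026, df = 10)

# The calm state concentrates its mass near zero (higher, narrower peak)
max(calm) > max(turbulent)

# Sanity check: the density integrates to (numerically) one
integrate(dt_scaled, lower = -1, upper = 1,
          mu = 0.001, sigma = 0.006, df = 5)$value
```

A small sigma yields a tall, narrow density (calm market), a large sigma a flat, wide one (turbulent market); a small df adds heavier tails.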

```r
library("fHMM")
```

The package has a built-in function to download financial data from
Yahoo Finance:

```r
dax <- download_data(symbol = "^GDAXI")
```

We first need to define the model:

```r
controls <- set_controls(
  states      = 3,
  sdds        = "t",
  file        = dax,
  date_column = "Date",
  data_column = "Close",
  logreturns  = TRUE,
  from        = "2000-01-01",
  to          = "2022-12-31"
)
```

The function prepare_data() then prepares the data for estimation:

```r
data <- prepare_data(controls)
```
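
With logreturns = TRUE, the closing prices are transformed into log-returns before estimation. A minimal base-R sketch, assuming the usual definition r_t = log(P_t) - log(P_{t-1}) (which drops the first observation); the prices are made up:

```r
# Made-up closing prices (illustration only)
close <- c(100, 102, 101, 105, 104)

# Log-returns: r_t = log(P_t) - log(P_{t-1}); one value fewer than prices
log_returns <- diff(log(close))
log_returns
```

Because log-returns telescope, their sum equals the log of the ratio between the last and the first price.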

The summary() method gives an overview:

```r
summary(data)
#> Summary of fHMM empirical data
#> * number of observations: 5882
#> * data source: data.frame
#> * date column: Date
#> * log returns: TRUE
```

We fit the model and subsequently decode the hidden states and compute
(pseudo-)residuals:

```r
model <- fit_model(data)
model <- decode_states(model)
model <- compute_residuals(model)
```
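
As mentioned in the introduction, decoding is done with the Viterbi algorithm, which finds the single most likely sequence of hidden states given the observations. Below is a self-contained sketch in base R that works in log space to avoid underflow. It is not the package's implementation: the function viterbi_decode() is hypothetical, and for brevity the example uses Gaussian state-dependent densities with made-up parameters instead of the t-distributions fitted above.

```r
# Viterbi decoding in log space (base-R sketch, not the fHMM implementation)
# allprobs: T x N matrix of state-dependent densities f_i(x_t)
# Gamma:    N x N transition probability matrix
# delta:    initial state distribution
viterbi_decode <- function(allprobs, Gamma, delta) {
  n_obs <- nrow(allprobs)
  N <- ncol(allprobs)
  log_xi <- matrix(NA_real_, n_obs, N)   # best log-score ending in each state
  back <- matrix(NA_integer_, n_obs, N)  # best predecessor (backpointers)
  log_xi[1, ] <- log(delta) + log(allprobs[1, ])
  if (n_obs > 1) {
    for (t in 2:n_obs) {
      for (j in seq_len(N)) {
        cand <- log_xi[t - 1, ] + log(Gamma[, j])
        back[t, j] <- which.max(cand)
        log_xi[t, j] <- max(cand) + log(allprobs[t, j])
      }
    }
  }
  # Backtrack from the most likely final state
  states <- integer(n_obs)
  states[n_obs] <- which.max(log_xi[n_obs, ])
  if (n_obs > 1) {
    for (t in (n_obs - 1):1) states[t] <- back[t + 1, states[t + 1]]
  }
  states
}

# Hypothetical 2-state example with Gaussian state-dependent densities
x <- c(-0.1, 0.2, 5.1, 4.8, 5.3, 0.1, -0.2)
allprobs <- cbind(dnorm(x, mean = 0, sd = 1), dnorm(x, mean = 5, sd = 1))
Gamma <- matrix(c(0.9, 0.1, 0.1, 0.9), nrow = 2, byrow = TRUE)
viterbi_decode(allprobs, Gamma, delta = c(0.5, 0.5))
# returns 1 1 2 2 2 1 1
```

Any state-dependent family can be plugged in by filling allprobs with the corresponding densities.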

The summary() method gives an overview of the model fit:

```r
summary(model)
#> Summary of fHMM model
#> 
#>   simulated hierarchy       LL       AIC       BIC
#> 1     FALSE     FALSE 17650.02 -35270.05 -35169.85
#> 
#> State-dependent distributions:
#> t()
#> 
#> Estimates:
#>                   lb   estimate         ub
#> Gamma_2.1  2.754e-03  5.024e-03  9.110e-03
#> Gamma_3.1  2.808e-16  2.781e-16  2.739e-16
#> Gamma_1.2  1.006e-02  1.839e-02  3.338e-02
#> Gamma_3.2  1.514e-02  2.446e-02  3.927e-02
#> Gamma_1.3  5.596e-17  5.549e-17  5.464e-17
#> Gamma_2.3  1.196e-02  1.898e-02  2.993e-02
#> mu_1      -3.862e-03 -1.793e-03  2.754e-04
#> mu_2      -7.994e-04 -2.649e-04  2.696e-04
#> mu_3       9.642e-04  1.272e-03  1.579e-03
#> sigma_1    2.354e-02  2.586e-02  2.840e-02
#> sigma_2    1.225e-02  1.300e-02  1.380e-02
#> sigma_3    5.390e-03  5.833e-03  6.312e-03
#> df_1       5.550e+00  1.084e+01  2.116e+01
#> df_2       6.785e+00  4.866e+01  3.489e+02
#> df_3       3.973e+00  5.248e+00  6.934e+00
#> 
#> States:
#> decoded
#>    1    2    3
#>  704 2926 2252
#> 
#> Residuals:
#>      Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
#> -3.517900 -0.664018  0.012170 -0.003262  0.673180  3.693568
```
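
The information criteria in the summary can be reproduced by hand. Counting free parameters for the 3-state t-HMM gives the 6 off-diagonal transition probabilities listed above plus one mu, sigma and df per state, i.e. k = 15. With the standard definitions AIC = -2 LL + 2k and BIC = -2 LL + k log(n), and n = 5882 observations, the sketch below recovers the printed values up to the rounding of LL:

```r
# Values taken from the summary output above
ll    <- 17650.02  # log-likelihood (rounded in the printout)
n_obs <- 5882      # number of observations

# Free parameters: N * (N - 1) off-diagonal Gamma entries + mu, sigma, df per state
N <- 3
k <- N * (N - 1) + 3 * N

aic <- -2 * ll + 2 * k
bic <- -2 * ll + k * log(n_obs)
c(k = k, AIC = aic, BIC = bic)
```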

Having estimated the model, we can visualize the state-dependent
distributions and the decoded time series:

```r
events <- fHMM_events(
  list(
    dates = c("2001-09-11", "2008-09-15", "2020-01-27"),
    labels = c("9/11 terrorist attack", "Bankruptcy Lehman Brothers",
               "First COVID-19 case Germany")
  )
)
plot(model, plot_type = c("sdds", "ts"), events = events)
```

The (pseudo-)residuals help to evaluate the model fit:

```r
plot(model, plot_type = "pr")
```
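
Pseudo-residuals map each observation through the cumulative distribution function (CDF) of its state-dependent distribution and then through the standard normal quantile function, so that they are approximately standard normal if the model fits. The base-R sketch below assumes that the CDF of the decoded state is used and reuses the location-scale t parameterization; the decoded states and parameters are hypothetical, and this is not the package's code.

```r
set.seed(1)
states <- rep(c(1, 2), each = 500)  # hypothetical decoded states
mu     <- c(0.001, -0.002)          # made-up state-dependent parameters
sigma  <- c(0.006, 0.026)
df     <- c(5, 10)

# Observations simulated from the assumed model, so the fit is correct
x <- mu[states] + sigma[states] * rt(length(states), df = df[states])

# Pseudo-residuals: z_t = qnorm(F_{s_t}(x_t)), F the location-scale t CDF
u <- pt((x - mu[states]) / sigma[states], df = df[states])
pseudo_res <- qnorm(u)

# Under a well-specified model these are approximately standard normal
c(mean = mean(pseudo_res), sd = sd(pseudo_res))
```

Clear departures from normality, for example in a normal Q-Q plot, point to a misspecified state-dependent distribution.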

Simulating HMM data

The {fHMM} package supports data simulation from an HMM and gives access
to the model likelihood function (for model fitting) and to the Viterbi
algorithm (for state decoding).

  1. As an example, we consider a 2-state HMM with state-dependent Gamma
     distributions and a time horizon of 1000 data points.

```r
controls <- set_controls(
  states  = 2,
  sdds    = "gamma",
  horizon = 1000
)
```

  2. Define the model parameters via the fHMM_parameters() function
     (unspecified parameters would be set at random).

```r
par <- fHMM_parameters(
  controls = controls,
  Gamma = matrix(c(0.95, 0.05, 0.05, 0.95), 2, 2),
  mu = c(1, 3),
  sigma = c(1, 3)
)
```
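
Base R parameterizes the Gamma distribution by shape and scale. Assuming that mu and sigma in fHMM_parameters() are the state-wise mean and standard deviation (an assumption of this sketch), moment matching gives shape = mu^2 / sigma^2 and scale = sigma^2 / mu:

```r
# State-wise mean and standard deviation, as passed to fHMM_parameters() above
mu    <- c(1, 3)
sigma <- c(1, 3)

# Moment matching: mean = shape * scale, variance = shape * scale^2
shape <- mu^2 / sigma^2
scale <- sigma^2 / mu
rbind(shape = shape, scale = scale)

# Monte Carlo check of the moments in state 2
set.seed(42)
draws <- rgamma(1e5, shape = shape[2], scale = scale[2])
c(mean = mean(draws), sd = sd(draws))
```

With mu equal to sigma in both states, the shape is 1, so both state-dependent distributions are in fact exponential with means 1 and 3.
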

  3. Simulate data points from this model via the simulate_hmm()
     function.

```r
sim <- simulate_hmm(
  controls = controls,
  true_parameters = par
)
plot(sim$data, col = sim$markov_chain, type = "b")
```
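
Conceptually, simulating from an HMM means drawing a state sequence from the Markov chain and then drawing each observation from the distribution of the current state. A base-R sketch of this idea, not the package's implementation; starting the chain in its stationary distribution and the mean/standard-deviation parameterization of the Gamma distribution are assumptions:

```r
set.seed(1)
horizon <- 1000
Gamma <- matrix(c(0.95, 0.05, 0.05, 0.95), 2, 2)
mu    <- c(1, 3)
sigma <- c(1, 3)
N <- nrow(Gamma)

# Stationary distribution: solves delta %*% Gamma = delta with sum(delta) = 1
delta <- solve(t(diag(N) - Gamma + 1), rep(1, N))

# Simulate the hidden Markov chain
states <- integer(horizon)
states[1] <- sample(N, 1, prob = delta)
for (k in 2:horizon) {
  states[k] <- sample(N, 1, prob = Gamma[states[k - 1], ])
}

# State-dependent Gamma observations with mean mu and sd sigma (assumed)
obs <- rgamma(horizon,
              shape = mu[states]^2 / sigma[states]^2,
              scale = sigma[states]^2 / mu[states])
table(states)
```

The result can be plotted like sim$data above, e.g. with plot(obs, col = states, type = "b").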

  4. The log-likelihood function ll_hmm() is evaluated at the identified,
     unconstrained parameter values, which can be derived via the
     par2parUncon() function:

```r
(parUncon <- par2parUncon(par, controls))
#> gammasUncon_21 gammasUncon_12      muUncon_1      muUncon_2   sigmaUncon_1
#>      -2.944439      -2.944439       0.000000       1.098612       0.000000
#>   sigmaUncon_2
#>       1.098612
#> attr(,"class")
#> [1] "parUncon" "numeric"
```

Note that this transformation takes care of two restrictions: Gamma must
be a transition probability matrix (which we can ensure via the logit
link), and mu and sigma must be positive (a requirement of the Gamma
distribution, which we can ensure via the exponential link).
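
The unconstrained values printed above can be reproduced by hand. In this 2-state example, each off-diagonal transition probability maps to log(gamma_ij / gamma_ii), which equals its logit because gamma_ii = 1 - gamma_ij, and mu and sigma map to their natural logarithms. This mapping is inferred from the printed output, not taken from the package source:

```r
Gamma <- matrix(c(0.95, 0.05, 0.05, 0.95), 2, 2)
mu    <- c(1, 3)
sigma <- c(1, 3)

# Off-diagonal transition probabilities relative to the diagonal entry
# of the same row (the logit in the 2-state case)
gammas_uncon <- c(
  gammasUncon_21 = log(Gamma[2, 1] / Gamma[2, 2]),
  gammasUncon_12 = log(Gamma[1, 2] / Gamma[1, 1])
)

# Positive parameters on the log scale (inverse of the exponential link)
c(gammas_uncon, muUncon = log(mu), sigmaUncon = log(sigma))
```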

```r
ll_hmm(parUncon, sim$data, controls)
#> [1] -1620.515
ll_hmm(parUncon, sim$data, controls, negative = TRUE)
#> [1] 1620.515
```
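
The HMM log-likelihood is usually computed with the forward algorithm. A minimal base-R sketch with scaling against numerical underflow follows. The function forward_loglik() is hypothetical; it assumes Gamma state-dependent distributions parameterized by mean and standard deviation and an initial distribution equal to the stationary distribution, so it is not guaranteed to reproduce the value of ll_hmm() exactly:

```r
# Log-likelihood of a Gamma HMM via the scaled forward algorithm (sketch)
forward_loglik <- function(x, Gamma, mu, sigma) {
  N <- nrow(Gamma)
  # Assumed initial distribution: stationary distribution of Gamma
  delta <- solve(t(diag(N) - Gamma + 1), rep(1, N))
  # allprobs[t, i] = f_i(x_t), Gamma density with mean mu[i] and sd sigma[i]
  allprobs <- matrix(NA_real_, length(x), N)
  for (i in seq_len(N)) {
    allprobs[, i] <- dgamma(x, shape = mu[i]^2 / sigma[i]^2,
                            scale = sigma[i]^2 / mu[i])
  }
  phi <- delta * allprobs[1, ]
  ll  <- log(sum(phi))
  phi <- phi / sum(phi)
  for (t in seq_along(x)[-1]) {
    phi <- as.vector(phi %*% Gamma) * allprobs[t, ]
    ll  <- ll + log(sum(phi))  # accumulate the log of the scaling factor
    phi <- phi / sum(phi)      # rescale to avoid underflow
  }
  ll
}

Gamma <- matrix(c(0.95, 0.05, 0.05, 0.95), 2, 2)
set.seed(1)
x <- rgamma(200, shape = 1, scale = 2)  # placeholder data
forward_loglik(x, Gamma, mu = c(1, 3), sigma = c(1, 3))
```

The scaling keeps phi normalized at every step and accumulates the logarithms of the normalizing constants, which sum to the log-likelihood.
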

  5. For maximum likelihood estimation of the model parameters, we can
     numerically optimize ll_hmm() over parUncon (or rather minimize
     the negative log-likelihood).

```r
optimization <- nlm(
  f = ll_hmm, p = parUncon, observations = sim$data, controls = controls,
  negative = TRUE
)
(estimate <- optimization$estimate)
#> [1] -3.46338992 -3.44065582  0.05999848  1.06452907  0.11517811  1.07946252
```
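
The same strategy, minimizing the negative log-likelihood over the unconstrained parameters with nlm(), can be sketched end-to-end in base R. The helper nll_gamma_hmm() is hypothetical, the links mirror the transformations described above, and the data are a crude stand-in (one regime switch halfway through) rather than output of simulate_hmm():

```r
# Hypothetical helper: negative log-likelihood of a 2-state Gamma HMM.
# theta = (logit gamma_21, logit gamma_12, log mu_1, log mu_2,
#          log sigma_1, log sigma_2), i.e. the unconstrained scale
nll_gamma_hmm <- function(theta, x) {
  g21 <- plogis(theta[1])
  g12 <- plogis(theta[2])
  Gamma <- matrix(c(1 - g12, g21, g12, 1 - g21), nrow = 2)  # rows sum to 1
  mu    <- exp(theta[3:4])
  sigma <- exp(theta[5:6])
  delta <- c(g21, g12) / (g12 + g21)  # stationary distribution (2 states)
  # Gamma densities with mean mu and standard deviation sigma (assumed)
  allprobs <- cbind(
    dgamma(x, shape = mu[1]^2 / sigma[1]^2, scale = sigma[1]^2 / mu[1]),
    dgamma(x, shape = mu[2]^2 / sigma[2]^2, scale = sigma[2]^2 / mu[2])
  )
  # Scaled forward algorithm
  phi <- delta * allprobs[1, ]
  ll  <- log(sum(phi))
  phi <- phi / sum(phi)
  for (t in seq_along(x)[-1]) {
    phi <- as.vector(phi %*% Gamma) * allprobs[t, ]
    ll  <- ll + log(sum(phi))
    phi <- phi / sum(phi)
  }
  -ll  # nlm() minimizes, so return the negative log-likelihood
}

# Crude stand-in data: state 1 (mean 1, sd 1), then state 2 (mean 3, sd 3)
set.seed(1)
x <- c(rgamma(500, shape = 1, scale = 1), rgamma(500, shape = 1, scale = 3))

fit <- nlm(nll_gamma_hmm, p = c(-2, -2, 0, 1, 0, 1), x = x)
fit$minimum
# Back-transformed state means (state labels are not identified)
exp(fit$estimate[3:4])
```

Working on the unconstrained scale lets nlm() search freely over the real line while every evaluated parameter set is still a valid model.
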

  6. To interpret the estimate, we need to transform it back to the
     constrained parameter space via the parUncon2par() function. Note
     that the state labeling is not identified, so the estimated states
     may appear in a different order than the true ones.

```r
class(estimate) <- "parUncon"
estimate <- parUncon2par(estimate, controls)
par$Gamma
#>         state_1 state_2
#> state_1    0.95    0.05
#> state_2    0.05    0.95
estimate$Gamma
#>            state_1    state_2
#> state_1 0.96895125 0.03104875
#> state_2 0.03037204 0.96962796
par$mu
#> muCon_1 muCon_2
#>       1       3
estimate$mu
#>  muCon_1  muCon_2
#> 1.061835 2.899473
par$sigma
#> sigmaCon_1 sigmaCon_2
#>          1          3
estimate$sigma
#> sigmaCon_1 sigmaCon_2
#>   1.122073   2.943097
```
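
The back-transformation can also be checked by hand against the printed estimate. In the 2-state case, the inverse logit plogis() recovers the off-diagonal transition probabilities, and exp() recovers mu and sigma; the ordering follows the names printed for parUncon above. This mapping is inferred from the output rather than from the package source:

```r
# Unconstrained estimate as printed above
# (order: gamma_21, gamma_12, mu_1, mu_2, sigma_1, sigma_2)
est <- c(-3.46338992, -3.44065582, 0.05999848, 1.06452907, 0.11517811, 1.07946252)

# Inverse logit for the off-diagonal transition probabilities (2 states)
gamma_21 <- plogis(est[1])
gamma_12 <- plogis(est[2])
Gamma_hat <- matrix(c(1 - gamma_12, gamma_21, gamma_12, 1 - gamma_21), 2, 2)

# Exponential link for the positive parameters
mu_hat    <- exp(est[3:4])
sigma_hat <- exp(est[5:6])
list(Gamma = Gamma_hat, mu = mu_hat, sigma = sigma_hat)
```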