项目作者: yixuan

项目描述 :
Gradient-based Fantope projection and selection algorithm for sparse PCA
高级语言: C
项目地址: git://github.com/yixuan/gradfps.git
创建时间: 2019-01-08T21:52:31Z
项目社区:https://github.com/yixuan/gradfps

开源协议:

下载


GradFPS

The gradfps R package implements the gradient-based Fantope projection and selection
algorithm, a convex formulation of sparse principal component analysis. The algorithm
is based on the paper
Gradient-based Sparse Principal Component Analysis with Extensions to Online Learning
by Yixuan Qiu, Jing Lei, and Kathryn Roeder.

Installation

Currently gradfps has not been submitted to CRAN, but it can be installed just like
any other R packages hosted on GitHub. For devtools users, the following command
should work on most platforms:

  1. library(devtools)
  2. install_github("yixuan/gradfps")

Note that a C++ compiler that supports the C++11 standard is needed.
For best performance, it is strongly suggested linking your R to the
OpenBLAS library for matrix computation.
You can achieve this with the help of the
ropenblas package.

Example

Here we compare our implementation with the original ADMM-based algorithm
(via the package fps developed by
Vincent Vu).

Note that for a fair comparison, I added a new function fps_benchmark() to the
original fps package, and the example below requires the installation of my
modified version:

  1. devtools::install_github("yixuan/fps")

And then run the example code:

  1. library(gradfps)
  2. library(fps) # The modified version
  3. library(RSpectra)
  4. library(mvtnorm)
  5. library(Matrix)
  6. library(ggplot2)
  7. n = 200
  8. p = 800
  9. d1 = 20 # first group
  10. d2 = 15 # second group
  11. # Generate covariance matrices
  12. # Simulate eigenvectors
  13. set.seed(123)
  14. ev = matrix(rnorm(p^2), p, p)
  15. ev[, 1:2] = 0
  16. ev[1:d1, 1] = runif(d1, 0.9 / sqrt(d1), 1.1 / sqrt(d1))
  17. ev[(d1 + 1):(d1 + d2), 2] = runif(d2, 0.9 / sqrt(d2), 1.1 / sqrt(d2))
  18. ev = qr.Q(qr(ev))
  19. # Simulate eigenvalues
  20. sigmas = c(12, 6, runif(p - 2, 0, 2))
  21. # True covariance
  22. Sigma = ev %*% diag(sigmas) %*% t(ev)
  23. # Visualization of the true covariance matrix
  24. view_matrix(Sigma[1:100, 1:100], legend_title = "True\nCovariance\nMatrix")

True covariance matrix

  1. # Eigenvectors
  2. d = 5
  3. V = eigs_sym(Sigma, d)$vectors
  4. Pi = tcrossprod(V[, 1:2])
  5. # Visualization of the true eigenvectors
  6. view_evec(-V[1:200, ], asp = 0.4, bar_height = 8)

True eigenvectors

  1. # Generate data
  2. set.seed(123)
  3. z = rmvnorm(n, sigma = Sigma)
  4. Smat = crossprod(z) / n
  5. d = 2
  6. lambda = 0.5 * sqrt(log(p) / n)
  7. e = eigs_sym(Smat, d, which = "LA")
  8. # Initial value, should be noisy
  9. x0 = tcrossprod(e$vectors)
  10. # ADMM FPS
  11. res_fps = fps::fps_benchmark(
  12. Smat, d, lambda, x0, Pi, rho = -1, maxiter = 60, tolerance = 1e-3, verbose = 0
  13. )
  14. # Verify the result
  15. view_matrix(res_fps$projection[1:100, 1:100], legend_title = "ADMM-FPS\nProjection\nMatrix")

ADMM-FPS projection matrix

  1. # Gradient FPS
  2. res_grad = gradfps_prox_benchmark(
  3. Smat, Pi, d, lambda, x0, lr = 0.02, maxiter = 60,
  4. control = list(fan_maxinc = 10, verbose = 0)
  5. )
  6. # Verify the result
  7. view_matrix(res_grad$projection[1:100, 1:100], legend_title = "GradFPS\nProjection\nMatrix")

GradFPS projection matrix

  1. # Compare computational efficiency
  2. gdat = data.frame(
  3. method = c(rep("ADMM-FPS", length(res_fps$times)), rep("GradFPS", length(res_grad$times))),
  4. time = c(cumsum(res_fps$times), cumsum(res_grad$times)),
  5. err = c(res_fps$errors, res_grad$err_v),
  6. par = sprintf("n = %d, p = %d", n, p)
  7. )
  8. ggplot(gdat, aes(x = time, y = err)) +
  9. geom_line(aes(group = method, color = method, linetype = method), size = 1) +
  10. scale_linetype_manual("Method", values = c("32", "solid")) +
  11. scale_color_hue("Method") +
  12. guides(color = guide_legend(keyheight = 2, keywidth = 3),
  13. linetype = guide_legend(keyheight = 2, keywidth = 3)) +
  14. xlab("Elapsed Time (s)") + ylab("Estimation Error") +
  15. ggtitle(sprintf("n = %d, p = %d", n, p)) +
  16. theme_bw(base_size = 20) +
  17. theme(plot.title = element_text(hjust = 0.5))

Time comparison

Citation

Please consider to cite our work if you have used our algorithm or
package in your research. The code to reproduce the experiments in
the article can be found in the inst/article directory.

  1. @article{qiu2022gradient,
  2. title={Gradient-based Sparse Principal Component Analysis with Extensions to Online Learning},
  3. author={Qiu, Yixuan and Lei, Jing and Roeder, Kathryn},
  4. journal={Biometrika},
  5. year={2022},
  6. month={07},
  7. issn = {1464-3510},
  8. doi = {10.1093/biomet/asac041}
  9. }