Lightweight robust covariance estimation in Julia
Status | Coverage | Docs |
---|---|---|
|
Lightweight robust covariance estimation in Julia i.e. if you have a data matrix X
of size n×p
corresponding to n
observations with p
features, this package will help you to obtain an estimator of the covariance matrix of size p×p
associated with this data.
Note: if you are interested in covariance estimation in the context of a linear regression, consider for now the package CovarianceMatrices.jl which focuses around that case.
using CovarianceEstimation
X = randn(5, 7)
S_uncorrected = cov(SimpleCovariance(), X)
S_corrected = cov(SimpleCovariance(corrected=true), X)
# using linear shrinkage with different targets
LSE = LinearShrinkage
# - Ledoit-Wolf target + shrinkage
method = LSE(ConstantCorrelation())
S_ledoitwolf = cov(method, X)
# - Chen target + shrinkage (using the more verbose call)
method = LSE(target=DiagonalCommonVariance(), shrinkage=:rblw)
S_chen_rblw = cov(method, X)
method = LSE(target=DiagonalCommonVariance(), shrinkage=:oas)
S_chen_oas = cov(method, X)
# a pre-defined shrinkage can be used as well
method = LinearShrinkage(DiagonalUnitVariance(), 0.5)
# using a given shrinkage
S_05 = cov(method, X)
In this section, X
is the data matrix of size n × p
, S
is the sample covariance matrix with S = κ (Xc' * Xc)
where κ
is either n
(uncorrected) or n-1
(corrected) and Xc
is the centred data matrix (see docs).
SimpleCovariance
: basic corrected or uncorrected sample covariance (implemented in StatsBase.jl
).Time complexity: O(p²n)
with a low constant
These methods build an estimator of the covariance derived from S
. They are implemented using abstract covariance estimation interface from StatsBase.jl
.
LinearShrinkage
: James-Stein type estimator of the form (1-λ)S+λF
where F
is a target and λ∈[0,1]
a shrinkage intensity.DiagonalCommonVariance
target, a Rao-Blackwellised Ledoit-Wolf shrinkage (:rblw
) and Oracle-Approximating shrinkage (:oas
) are also supported (see [3]).S
is symmetric semi-positive definite so that if the F
is symmetric positive definite and provided λ
is non-zero, the estimator obtained after shrinkage is also symmetric positive definite. For the diagonal targets DiagonalUnitVariance
, DiagonalCommonVariance
and DiagonalUnequalVariance
the target is necessarily SPD.AnalyticalNonlinearShrinkage
: estimator of the form MΛM'
where M
and Λ
are matrices derived from the eigen decomposition of S
.[4]WoodburyEstimator
: estimator of the form σ²I + U * Λ * U'
, where I
is the identity matrix, U
is low rank semi-orthogonal, and Λ
is diagonal. This form is well-suited to very high-dimensional problems.[6]BiweightMidcovariance
: robust estimator, described more in the documentation.[5]Time complexity:
O(p²n)
with a low constant (main cost is forming S
)p<n
: O(p²n + n²)
with a moderate constant (main cost is forming S
and manipulating a matrix of n²
elements)p>n
: O(p³)
with a low constant (main cost is computing the eigen decomposition of S
).O(p²n)
These are estimators that may be implemented in the future, see also this review paper.
For HAC (and other estimators of covariance of coefficient of regression models) you can currently use the CovarianceMatrices.jl package.
Rough benchmarks are run over random matrices of various sizes (40x20, 20x40, 400x200, 200x400
).
These benchmarks should (as usual) be taken with a pinch of salt but essentially a significant speedup should be expected for a standard problem.
DiagonalCommonVariance
target with oas
and lw
shrinkage)5x
DiagonalUnequalVariance
target with ss
shrinkage)22x
ConstantCorrelation
target with lw
shrinkage, we used Octave for the comparison)12x
AnalyticalNonlinearShrinkage
)25x
BiweightMidcovariance
)3x