Title: Fast Truncated Singular Value Decomposition and Principal Components Analysis for Large Dense and Sparse Matrices
Description: Fast and memory-efficient methods for truncated singular value decomposition and principal components analysis of large sparse and dense matrices.
Authors: Jim Baglama [aut, cph], Lothar Reichel [aut, cph], B. W. Lewis [aut, cre, cph]
Maintainer: B. W. Lewis <[email protected]>
License: GPL-3
Version: 2.3.5.1
Built: 2024-11-11 03:43:42 UTC
Source: https://github.com/bwlewis/irlba
The augmented implicitly restarted Lanczos bidiagonalization algorithm (IRLBA) finds a few approximate largest (or, optionally, smallest) singular values and corresponding singular vectors of a sparse or dense matrix using a method of Baglama and Reichel. It is a fast and memory-efficient way to compute a partial SVD.
irlba(A, nv = 5, nu = nv, maxit = 1000, work = nv + 7, reorth = TRUE,
  tol = 1e-05, v = NULL, right_only = FALSE, verbose = FALSE,
  scale = NULL, center = NULL, shift = NULL, mult = NULL,
  fastpath = TRUE, svtol = tol, smallest = FALSE, ...)
A: numeric real- or complex-valued matrix, or real-valued sparse matrix.
nv: number of right singular vectors to estimate.
nu: number of left singular vectors to estimate (defaults to nv).
maxit: maximum number of iterations.
work: working subspace dimension; larger values can speed convergence at the cost of more memory use.
reorth: if TRUE, apply full reorthogonalization to both SVD bases; otherwise, only reorthogonalize the right basis (faster).
tol: convergence is determined when ||A^T U - V S|| < tol * ||A||, where the spectral norm ||A|| is approximated by the largest estimated singular value.
v: optional starting vector, or the output of a previous run of irlba used to restart the method.
right_only: logical value indicating whether to return only the right singular vectors (TRUE) or both sets of singular vectors (FALSE, the default).
verbose: logical value that, when TRUE, prints status messages during the computation.
scale: optional column scaling vector whose values divide each column of A (see notes).
center: optional column centering vector whose values are subtracted from each column of A (see notes).
shift: optional shift value (square matrices only; see notes).
mult: DEPRECATED optional custom matrix multiplication function (the default is %*%; see notes).
fastpath: try a fast C algorithm implementation if possible; set fastpath=FALSE to use the reference R implementation instead.
svtol: additional stopping tolerance on the maximum allowed absolute relative change in each estimated singular value between iterations. The default sets svtol equal to tol.
smallest: set smallest=TRUE to estimate the smallest singular values and associated singular vectors instead of the largest.
...: optional additional arguments used to support experimental and deprecated features.
Returns a list with entries:

d: max(nu, nv) approximate singular values.
u: nu approximate left singular vectors (only when right_only=FALSE).
v: nv approximate right singular vectors.
iter: the number of Lanczos iterations carried out.
mprod: the total number of matrix vector products carried out.
The syntax of irlba partially follows svd, with an important exception. The usual R svd function always returns a complete set of singular values, even if the number of singular vectors nu or nv is set less than the maximum. The irlba function returns a number of estimated singular values equal to the maximum of the number of specified singular vectors nu and nv.
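For example, on a small random matrix (a quick illustration):

set.seed(1)
A <- matrix(rnorm(200), nrow=20)
length(svd(A, nu=2, nv=2)$d)    # 10: svd always returns all singular values
length(irlba(A, nu=2, nv=3)$d)  #  3: irlba returns max(nu, nv) estimated values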
Use the optional scale parameter to implicitly scale each column of the matrix A by the values in the scale vector, computing the truncated SVD of the column-scaled sweep(A, 2, scale, FUN=`/`), or equivalently, A %*% diag(1 / scale), without explicitly forming the scaled matrix. scale must be a non-zero vector of length equal to the number of columns of A.
Use the optional center parameter to implicitly subtract the values in the center vector from each column of A, computing the truncated SVD of sweep(A, 2, center, FUN=`-`), without explicitly forming the centered matrix. center must be a vector of length equal to the number of columns of A. This option may be used to efficiently compute principal components without explicitly forming the centered matrix, which, importantly, preserves sparsity in the matrix. See the examples.
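For instance, here is a minimal sketch of that idea applied to a sparse matrix, assuming the Matrix package is available:

library(Matrix)
set.seed(1)
# A sparse 100 x 10 matrix; explicit centering would destroy its sparsity.
S <- rsparsematrix(100, 10, density=0.05)
# Implicit centering: S stays sparse throughout the computation.
p <- irlba(S, nv=2, center=colMeans(S))
p$d
# Compare with the explicit, dense computation:
svd(sweep(as.matrix(S), 2, colMeans(S)))$d[1:2]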
The optional shift scalar-valued argument applies only to square matrices; use it to estimate the partial SVD of A + diag(shift, nrow(A), nrow(A)) (without explicitly forming the shifted matrix).
(Deprecated) Specify an optional alternative matrix multiplication operator in the mult parameter. mult must be a function of two arguments, and must handle both cases where one argument is a vector and the other a matrix. This option is deprecated and will be removed in a future version. The new preferred method simply uses R itself to define a custom matrix class with your user-defined matrix multiplication operator. See the examples.
Use the v option to supply a starting vector for the iterative method. A random vector is used by default (precede with set.seed() for reproducibility). Optionally set v to the output of a previous run of irlba to restart the method, adding additional singular values/vectors without recomputing the solution subspace. See the examples.
The function may generate the following warnings:

"did not converge--results might be invalid!; try increasing work or maxit" means that the algorithm did not converge. This is potentially a serious problem and the returned results may not be valid. irlba reports a warning here instead of an error so that you can inspect whatever is returned. If this happens, carefully heed the warning and inspect the result. You may also try setting fastpath=FALSE.

"You're computing a large percentage of total singular values, standard svd might work better!" means just that: irlba is designed to efficiently compute a few of the largest singular values and associated singular vectors of a matrix. The standard svd function will be more efficient than irlba for computing large numbers of singular values.

"convergence criterion below machine epsilon" means that the product of tol and the largest estimated singular value is so small that the normal convergence criterion is only met up to round-off error.
The function might return an error for several reasons, including a situation when the starting vector v is near the null space of the matrix. In that case, try a different v.
The fastpath=TRUE option only supports real-valued matrices and sparse matrices of type dgCMatrix (for now). Other problems fall back to the reference R implementation.
Baglama, James, and Lothar Reichel. "Augmented implicitly restarted Lanczos bidiagonalization methods." SIAM Journal on Scientific Computing 27.1 (2005): 19-42.
See also: svd, prcomp, partial_eigen, svdr.
set.seed(1)
A <- matrix(runif(400), nrow=20)
S <- irlba(A, 3)
S$d
# Compare with svd
svd(A)$d[1:3]

# Restart the algorithm to compute more singular values
# (starting with an existing solution S)
S1 <- irlba(A, 5, v=S)

# Estimate smallest singular values
irlba(A, 3, smallest=TRUE)$d
# Compare with
tail(svd(A)$d, 3)

# Principal components (see also prcomp_irlba)
P <- irlba(A, nv=1, center=colMeans(A))
# Compare with prcomp and prcomp_irlba (might vary up to sign)
cbind(P$v, prcomp(A)$rotation[, 1], prcomp_irlba(A)$rotation[, 1])

# A custom matrix multiplication function that scales the columns of A
# (cf the scale option). This function scales the columns of A to unit norm.
col_scale <- sqrt(apply(A, 2, crossprod))
setClass("scaled_matrix", contains="matrix", slots=c(scale="numeric"))
setMethod("%*%", signature(x="scaled_matrix", y="numeric"),
  function(x, y) x@.Data %*% (y / x@scale))
setMethod("%*%", signature(x="numeric", y="scaled_matrix"),
  function(x, y) (x %*% y@.Data) / y@scale)
a <- new("scaled_matrix", A, scale=col_scale)
irlba(a, 3)$d
# Compare with:
svd(sweep(A, 2, col_scale, FUN=`/`))$d[1:3]
Use partial_eigen to estimate a subset of the largest (most positive) eigenvalues and corresponding eigenvectors of a symmetric dense or sparse real-valued matrix.
partial_eigen(x, n = 5, symmetric = TRUE, ...)
x: numeric real-valued dense or sparse matrix.
n: number of largest eigenvalues and corresponding eigenvectors to compute.
symmetric: TRUE if x is a symmetric matrix (the default); see below.
...: optional additional parameters passed to the irlba function.
Returns a list with entries:

values: n approximate largest eigenvalues.
vectors: n approximate corresponding eigenvectors.
Specify symmetric=FALSE to compute the largest n eigenvalues and corresponding eigenvectors of the symmetric matrix cross-product t(x) %*% x.
This function uses the irlba function under the hood. See ?irlba for a description of additional options, especially the tol parameter.
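As a quick illustration of that connection (a sketch, not the package internals): for a symmetric positive semidefinite matrix, the largest eigenvalues coincide with the largest singular values, so irlba recovers them directly:

set.seed(1)
V <- qr.Q(qr(matrix(rnorm(100), nrow=10)))
x <- V %*% diag(10:1) %*% t(V)  # symmetric with all eigenvalues positive
partial_eigen(x, 3)$values
irlba(x, 3)$d                   # same values, up to convergence tolerance
eigen(x)$values[1:3]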
See the RSpectra package https://cran.r-project.org/package=RSpectra for more comprehensive partial eigenvalue decomposition.
Baglama, James, and Lothar Reichel. "Augmented implicitly restarted Lanczos bidiagonalization methods." SIAM Journal on Scientific Computing 27.1 (2005): 19-42.
set.seed(1)
# Construct a symmetric matrix with some positive and negative eigenvalues:
V <- qr.Q(qr(matrix(runif(100), nrow=10)))
x <- V %*% diag(c(10, -9, 8, -7, 6, -5, 4, -3, 2, -1)) %*% t(V)
partial_eigen(x, 3)$values
# Compare with eigen
eigen(x)$values[1:3]

# Use symmetric=FALSE to compute the eigenvalues of t(x) %*% x for general
# matrices x:
x <- matrix(rnorm(100), 10)
partial_eigen(x, 3, symmetric=FALSE)$values
eigen(crossprod(x))$values
Efficient computation of a truncated principal components analysis of a given data matrix using an implicitly restarted Lanczos method from the irlba package.
prcomp_irlba(x, n = 3, retx = TRUE, center = TRUE, scale. = FALSE, ...)
x: a numeric or complex matrix (or data frame) which provides the data for the principal components analysis.
n: integer number of principal component vectors to return; must be less than min(dim(x)).
retx: a logical value indicating whether the rotated variables should be returned.
center: a logical value indicating whether the variables should be shifted to be zero centered. Alternatively, a centering vector of length equal to the number of columns of x can be supplied.
scale.: a logical value indicating whether the variables should be scaled to have unit variance before the analysis takes place. The default is FALSE. Alternatively, a vector of length equal to the number of columns of x can be supplied; the value of scale. determines how column scaling is performed (after centering).
...: additional arguments passed to irlba.
A list with class "prcomp" containing the following components:

sdev: the standard deviations of the principal components (i.e., the square roots of the eigenvalues of the covariance/correlation matrix, though the calculation is actually done with the singular values of the data matrix).
rotation: the matrix of variable loadings (i.e., a matrix whose columns contain the eigenvectors).
x: if retx is TRUE, the value of the rotated data (the centred (and scaled if requested) data multiplied by the rotation matrix) is returned. Hence, cov(x) is the diagonal matrix diag(sdev^2).
center, scale: the centering and scaling used, or FALSE.
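A quick numerical check of that relationship (a sketch using a small random matrix):

set.seed(1)
x <- matrix(rnorm(200), nrow=20)
p <- prcomp_irlba(x, n=3)
# The covariance of the rotated data is (approximately) diag(sdev^2):
round(cov(p$x) - diag(p$sdev^2), 4)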
The signs of the columns of the rotation matrix are arbitrary, and so may differ between different programs for PCA, and even between different builds of R.
NOTE DIFFERENCES WITH THE DEFAULT prcomp FUNCTION! The tol truncation argument found in prcomp is not supported. In place of the truncation tolerance in the original function, the prcomp_irlba function has the argument n explicitly giving the number of principal components to return. A warning is generated if the argument tol is used, because it is interpreted differently by the two functions.
set.seed(1)
x <- matrix(rnorm(200), nrow=20)
p1 <- prcomp_irlba(x, n=3)
summary(p1)

# Compare with
p2 <- prcomp(x, tol=0.7)
summary(p2)
Estimate an l1-penalized singular value or principal components decomposition (SVD or PCA) that introduces sparsity in the right singular vectors, based on the fast and memory-efficient sPCA-rSVD algorithm of Haipeng Shen and Jianhua Huang.
ssvd(x, k = 1, n = 2, maxit = 500, tol = 0.001, center = FALSE,
  scale. = FALSE, alpha = 0, tsvd = NULL, ...)
x: a numeric real- or complex-valued matrix or real-valued sparse matrix.
k: matrix rank of the computed decomposition (see the Details section below).
n: number of nonzero components in the right singular vectors. If k > 1, a single value of n specifies the number of nonzero components in each regularized right singular vector; or, specify a vector of length k indicating the number of nonzero components in each corresponding column of v (see the examples).
maxit: maximum number of soft-thresholding iterations.
tol: convergence is determined when the Frobenius norm of the change in the estimated left regularized singular vectors between successive iterations falls below this value.
center: a logical value indicating whether the variables should be shifted to be zero centered. Alternatively, a centering vector of length equal to the number of columns of x can be supplied.
scale.: a logical value indicating whether the variables should be scaled to have unit variance before the analysis takes place. Alternatively, a vector of length equal to the number of columns of x can be supplied; the value of scale. determines how column scaling is performed (after centering).
alpha: optional scalar regularization parameter between zero and one (see Details below).
tsvd: optional initial rank-k truncated SVD or PCA (skips that computation if supplied).
...: additional arguments passed to irlba.
The ssvd function implements a version of an algorithm by Shen and Huang that computes a penalized SVD or PCA that introduces sparsity in the right singular vectors by solving a penalized least squares problem. In the rank-1 case, the algorithm finds vectors u, w that minimize ||x - u w'||_F^2 + lambda ||w||_1 subject to ||u|| = 1, and then sets v = w / ||w|| and d = u' x v; see the referenced paper for details. The penalty lambda is implicitly determined from the specified desired number of nonzero values n.
Higher-rank output is determined similarly, but using a sequence of lambda values determined to maintain the desired number of nonzero elements in each column of v specified by n. Unlike standard SVD or PCA, the columns of the returned v when k > 1 may not be orthogonal.
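For intuition, here is a minimal sketch of the rank-1 alternating soft-thresholding iteration described above. It is not the package's implementation, and the helper name soft_rank1 is hypothetical; the threshold is chosen at each step so that n entries of w remain nonzero:

# Minimal sketch of the rank-1 iteration (assumes 1 <= n < ncol(x)):
soft_rank1 <- function(x, n, maxit=500, tol=1e-3) {
  u <- svd(x, nu=1, nv=0)$u[, 1]  # initialize from the leading singular vector
  for (i in seq_len(maxit)) {
    z <- drop(crossprod(x, u))    # z = t(x) %*% u
    lambda <- sort(abs(z), decreasing=TRUE)[n + 1]  # keeps the n largest entries
    w <- sign(z) * pmax(abs(z) - lambda, 0)         # soft-thresholding
    u_new <- drop(x %*% w)
    u_new <- u_new / sqrt(sum(u_new^2))
    if (sqrt(sum((u_new - u)^2)) < tol) { u <- u_new; break }
    u <- u_new
  }
  v <- w / sqrt(sum(w^2))
  list(u=u, v=v, d=drop(crossprod(u, x %*% v)))
}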
A list containing the following components:

u: regularized left singular vectors with orthonormal columns.
d: regularized upper-triangular projection matrix so that x %*% v == u %*% d.
v: regularized, sparse right singular vectors with columns of unit norm.
center, scale: the centering and scaling used, if any.
lambda: the per-column regularization parameter found to obtain the desired sparsity.
iter: number of soft-thresholding iterations.
n: value of input parameter n.
alpha: value of input parameter alpha.
Our ssvd implementation of the Shen-Huang method makes the following choices:

- The l1 penalty is the only available penalty function. Other penalties may appear in the future.
- Given a desired number of nonzero elements in v, value(s) for the lambda penalty are determined to achieve the sparsity goal subject to the parameter alpha.
- An experimental block implementation is used for results with rank greater than 1 (when k > 1) instead of the deflation method described in the reference.
The choice of a penalty lambda associated with a given number of desired nonzero components is not unique. The alpha parameter, a scalar between zero and one, selects among the possible values of lambda that produce the desired number of nonzero entries. The default alpha = 0 selects a penalized solution with the largest corresponding value of d in the rank-1 case. Think of alpha as fine-tuning of the penalty.
Our method returns an upper-triangular matrix d when k > 1 so that x %*% v == u %*% d. Nonzero elements above the diagonal result from non-orthogonality of the v matrix, providing a simple interpretation of cumulative information, or explained variance in the PCA case, via the singular value decomposition of d %*% t(v).
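A small numerical check of that identity on a random matrix:

set.seed(1)
x <- matrix(rnorm(50 * 20), 50)
s <- ssvd(x, k=2, n=c(5, 10))
max(abs(x %*% s$v - s$u %*% s$d))  # near zero
# Cumulative explained information via the SVD of d %*% t(v):
svd(s$d %*% t(s$v))$d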
What if you have no idea what values to use for the argument n (the desired sparsity)? The reference describes a cross-validation approach and an ad hoc approach; neither is in the package yet. Both are prohibitively computationally expensive for matrices with a huge number of columns. A future version of this package will include a revised approach to automatically selecting a reasonable sparsity constraint.
Compare with the similar but more general functions SPC and PMD in the PMA package by Daniela M. Witten, Robert Tibshirani, Sam Gross, and Balasubramanian Narasimhan. The PMD function can compute low-rank regularized matrix decompositions with sparsity penalties on both the u and v vectors. The ssvd function is similar to the PMD(*, L1) method invocation of PMD or, alternatively, the SPC function. Although less general than PMD (*), the ssvd function can be faster and more memory efficient for the basic sparse PCA problem.
See https://bwlewis.github.io/irlba/ssvd.html for more information.
(* Note that the s4vd package by Martin Sill and Sebastian Kaiser, https://cran.r-project.org/package=s4vd, includes a fast optimized version of a closely related algorithm by Shen, Huang, and Marron, that penalizes both u and v.)
Shen, Haipeng, and Jianhua Z. Huang. "Sparse principal component analysis via regularized low rank matrix approximation." Journal of Multivariate Analysis 99.6 (2008): 1015-1034.

Witten, Daniela M., Robert Tibshirani, and Trevor Hastie. "A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis." Biostatistics 10.3 (2009): 515-534.
set.seed(1)
u <- matrix(rnorm(200), ncol=1)
v <- matrix(c(runif(50, min=0.1), rep(0, 250)), ncol=1)
u <- u / drop(sqrt(crossprod(u)))
v <- v / drop(sqrt(crossprod(v)))
x <- u %*% t(v) + 0.001 * matrix(rnorm(200 * 300), ncol=300)
s <- ssvd(x, n=50)
table(actual=v[, 1] != 0, estimated=s$v[, 1] != 0)
oldpar <- par(mfrow=c(2, 1))
plot(u, cex=2, main="u (black circles), Estimated u (blue discs)")
points(s$u, pch=19, col=4)
plot(v, cex=2, main="v (black circles), Estimated v (blue discs)")
points(s$v, pch=19, col=4)

# Let's consider a trivial rank-2 example (k=2) with noise. Like the
# last example, we know the exact number of nonzero elements in each
# solution vector of the noise-free matrix. Note the application of
# different sparsity constraints on each column of the estimated v.
# Also, the decomposition is unique only up to sign, which we adjust
# for below.
set.seed(1)
u <- qr.Q(qr(matrix(rnorm(400), ncol=2)))
v <- matrix(0, ncol=2, nrow=300)
v[sample(300, 15), 1] <- runif(15, min=0.1)
v[sample(300, 50), 2] <- runif(50, min=0.1)
v <- qr.Q(qr(v))
x <- u %*% (c(2, 1) * t(v)) + .001 * matrix(rnorm(200 * 300), 200)
s <- ssvd(x, k=2, n=colSums(v != 0))

# Compare actual and estimated vectors (adjusting for sign):
s$u <- sign(u) * abs(s$u)
s$v <- sign(v) * abs(s$v)
table(actual=v[, 1] != 0, estimated=s$v[, 1] != 0)
table(actual=v[, 2] != 0, estimated=s$v[, 2] != 0)
plot(v[, 1], cex=2, main="True v1 (black circles), Estimated v1 (blue discs)")
points(s$v[, 1], pch=19, col=4)
plot(v[, 2], cex=2, main="True v2 (black circles), Estimated v2 (blue discs)")
points(s$v[, 2], pch=19, col=4)
par(oldpar)
Summary method for truncated PCA objects computed by prcomp_irlba.
## S3 method for class 'irlba_prcomp'
summary(object, ...)
object: an object returned by prcomp_irlba.
...: optional additional arguments.
The randomized method for truncated SVD by P. G. Martinsson and colleagues finds a few approximate largest singular values and corresponding singular vectors of a sparse or dense matrix. It is a fast and memory-efficient way to compute a partial SVD, similar in performance for many problems to irlba. The svdr method is a block method and may produce more accurate estimates with less work for problems with clustered large singular values (see the examples). In other problems, irlba may exhibit faster convergence.
svdr(x, k, tol = 1e-05, it = 100L, extra = min(10L, dim(x) - k),
  center = NULL, Q = NULL, return.Q = FALSE)
x: numeric real- or complex-valued matrix or real-valued sparse matrix.
k: dimension of subspace to estimate (number of approximate singular values to compute).
tol: stop iteration when the largest absolute relative change in estimated singular values from one iteration to the next falls below this value.
it: maximum number of algorithm iterations.
extra: number of extra vectors of dimension ncol(x); larger values generally improve accuracy (at greater computational cost).
center: optional column centering vector whose values are implicitly subtracted from each column of x without explicitly forming the centered matrix (which can preserve sparsity).
Q: optional initial random matrix; defaults to a matrix of size ncol(x) by k + extra with entries sampled from a normal random distribution.
return.Q: if TRUE, return the computed subspace matrix Q.
Also see an alternative implementation (rsvd) of this method by N. Benjamin Erichson in the rsvd package, https://cran.r-project.org/package=rsvd.
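As background, the core randomized range-finder idea underlying this method can be sketched in a few lines (a simplified illustration, without the power iterations and convergence checks that svdr performs):

set.seed(1)
A <- matrix(rnorm(200 * 50), 200)
k <- 3
extra <- 10
# Sample the range of A with a random test matrix, then orthonormalize:
Q <- qr.Q(qr(A %*% matrix(rnorm(50 * (k + extra)), 50)))
# Project to a small matrix and take its (cheap) exact SVD:
B <- crossprod(Q, A)              # (k + extra) x 50
s <- svd(B, nu=k, nv=k)
s$d[1:k]                          # approximate top singular values of A
u <- Q %*% s$u                    # approximate left singular vectors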
Returns a list with entries:

d: k approximate singular values.
u: k approximate left singular vectors.
v: k approximate right singular vectors.
mprod: the total number of matrix products carried out.
Q: optional subspace matrix (when return.Q=TRUE).
Halko, N., P. G. Martinsson, and J. Tropp. "Finding structure with randomness: Stochastic algorithms for constructing approximate matrix decompositions." (Sep. 2009).
See also: irlba, svd, and rsvd in the rsvd package.
set.seed(1)
A <- matrix(runif(400), nrow=20)
svdr(A, 3)$d

# Compare with svd
svd(A)$d[1:3]

# Compare with irlba
irlba(A, 3)$d

## Not run:
# A problem with clustered large singular values where svdr out-performs irlba.
tprolate <- function(n, w=0.25) {
  a <- rep(0, n)
  a[1] <- 2 * w
  a[2:n] <- sin(2 * pi * w * (1:(n-1))) / (pi * (1:(n-1)))
  toeplitz(a)
}
x <- tprolate(512)
set.seed(1)
tL <- system.time(L <- irlba(x, 20))
tR <- system.time(R <- svdr(x, 20))
S <- svd(x)
plot(S$d)
data.frame(time=c(tL[3], tR[3]),
           error=sqrt(c(crossprod(L$d - S$d[1:20]), crossprod(R$d - S$d[1:20]))),
           row.names=c("IRLBA", "Randomized SVD"))

# But, here is a similar problem with clustered singular values where svdr
# doesn't out-perform irlba as easily...clusters of singular values are,
# in general, very hard to deal with!
# (This example based on https://github.com/bwlewis/irlba/issues/16.)
set.seed(1)
s <- svd(matrix(rnorm(200 * 200), 200))
x <- s$u %*% (c(exp(-(1:100)^0.3) * 1e-12 + 1, rep(0.5, 100)) * t(s$v))
tL <- system.time(L <- irlba(x, 5))
tR <- system.time(R <- svdr(x, 5))
S <- svd(x)
plot(S$d)
data.frame(time=c(tL[3], tR[3]),
           error=sqrt(c(crossprod(L$d - S$d[1:5]), crossprod(R$d - S$d[1:5]))),
           row.names=c("IRLBA", "Randomized SVD"))
## End(Not run)