Type: | Package |
Title: | Parallel Low-Rank Approximation with Nonnegativity Constraints |
Version: | 2.0.13 |
Date: | 2025-07-14 |
Description: | 'Rcpp' bindings for 'PLANC', a highly parallel and extensible NMF/NTF (Non-negative Matrix/Tensor Factorization) library. Wraps algorithms described in Kannan et. al (2018) <doi:10.1109/TKDE.2017.2767592> and Eswar et. al (2021) <doi:10.1145/3432185>. Implements algorithms described in Welch et al. (2019) <doi:10.1016/j.cell.2019.05.006>, Gao et al. (2021) <doi:10.1038/s41587-021-00867-x>, and Kriebel & Welch (2022) <doi:10.1038/s41467-022-28431-4>. |
Encoding: | UTF-8 |
Imports: | methods, Rcpp, Matrix, hdf5r.Extra |
Suggests: | knitr, withr, rmarkdown, testthat (≥ 3.0.0) |
LinkingTo: | Rcpp, RcppArmadillo, RcppProgress, HighFive |
License: | GPL-2 | GPL-3 [expanded from: GPL (≥ 2)] |
URL: | https://github.com/welch-lab/RcppPlanc/ |
BugReports: | https://github.com/welch-lab/RcppPlanc/issues/ |
RoxygenNote: | 7.3.2 |
SystemRequirements: | C++17, cmake >= 3.21.0, hdf5, git, patch, gnumake, hwloc, GNU make |
Config/testthat/edition: | 3 |
Depends: | R (≥ 3.5) |
LazyData: | true |
VignetteBuilder: | knitr |
NeedsCompilation: | yes |
Packaged: | 2025-07-14 19:15:08 UTC; andrew |
Author: | Andrew Robbins |
Maintainer: | Andrew Robbins <robbiand@umich.edu> |
Repository: | CRAN |
Date/Publication: | 2025-07-14 22:40:06 UTC |
Argument list object for using a dense matrix stored in HDF5 file
Description
For running inmf
, onlineINMF
or
uinmf
with dense matrix stored in HDF5 file, users will need
to construct an argument list for the filename of the HDF5 file as well as
the path in the file storing the matrix. H5Mat
is provided as an
instructed constructor. Meanwhile, since the INMF functions require that
all datasets should be of the same type, as.H5Mat
is provided for
writing in-memory data into a new HDF5 file on disk and returning the
constructed argument list.
Usage
H5Mat(filename, dataPath)
as.H5Mat(x, filename, dataPath = "data", overwrite = FALSE, ...)
## S3 method for class 'matrix'
as.H5Mat(x, filename, dataPath = "data", overwrite, ...)
## S3 method for class 'dgCMatrix'
as.H5Mat(x, filename, dataPath = "data", overwrite = FALSE, ...)
## Default S3 method:
as.H5Mat(x, filename, dataPath = "data", ...)
Arguments
filename |
Filename of the HDF5 file |
dataPath |
Path in the HDF5 file that points to a 2D dense matrix.
Default |
x |
For |
overwrite |
Logical, whether to overwrite the file if already exists at
the given path. Default |
... |
Passed down to hdf5r.Extra::h5Write |
Value
H5Mat object, indeed a list object.
Examples
if (require("withr")) {
H5MatEx <- function(){
withr::local_dir(withr::local_tempdir())
h <- H5Mat(system.file("extdata/ctrl_dense.h5", package = "RcppPlanc"),
"data")
print(h)
library(Matrix)
ctrl.dense <- as.matrix(ctrl.sparse)
h1 <- as.H5Mat(ctrl.dense, "ctrl_from_dense_to_dense.h5",
dataPath = "data")
h1
h2 <- as.H5Mat(ctrl.sparse, "ctrl_from_sparse_to_dense.h5",
dataPath = "data")
}
H5MatEx()
}
Argument list object for using a sparse matrix stored in HDF5 file
Description
For running inmf
, onlineINMF
or
uinmf
with sparse matrix stored in HDF5 file, users will need
to construct an argument list for the filename of the HDF5 file as well as
the paths in the file storing the arrays that construct the CSC (compressed
sparse column) matrix. H5SpMat
is provided as an instructed
constructor. Meanwhile, since the INMF functions require that all datasets
should be of the same type, as.H5SpMat
is provided for writing
in-memory data into a new HDF5 file on disk and returning the
constructed argument list.
Usage
H5SpMat(filename, valuePath, rowindPath, colptrPath, nrow, ncol)
as.H5SpMat(x, filename, dataPath, overwrite = FALSE)
## S3 method for class 'matrix'
as.H5SpMat(x, filename, dataPath = "", overwrite = FALSE)
## S3 method for class 'dgCMatrix'
as.H5SpMat(x, filename, dataPath = "", overwrite = FALSE)
## Default S3 method:
as.H5SpMat(x, filename, dataPath = "", overwrite = FALSE, ...)
Arguments
filename |
Filename of the HDF5 file |
valuePath |
Path in the HDF5 file that points to a 1D array storing the
non-zero values of the sparse matrix. Default |
rowindPath |
Path in the HDF5 file that points to a 1D integer array
storing the row indices of non-zero values in each column of the sparse
matrix. Default |
colptrPath |
Path in the HDF5 file that points to a 1D integer array
storing the number of non-zero values in each column of the sparse matrix.
Default |
nrow , ncol |
Integer, the true dimensionality of the sparse matrix. |
x |
For |
dataPath |
For |
overwrite |
Logical, whether to overwrite the file if already exists at
the given path. Default |
... |
not used |
Value
H5SpMat object, indeed a list object.
Examples
if (require("withr")) {
H5SpMatEx <- function() {
withr::local_dir(withr::local_tempdir())
h <- H5SpMat(system.file("extdata/ctrl_sparse.h5", package = "RcppPlanc"),
"scaleDataSparse/data", "scaleDataSparse/indices", "scaleDataSparse/indptr", 173, 300)
dim(h)
library(Matrix)
ctrl.dense <- as.matrix(ctrl.sparse)
h1 <- as.H5SpMat(ctrl.sparse, "ctrl_from_sparse_to_sparse.h5", "matrix")
h1
h2 <- as.H5SpMat(ctrl.dense, "ctrl_from_dense_to_sparse.h5", "matrix")
h2
}
H5SpMatEx()
}
Block Principal Pivoted Non-Negative Least Squares
Description
Use the BPP algorithm to get the nonnegative least squares solution. Regular
NNLS problem is described as optimizing \min_{x\ge0}||CX - B||_F^2
where C
and B
are given and X
is to be solved.
bppnnls
takes C
and B
as input. bppnnls_prod
takes
C^\mathsf{T}C
and C^\mathsf{T}B
as
input to directly go for the intermediate step of BPP algorithm. This can be
useful when the dimensionality of C
and B
is large while
pre-calculating C^\mathsf{T}C
and C^\mathsf{T}B
is cheap.
Usage
bppnnls(C, B, nCores = 2L)
bppnnls_prod(CtC, CtB, nCores = 2L)
Arguments
C |
Input dense |
B |
Input |
nCores |
The number of parallel tasks that will be spawned.
Default |
CtC |
The |
CtB |
The |
Value
The calculated solution matrix in dense form.
Examples
set.seed(1)
C <- matrix(rnorm(250), nrow = 25)
B <- matrix(rnorm(375), nrow = 25)
res1 <- bppnnls(C, B)
dim(res1)
res2 <- bppnnls_prod(t(C) %*% C, t(C) %*% B)
all.equal(res1, res2)
Example single-cell transcriptomic data in sparse form
Description
The two datasets, namingly ctrl.sparse
and stim.sparse
, are
single-cell transcriptomic data preprocessed and subsampled from the study
of Hyun Min Kang and et al., Nat Biotech., 2018. The raw datasets were two
sparse matrices of integer values indicating the counts of genes (rows) per
cell (columns). We normalized each column of both matrices by its library
size (sum), and selected common variable genes across the datasets. Finally,
we scaled the genes without centering them, in order to keep the
non-negativity. The processed datasets were then randomly subsampled for a
minimal example.
Usage
ctrl.sparse
stim.sparse
Format
An object of class dgCMatrix
with 173 rows and 300 columns.
An object of class dgCMatrix
with 173 rows and 300 columns.
Source
https://www.nature.com/articles/nbt.4042
Retrieve the dimension of H5SpMat argument list
Description
Retrieve the dimension of H5SpMat argument list
Usage
## S3 method for class 'H5SpMat'
dim(x)
## S3 replacement method for class 'H5SpMat'
dim(x) <- value
Arguments
x |
H5SpMat argument list object |
value |
Numeric vector of two, for number of rows and number of columns. |
Value
Retriever returns a vector of two (nrow and ncol), setter sets the value of that in the argument list.
Examples
h <- H5SpMat(system.file("extdata/ctrl_sparse.h5", package = "RcppPlanc"),
"data", "indices", "indptr", 173, 300)
dim(h)
nrow(h)
ncol(h)
dim(h) <- c(200, 200)
h
Prepare character information of a H5Mat object
Description
Prepare character information of a H5Mat object
Usage
## S3 method for class 'H5Mat'
format(x, ...)
Arguments
x |
H5Mat argument list object |
... |
Not used. |
Value
A character scalar of the displayed message
Examples
h <- H5Mat(system.file("extdata/ctrl_dense.h5", package = "RcppPlanc"),
"data")
format(h)
prepare character information of a H5SpMat object
Description
prepare character information of a H5SpMat object
Usage
## S3 method for class 'H5SpMat'
format(x, ...)
Arguments
x |
H5SpMat argument list object |
... |
Not used. |
Value
A character scalar of the displayed message
Examples
h <- H5SpMat(system.file("extdata/ctrl_sparse.h5", package = "RcppPlanc"),
"data", "indices", "indptr", 173, 300)
format(h)
Perform Integrative Non-negative Matrix Factorization
Description
Performs integrative non-negative matrix factorization (iNMF) (J.D. Welch,
2019) to return factorized H
, W
, and V
matrices. The
objective function is stated as
\arg\min_{H\ge0,W\ge0,V\ge0}\sum_{i}^{d}||E_i-(W+V_i)Hi||^2_F+
\lambda\sum_{i}^{d}||V_iH_i||_F^2
where E_i
is the input non-negative matrix of the i
'th dataset,
d
is the total number of datasets. E_i
is of size
m \times n_i
for m
features and n_i
sample points,
H_i
is of size k \times n_i
, V_i
is of size
m \times k
, and W
is of size m \times k
.
inmf
optimizes the objective with ANLS strategy, while
onlineINMF
optimizes the same objective with an online learning
strategy.
Usage
inmf(
objectList,
k = 20,
lambda = 5,
niter = 30,
nCores = 2,
Hinit = NULL,
Vinit = NULL,
Winit = NULL,
verbose = FALSE
)
Arguments
objectList |
list of input datasets. List elements should all be of the same class. Viable classes include: matrix, dgCMatrix, H5Mat, H5SpMat. |
k |
Integer. Inner dimensionality to factorize the datasets into.
Default |
lambda |
Regularization parameter. Larger values penalize
dataset-specific effects more strongly (i.e. alignment should increase as
|
niter |
Integer. Total number of block coordinate descent iterations to
perform. Default |
nCores |
The number of parallel tasks that will be spawned.
Default |
Hinit |
Initial values to use for |
Vinit |
Similar to |
Winit |
Initial values to use for |
verbose |
Logical scalar. Whether to show information and progress.
Default |
Value
A list of the following elements:
H
- a list of resultH_i
matrices of sizen_i \times k
V
- a list of resultV_i
matricesW
- the resultW
matrixobjErr
- the final objective error value.
Author(s)
Yichen Wang
References
Joshua D. Welch and et al., Single-Cell Multi-omic Integration Compares and Contrasts Features of Brain Cell Identity, Cell, 2019
Examples
library(Matrix)
set.seed(1)
result <- inmf(list(ctrl.sparse, stim.sparse), k = 10, niter = 10, verbose = FALSE)
Perform Non-negative Matrix Factorization
Description
Regularly, Non-negative Matrix Factorization (NMF) is factorizes input
matrix X
into low rank matrices W
and H
, so that
X \approx WH
. The objective function can be stated as
\arg\min_{W\ge0,H\ge0}||X-WH||_F^2
. In practice, X
is usually
regarded as a matrix of m
features by n
sample points. And the
result matrix W
should have the dimensionality of m \times k
and
H with n \times k
(transposed).
This function wraps the algorithms implemented in PLANC library to solve
NMF problems. Algorithms includes Alternating Non-negative Least Squares
with Block Principal Pivoting (ANLS-BPP), Alternating Direction Method of
Multipliers (ADMM), Hierarchical Alternating Least Squares (HALS), and
Multiplicative Update (MU).
Usage
nmf(
x,
k,
niter = 30L,
algo = "anlsbpp",
nCores = 2L,
Winit = NULL,
Hinit = NULL
)
Arguments
x |
Input matrix for factorization. Can be either dense or sparse. |
k |
Integer. Factor matrix rank. |
niter |
Integer. Maximum number of NMF interations. |
algo |
Algorithm to perform the factorization, choose from "anlsbpp", "admm", "hals" or "mu". See detailed sections. |
nCores |
The number of parallel tasks that will be spawned. Only applies to anlsbpp.
Default |
Winit |
Initial left-hand factor matrix, must be of size m x k. |
Hinit |
Initial right-hand factor matrix, must be of size n x k. |
Value
A list with the following elements:
W
- the result left-hand factor matrixH
- the result right hand matrix.objErr
- the objective error of the factorization.
References
Ramakrishnan Kannan and et al., A High-Performance Parallel Algorithm for Nonnegative Matrix Factorization, PPoPP '16, 2016, 10.1145/2851141.2851152
Perform Integrative Non-negative Matrix Factorization Using Online Learning
Description
Performs integrative non-negative matrix factorization (iNMF) (J.D. Welch,
2019, C. Gao, 2021) using online learning approach to return factorized
H
, W
, and V
matrices. The objective function is stated as
\arg\min_{H\ge0,W\ge0,V\ge0}\sum_{i}^{d}||E_i-(W+V_i)Hi||^2_F+
\lambda\sum_{i}^{d}||V_iH_i||_F^2
where E_i
is the input non-negative matrix of the i
'th dataset,
d
is the total number of datasets. E_i
is of size
m \times n_i
for m
features and n_i
sample points,
H_i
is of size k \times n_i
, V_i
is of size
m \times k
, and W
is of size m \times k
.
Different from inmf
which optimizes the objective with ANLS
approach, onlineINMF
optimizes the same objective with online learning
strategy, where it updates mini-batches of H_i
solving the NNLS
problem, and updates V_i
and W
with HALS multiplicative method.
This function allows online learning in 3 scenarios:
Fully observed datasets;
Iterative refinement using continually arriving datasets;
Projection of new datasets without updating the existing factorization
Usage
onlineINMF(
objectList,
newDatasets = NULL,
project = FALSE,
k = 20,
lambda = 5,
maxEpoch = 5,
minibatchSize = 5000,
maxHALSIter = 1,
permuteChunkSize = 1000,
nCores = 2,
Hinit = NULL,
Vinit = NULL,
Winit = NULL,
Ainit = NULL,
Binit = NULL,
verbose = FALSE
)
Arguments
objectList |
list of input datasets. List elements should all be of the same class. Viable classes include: matrix, dgCMatrix, H5Mat, H5SpMat. |
newDatasets |
Same requirements as for new arriving datasets. Default
|
project |
Logical scalar, whether to run scenario 3. See description.
Default |
k |
Integer. Inner dimensionality to factorize the datasets into.
Default |
lambda |
Regularization parameter. Larger values penalize
dataset-specific effects more strongly (i.e. alignment should increase as
|
maxEpoch |
The number of epochs to iterate through. Default |
minibatchSize |
Total number of cells in each mini-batch. Default
|
maxHALSIter |
Maximum number of block coordinate descent (HALS
algorithm) iterations to perform for each update of |
permuteChunkSize |
Number of cells in a chunk being shuffled before
subsetting to minibatches. Only appliable to in-memory data and for Scenario
1 and 2. Default |
nCores |
The number of parallel tasks that will be spawned.
Default |
Hinit , Vinit , Winit , Ainit , Binit |
Pass the previous factorization result
for datasets existing in |
verbose |
Logical scalar. Whether to show information and progress.
Default |
Value
A list of the following elements:
H
- a list of resultH_i
matrices of sizen_i \times k
V
- a list of resultV_i
matricesW
- the resultW
matrixA
- a list of resultA_i
matrices,k \times k
B
- a list of resultB_i
matrices,m \times k
objErr
- the final objective error value.
Author(s)
Yichen Wang
References
Joshua D. Welch and et al., Single-Cell Multi-omic Integration Compares and Contrasts Features of Brain Cell Identity, Cell, 2019
Chao Gao and et al., Iterative single-cell multi-omic integration using online learning, Nat Biotechnol., 2021
Examples
library(Matrix)
# Scenario 1 with sparse matrices
set.seed(1)
res1 <- onlineINMF(list(ctrl.sparse, stim.sparse),
minibatchSize = 50, k = 10, verbose = FALSE)
# Scenario 2 with H5 dense matrices
h5dense1 <- H5Mat(filename = system.file("extdata", "ctrl_dense.h5",
package = "RcppPlanc", mustWork = TRUE),
dataPath = "scaleData")
h5dense2 <- H5Mat(filename = system.file("extdata", "stim_dense.h5",
package = "RcppPlanc", mustWork = TRUE),
dataPath = "scaleData")
res2 <- onlineINMF(list(ctrl = h5dense1), minibatchSize = 50, k = 10, verbose = FALSE)
res3 <- onlineINMF(list(ctrl = h5dense1),
newDatasets = list(stim = h5dense2),
Hinit = res2$H, Vinit = res2$V, Winit = res2$W,
Ainit = res2$A, Binit = res2$B,
minibatchSize = 50, k = 10, verbose = FALSE)
# Scenario 3 with H5 sparse matrices
h5sparse1 <- H5SpMat(filename = system.file("extdata", "ctrl_sparse.h5",
package = "RcppPlanc", mustWork = TRUE),
valuePath = "scaleDataSparse/data",
rowindPath = "scaleDataSparse/indices",
colptrPath = "scaleDataSparse/indptr",
nrow = nrow(ctrl.sparse),
ncol = ncol(ctrl.sparse))
h5sparse2 <- H5SpMat(filename = system.file("extdata", "stim_sparse.h5",
package = "RcppPlanc", mustWork = TRUE),
valuePath = "scaleDataSparse/data",
rowindPath = "scaleDataSparse/indices",
colptrPath = "scaleDataSparse/indptr",
nrow = nrow(stim.sparse),
ncol = nrow(stim.sparse))
res4 <- onlineINMF(list(ctrl = h5sparse1), minibatchSize = 50, k = 10, verbose = FALSE)
res5 <- onlineINMF(list(ctrl = h5sparse1),
newDatasets = list(stim = h5sparse2), project = TRUE,
Hinit = res4$H, Vinit = res4$V, Winit = res4$W,
Ainit = res4$A, Binit = res4$B,
minibatchSize = 50, k = 10, verbose = FALSE)
Show information of a H5Mat object
Description
Show information of a H5Mat object
Usage
## S3 method for class 'H5Mat'
print(x, ...)
Arguments
x |
H5Mat argument list object |
... |
Not used. |
Value
NULL. Information displayed.
Examples
h <- H5Mat(system.file("extdata/ctrl_dense.h5", package = "RcppPlanc"),
"data")
print(h)
Show information of a H5SpMat object
Description
Show information of a H5SpMat object
Usage
## S3 method for class 'H5SpMat'
print(x, ...)
Arguments
x |
H5SpMat argument list object |
... |
Not used. |
Value
NULL. Information displayed.
Examples
h <- H5SpMat(system.file("extdata/ctrl_sparse.h5", package = "RcppPlanc"),
"data", "indices", "indptr", 173, 300)
print(h)
Perform Symmetric Non-negative Matrix Factorization
Description
Symmetric input matrix X
of size n \times n
is required. Two
approaches are provided. Alternating Non-negative Least Squares Block
Principal Pivoting algorithm (ANLSBPP) with symmetric regularization, where
the objective function is set to be \arg\min_{H\ge0,W\ge0}||X-WH||_F^2+
\lambda||W-H||_F^2
, can be run with algo = "anlsbpp"
.
Gaussian-Newton algorithm, where the objective function is set to be
\arg\min_{H\ge0}||X-H^\mathsf{T}H||_F^2
, can be run with algo =
"gnsym"
. In the objectives, W
is of size n \times k
and H
is of size k \times n
. The returned results will all be
n \times k
.
Usage
symNMF(
x,
k,
niter = 30L,
lambda = 0,
algo = "gnsym",
nCores = 2L,
Hinit = NULL
)
Arguments
x |
Input matrix for factorization. Must be symmetric. Can be either dense or sparse. |
k |
Integer. Factor matrix rank. |
niter |
Integer. Maximum number of symNMF interations.
Default |
lambda |
Symmetric regularization parameter. Must be
non-negative. Default |
algo |
Algorithm to perform the factorization, choose from "gnsym" or
"anlsbpp". Default |
nCores |
The number of parallel tasks that will be spawned. Only applies to anlsbpp.
Default |
Hinit |
Initial right-hand factor matrix, must be of size n x k.
Default |
Value
A list with the following elements:
W
- the result left-hand factor matrix, non-empty when using"anlsbpp"
H
- the result right hand matrix.objErr
- the objective error of the factorization.
References
Srinivas Eswar and et al., Distributed-Memory Parallel Symmetric Nonnegative Matrix Factorization, SC '20, 2020, 10.5555/3433701.3433799
Perform Mosaic Integrative Non-negative Matrix Factorization with Unshared Features
Description
Performs mosaic integrative non-negative matrix factorization (UINMF) (A.R.
Kriebel, 2022) to return factorized H
, W
, V
and U
matrices. The objective function is stated as
\arg\min_{H\ge0,W\ge0,V\ge0,U\ge0}\sum_{i}^{d}
||\begin{bmatrix}E_i \\ P_i \end{bmatrix} -
(\begin{bmatrix}W \\ 0 \end{bmatrix}+
\begin{bmatrix}V_i \\ U_i \end{bmatrix})Hi||^2_F+
\lambda_i\sum_{i}^{d}||\begin{bmatrix}V_i \\ U_i \end{bmatrix}H_i||_F^2
where E_i
is the input non-negative matrix of the i
'th dataset,
P_i
is the input non-negative matrix for the unshared features,
d
is the total number of datasets. E_i
is of size
m \times n_i
for m
shared features and n_i
sample points,
P_i
is of size u_i \times n_i
for u_i
unshared feaetures,
H_i
is of size k \times n_i
, V_i
is of size
m \times k
, W
is of size m \times k
and U_i
is of
size u_i \times k
.
Similar to inmf
, uinmf
also optimizes the objective with
ANLS algorithm.
Usage
uinmf(
objectList,
unsharedList,
k = 20,
lambda = 5,
niter = 30,
nCores = 2,
verbose = FALSE
)
Arguments
objectList |
list of input datasets. List elements should all be of the same class. Viable classes include: matrix, dgCMatrix, H5Mat, H5SpMat. |
unsharedList |
List of input unshared feature matrices, with the same
requirement as |
k |
Integer. Inner dimensionality to factorize the datasets into.
Default |
lambda |
Regularization parameter. Use one number for all datasets or a
vector to specify for each dataset. Larger values penalize dataset-specific
effects more strongly (i.e. alignment should increase as |
niter |
Integer. Total number of block coordinate descent iterations to
perform. Default |
nCores |
The number of parallel tasks that will be spawned.
Default |
verbose |
Logical scalar. Whether to show information and progress.
Default |
Value
A list of the following elements:
H
- a list of resultH_i
matrices of sizen_i \times k
V
- a list of resultV_i
matricesW
- the resultW
matrixU
- a list of resultA_i
matricesobjErr
- the final objective error value.
Author(s)
Yichen Wang
References
April R. Kriebel and Joshua D. Welch, UINMF performs mosaic integration of single-cell multi-omic datasets using nonnegative matrix factorization, Nat. Comm., 2022
Examples
# Fake matrices representing unshared features of the given datasets
# Real-life use should have features that are not presented in the
# intersection of features of all datasets involved.
ctrl.unshared <- ctrl.sparse[1:10,]
stim.unshared <- stim.sparse[11:30,]
set.seed(1)
result <- uinmf(list(ctrl.sparse, stim.sparse),
list(ctrl.unshared, stim.unshared), verbose = FALSE)