Type: Package
Title: Building PRS Models Based on Summary Statistics of GWAs
Version: 1.2.1
Description: Shrinkage estimator for polygenic risk prediction (PRS) models based on summary statistics of genome-wide association (GWA) studies. Based upon the methods and original 'PANPRS' package as found in: Chen, Chatterjee, Landi, and Shi (2020) <doi:10.1080/01621459.2020.1764849>.
License: GPL-3
Encoding: UTF-8
LazyData: true
RoxygenNote: 7.2.3
Depends: gtools, R (≥ 3.1.0)
LinkingTo: Rcpp (≥ 1.0.14), RcppArmadillo (≥ 14.4.3-1)
Imports: Rcpp (≥ 1.0.14)
NeedsCompilation: yes
Packaged: 2025-07-19 17:08:04 UTC; Jared
Author: Katherine Luo [aut, cre], Osvaldo Espin-Garcia [aut], Ting-Huei Chen [aut]
Maintainer: Katherine Luo <hluo224@uwo.ca>
Repository: CRAN
Date/Publication: 2025-07-22 10:20:22 UTC

A vector of sample sizes for the q traits of the summaryZ.

Description

A vector of q sample sizes, one for each of the q sets of Z statistics held in the q columns of summaryZ.

Usage

data(Nvec)

Format

A vector with q elements, where q is the number of columns of summaryZ.
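
As a rough illustration of how the sample sizes line up with the columns of the Z-statistic matrix, the sketch below uses small made-up stand-ins (z_demo, n_demo) rather than the packaged summaryZ and Nvec. The z / sqrt(n) rescaling is a common approximation for standardized effect sizes; whether the package applies exactly this step internally is an assumption.

```r
# Hypothetical stand-ins for summaryZ (SNPs x traits) and Nvec (one n per trait)
set.seed(1)
z_demo <- matrix(rnorm(15), nrow = 5, ncol = 3,
                 dimnames = list(paste0("snp", 1:5), paste0("trait", 1:3)))
n_demo <- c(10000, 25000, 40000)

# There must be exactly one sample size per column of the Z matrix
stopifnot(length(n_demo) == ncol(z_demo))

# Divide each column by sqrt(n) of its trait (approximate standardized effects;
# assumed here for illustration, not taken from the package source)
beta_demo <- sweep(z_demo, 2, sqrt(n_demo), "/")
```

With the package loaded, the same length check could be run as stopifnot(length(Nvec) == ncol(summaryZ)).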


Inputs for the functional annotations of SNPs.

Description

A 3614 x 3 binary (0/1) matrix covering 3614 SNPs and 3 functional annotations. The entry in row i, column j is 1 if SNP i has the j-th functional annotation and 0 otherwise.

Usage

data(funcIndex)

Format

A matrix with 3614 rows for the 3614 SNPs and 3 columns for functional annotations.
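
To make the 0/1 layout concrete, here is a minimal sketch on a made-up 6 x 3 matrix (func_demo, a hypothetical stand-in for the packaged annotation matrix). It checks that the matrix is binary and summarises how many SNPs carry each annotation.

```r
# Hypothetical 6 x 3 annotation matrix in the same 0/1 layout
func_demo <- matrix(c(1, 0, 0,
                      0, 1, 0,
                      1, 1, 0,
                      0, 0, 0,
                      0, 0, 1,
                      1, 0, 1),
                    ncol = 3, byrow = TRUE,
                    dimnames = list(paste0("snp", 1:6), paste0("annot", 1:3)))

# Every entry must be 0 or 1
stopifnot(all(func_demo %in% c(0, 1)))

# Number of SNPs carrying each annotation
snps_per_annotation <- colSums(func_demo)

# SNPs that carry no annotation at all
unannotated <- rownames(func_demo)[rowSums(func_demo) == 0]
```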


Run the gsPEN algorithm for multiple traits, without functional annotations.

Description

Run the gsPEN algorithm for multiple traits, without functional annotations.

Usage

gsPEN_R(
  summary_z,
  n_vec,
  plinkLD,
  n_iter = 100,
  upper_val = NULL,
  breaking = 1,
  z_scale = 1,
  tuning_matrix = NULL,
  tau_factor = c(1/25, 1, 10),
  len_lim_lambda = 10,
  sub_tuning = 50,
  lim_lambda = c(0.5, 0.9),
  len_lambda = 200,
  df_max = NULL,
  sparse_beta = FALSE,
  debug_output = FALSE,
  verbose = FALSE
)

Arguments

summary_z

A matrix of summary statistics for each SNP and trait.

n_vec

A vector of sample sizes for each of the Q traits corresponding to the Q columns of summary_z.

plinkLD

A matrix of LD values for each pair of SNPs.

n_iter

The number of iterations to run the algorithm.

upper_val

The upper bound for the tuning parameter.

breaking

The number of iterations to run before checking for convergence.

z_scale

The scaling factor for the summary statistics.

tuning_matrix

A matrix of tuning parameters.

tau_factor

A vector of factors to multiply the median value by to get the tuning parameters.

len_lim_lambda

The number of tuning parameters to use for the first iteration.

sub_tuning

The number of tuning parameters to use for the second iteration.

lim_lambda

The range of tuning parameters to use for the first iteration.

len_lambda

The number of tuning parameters to use for the second iteration.

df_max

The maximum degrees of freedom for the model.

sparse_beta

Whether to use the sparse version of the algorithm.

debug_output

Whether to output the tuning combinations that did not converge.

verbose

Whether to output information through the evaluation of the algorithm.
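
The tuning-related arguments can be hard to picture from their descriptions alone. The following is only a sketch of one plausible way such a grid could be assembled; it is not taken from the package source. Using the median absolute statistic as "the median value", reading lim_lambda as a pair of quantile levels, and all *_demo names are assumptions made for illustration.

```r
set.seed(2)
z_demo <- matrix(rnorm(300), ncol = 3)   # hypothetical Z statistics

tau_factor <- c(1 / 25, 1, 10)           # defaults shown in Usage
lim_lambda <- c(0.5, 0.9)
len_lambda <- 200

# Assumed: tau candidates are multiples of a median absolute statistic
median_val <- median(abs(z_demo))
tau_demo <- sort(median_val * tau_factor)

# Assumed: lim_lambda gives quantile levels that bound the lambda grid
lambda_range <- quantile(abs(z_demo), probs = lim_lambda, names = FALSE)
lambda_demo <- seq(lambda_range[1], lambda_range[2], length.out = len_lambda)

# Every (lambda, tau) pair is one candidate tuning combination
grid_demo <- expand.grid(lambda = lambda_demo, tau = tau_demo)
nrow(grid_demo)  # len_lambda * length(tau_factor) combinations
```

A grid supplied directly through tuning_matrix would bypass this kind of construction; its expected column layout is not specified here.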

Value

A named list containing the following elements:

beta_matrix

A matrix of the estimated coefficients for each SNP and trait.

num_iter_vec

A vector of the number of iterations for each tuning combination.

all_tuning_matrix

A matrix of the tuning parameters used for each tuning combination.
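
A small helper can confirm that a result carries the three documented elements before any downstream use. The sketch below runs on a made-up list (out_demo); the shapes and values in it are hypothetical, and check_fit is an illustrative name, not a package function.

```r
# Hypothetical stand-in with the three documented elements; shapes are made up
out_demo <- list(
  beta_matrix = matrix(c(0, 0.1, 0, 0, 0.2, 0.3), nrow = 2, byrow = TRUE),
  num_iter_vec = c(12, 40),
  all_tuning_matrix = matrix(c(0.5, 1, 0.9, 1), nrow = 2, byrow = TRUE)
)

# Stop with an informative error if a documented element is absent
check_fit <- function(out) {
  expected <- c("beta_matrix", "num_iter_vec", "all_tuning_matrix")
  missing_elems <- setdiff(expected, names(out))
  if (length(missing_elems) > 0) {
    stop("missing elements: ", paste(missing_elems, collapse = ", "))
  }
  invisible(TRUE)
}

check_fit(out_demo)

# Report the size of each element without assuming any orientation
lapply(out_demo, function(x) if (is.null(dim(x))) length(x) else dim(x))
```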

Examples

# Load the library and data
library(PANPRSnext)
data("summaryZ")
data("Nvec")
data("plinkLD")

# Take a random subset of 100 SNPs (seed fixed so the subset is reproducible)
set.seed(1)
snp_idx <- sample(nrow(summaryZ), 100)
subset_summary_z <- summaryZ[snp_idx, ]

# Run gsPEN
output <- gsPEN_R(
  summary_z = subset_summary_z,
  n_vec = Nvec,
  plinkLD = plinkLD
)

Main CPP function

Description

Main CPP function

Usage

gsPEN_cpp(
  summary_betas,
  ld_J,
  index_matrix,
  index_J,
  ld_vec,
  SD_vec,
  tuning_matrix,
  dims,
  params
)

Arguments

summary_betas

matrix of summary statistics

ld_J

vector of indices of SNPs in LD with the current SNP

index_matrix

matrix of indices of SNPs in LD with the current SNP

index_J

vector of indices of SNPs in LD with the current SNP

ld_vec

vector of LD values

SD_vec

matrix of SD values

tuning_matrix

matrix of tuning parameters

dims

vector of dimensions

params

vector of parameters


Main CPP function

Description

Main CPP function

Usage

gsPEN_sparse_cpp(
  summary_betas,
  ld_J,
  index_matrix,
  index_J,
  ld_vec,
  SD_vec,
  tuning_matrix,
  dims,
  params
)

Arguments

summary_betas

matrix of summary statistics

ld_J

vector of indices of SNPs in LD with the current SNP

index_matrix

matrix of indices of SNPs in LD with the current SNP

index_J

vector of indices of SNPs in LD with the current SNP

ld_vec

vector of LD values

SD_vec

matrix of SD values

tuning_matrix

matrix of tuning parameters

dims

vector of dimensions

params

vector of parameters


Run the gsfPEN algorithm for multiple traits, with functional annotations.

Description

Run the gsfPEN algorithm for multiple traits, with functional annotations.

Usage

gsfPEN_R(
  summary_z,
  n_vec,
  plinkLD,
  func_index,
  n_iter = 1000,
  upper_val = NULL,
  breaking = 1,
  z_scale = 1,
  tuning_matrix = NULL,
  p_threshold = NULL,
  p_threshold_params = c(0.5, 10^-4, 4),
  tau_factor = c(1/25, 1, 3),
  sub_tuning = 4,
  lim_lambda = c(0.5, 0.9),
  len_lambda = 4,
  lambda_vec = NULL,
  lambda_vec_limit_len = c(1.5, 3),
  df_max = NULL,
  sparse_beta = FALSE,
  debug_output = FALSE,
  verbose = FALSE
)

Arguments

summary_z

A matrix of summary statistics for each SNP and trait.

n_vec

A vector of sample sizes for each of the Q traits corresponding to the Q columns of summary_z.

plinkLD

A matrix of LD values for each pair of SNPs.

func_index

A matrix of functional annotations, with one row per SNP and one column per annotation. The entry in row i, column j is 1 if SNP i has the j-th functional annotation and 0 otherwise.

n_iter

The number of iterations to run the algorithm.

upper_val

The upper bound for the tuning parameter.

breaking

The number of iterations to run before checking for convergence.

z_scale

The scaling factor for the summary statistics.

tuning_matrix

A matrix of tuning parameters.

p_threshold

A vector of p-values to use for the tuning parameters.

p_threshold_params

A vector of parameters to use for the p-value tuning parameters.

tau_factor

A vector of factors to multiply the median value by to get the tuning parameters.

sub_tuning

The number of tuning parameters to use for the second iteration.

lim_lambda

The range of tuning parameters to use for the first iteration.

len_lambda

The number of tuning parameters to use for the second iteration.

lambda_vec

A vector of tuning parameters to use for the first iteration.

lambda_vec_limit_len

The number of tuning parameters to use for the first iteration.

df_max

The maximum degrees of freedom for the model.

sparse_beta

Whether to use the sparse version of the algorithm.

debug_output

Whether to output the tuning combinations that did not converge.

verbose

Whether to output information through the evaluation of the algorithm.
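
Since the inputs are Z statistics while p_threshold is expressed as p-values, it can help to see how the two scales correspond. The sketch below shows only the standard two-sided conversion under a standard normal null; how gsfPEN_R uses these thresholds internally is not described here, and p_demo is a made-up vector.

```r
# Hypothetical p-value thresholds
p_demo <- c(0.5, 0.05, 1e-4)

# Two-sided p-value threshold -> cutoff on |Z| under a standard normal null
z_cut <- qnorm(p_demo / 2, lower.tail = FALSE)

# Round trip back to p-values recovers the thresholds
p_back <- 2 * pnorm(z_cut, lower.tail = FALSE)
stopifnot(isTRUE(all.equal(p_back, p_demo)))
```

Smaller p-value thresholds map to larger |Z| cutoffs, so a stricter threshold keeps fewer SNPs.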

Value

A named list containing the following elements:

beta_matrix

A matrix of the estimated coefficients for each SNP and trait.

num_iter_vec

A vector of the number of iterations for each tuning combination.

all_tuning_matrix

A matrix of the tuning parameters used for each tuning combination.

Examples

# Load the library and data
library(PANPRSnext)
data("summaryZ")
data("Nvec")
data("plinkLD")
data("funcIndex")

# Take a random subset of 100 SNPs (seed fixed so the subset is reproducible)
set.seed(1)
snp_idx <- sample(nrow(summaryZ), 100)
subset_summary_z <- summaryZ[snp_idx, ]
subset_func_index <- funcIndex[snp_idx, ]

# Run gsfPEN
output <- gsfPEN_R(
  summary_z = subset_summary_z,
  n_vec = Nvec,
  plinkLD = plinkLD,
  func_index = subset_func_index
)

Main CPP function

Description

Main CPP function

Usage

gsfPEN_cpp(
  summary_betas,
  ld_J,
  index_matrix,
  index_J,
  ld_vec,
  SD_vec,
  tuning_matrix,
  lambda0_vec,
  z_matrix,
  lambda_vec_func,
  func_lambda,
  Ifunc_SNP,
  dims,
  params
)

Arguments

summary_betas

matrix of summary statistics

ld_J

vector of indices of SNPs in LD with the current SNP

index_matrix

matrix of indices of SNPs in LD with the current SNP

index_J

vector of indices of SNPs in LD with the current SNP

ld_vec

vector of LD values

SD_vec

matrix of SD values

tuning_matrix

matrix of tuning parameters

lambda0_vec

vector of lambda0 values

z_matrix

matrix of z values

lambda_vec_func

vector of lambda values

func_lambda

matrix of lambda values

Ifunc_SNP

vector of indices of SNPs in LD with the current SNP

dims

vector of dimensions

params

vector of parameters


Main CPP function

Description

Main CPP function

Usage

gsfPEN_sparse_cpp(
  summary_betas,
  ld_J,
  index_matrix,
  index_J,
  ld_vec,
  SD_vec,
  tuning_matrix,
  lambda0_vec,
  z_matrix,
  lambda_vec_func,
  func_lambda,
  Ifunc_SNP,
  dims,
  params
)

Arguments

summary_betas

matrix of summary statistics

ld_J

vector of indices of SNPs in LD with the current SNP

index_matrix

matrix of indices of SNPs in LD with the current SNP

index_J

vector of indices of SNPs in LD with the current SNP

ld_vec

vector of LD values

SD_vec

matrix of SD values

tuning_matrix

matrix of tuning parameters

lambda0_vec

vector of lambda0 values

z_matrix

matrix of z values

lambda_vec_func

vector of lambda values

func_lambda

matrix of lambda values

Ifunc_SNP

vector of indices of SNPs in LD with the current SNP

dims

vector of dimensions

params

vector of parameters


The LD info from output of the software (plink)

Description

The LD information is crucial for the analysis by SummaryLasso. The reference alleles used to obtain the Z statistics or the regression coefficients must be the same as those used for the LD calculation. This file can be obtained directly from the output of the LD calculation by plink; for example, the output file may be named plink.ld. Alternatively, users can calculate the LD with their preferred tools.

Usage

data(plinkLD)

Format

A data frame with 205959 rows and 7 columns
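
For orientation, the sketch below builds a tiny data frame in the seven-column layout that plink's pairwise LD report typically uses (two SNP triplets of chromosome, position, and identifier, followed by the correlation). The column names, including whether the last column is R or R2, are assumptions and should be checked against names(plinkLD); ld_demo and snps_in_analysis are made up.

```r
# Hypothetical rows in the layout of a plink pairwise LD report (assumed columns)
ld_demo <- data.frame(
  CHR_A = c(1, 1, 1),
  BP_A  = c(1000, 1000, 2500),
  SNP_A = c("rs1", "rs1", "rs2"),
  CHR_B = c(1, 1, 1),
  BP_B  = c(2500, 4000, 4000),
  SNP_B = c("rs2", "rs3", "rs3"),
  R     = c(0.82, -0.15, 0.40),
  stringsAsFactors = FALSE
)

# Keep only pairs where both SNPs are part of the analysis set
snps_in_analysis <- c("rs1", "rs2")
keep <- ld_demo$SNP_A %in% snps_in_analysis &
  ld_demo$SNP_B %in% snps_in_analysis
ld_subset <- ld_demo[keep, , drop = FALSE]
```

Filtering like this is optional in the documented examples, which pass the full plinkLD object alongside a subset of summaryZ.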


The Z statistics from the univariate analysis of the association between 3614 SNPs and three traits respectively.

Description

These Z statistics are obtained from simulated datasets.

Usage

data(summaryZ)

Format

A matrix with 3614 rows for the 3614 SNPs and 3 columns for 3 traits.
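
A quick way to get a feel for a SNP-by-trait matrix of Z statistics is to convert it to two-sided p-values and count the strong signals per trait. The sketch below uses a made-up 10 x 3 matrix (z_demo) with one planted signal; the 1e-4 cutoff is an arbitrary illustrative choice.

```r
set.seed(3)
z_demo <- matrix(rnorm(30), nrow = 10, ncol = 3,
                 dimnames = list(paste0("snp", 1:10), paste0("trait", 1:3)))
z_demo["snp1", "trait1"] <- 4.2  # plant one strong signal

# Two-sided p-values under a standard normal null, same shape as z_demo
p_demo <- 2 * pnorm(abs(z_demo), lower.tail = FALSE)

# Number of SNPs per trait below the illustrative cutoff
hits_per_trait <- colSums(p_demo < 1e-4)
```

The same two lines applied to summaryZ would give per-trait counts for the packaged simulated data.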


Run gsPEN on a small sample of the provided data set (Only 100 samples)

Description

Run gsPEN on a small sample of the provided data set (Only 100 samples)

Usage

test_gsPEN(...)

Arguments

...

Additional arguments to pass to gsPEN_R

Value

The output of gsPEN_R
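
The ... argument forwards any extra settings to the underlying fitting function. The sketch below illustrates that forwarding pattern with a hypothetical wrapper (run_on_subset) and a stand-in fitting function (fake_fit), so it runs without the package; neither name exists in the package, and the wrapper is not the actual implementation of test_gsPEN.

```r
# Hypothetical wrapper: subset SNP rows, then forward extra arguments via `...`
run_on_subset <- function(fit_fun, summary_z, n_snps = 100, ...) {
  idx <- sample(nrow(summary_z), min(n_snps, nrow(summary_z)))
  fit_fun(summary_z = summary_z[idx, , drop = FALSE], ...)
}

# Stand-in for a fitting function, used so the sketch runs without the package
fake_fit <- function(summary_z, n_iter = 100, verbose = FALSE) {
  list(n_rows = nrow(summary_z), n_iter = n_iter, verbose = verbose)
}

set.seed(4)
z_demo <- matrix(rnorm(600), ncol = 3)

# n_iter is not an argument of the wrapper, so it travels through `...`
res <- run_on_subset(fake_fit, z_demo, n_snps = 100, n_iter = 50)
```

With the package installed, the analogous call would be test_gsPEN(n_iter = 50), which passes n_iter on to gsPEN_R.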


Run gsfPEN on a small sample of the provided data set (Only 100 samples)

Description

Run gsfPEN on a small sample of the provided data set (Only 100 samples)

Usage

test_gsfPEN(...)

Arguments

...

Additional arguments to pass to gsfPEN_R

Value

The output of gsfPEN_R