Title: Iterative Pruning Population Admixture Inference Framework
Version: 0.1.2
Description: A data clustering package based on the admixture ratios (Q matrix) of population structure. The framework uses an iterative pruning procedure that clusters data by repeatedly splitting a given population into sub-clusters until a stopping criterion is met, in the same manner as the ipPCA, iNJclust, and IPCAPS frameworks. The package also provides a function that constructs a neighbor-joining phylogenetic tree from a similarity matrix between clusters. Given multiple Q matrices with varying numbers of ancestors (K), the framework defines the similarity between clusters i and j as the minimum number K* at which the majority of members of the two clusters fall into different clusters. This K* reflects the minimum number of ancestors needed to separate clusters i and j when individuals are assigned to K* clusters according to their maximum admixture ratio. This package is described in Chainarong Amornbunchornvej, Pongsakorn Wangkumhang, and Sissades Tongsima (2020) <doi:10.1101/2020.03.21.001206>.
Depends: R (≥ 3.5.0)
Imports: stats,treemap,ape
URL: https://github.com/DarkEyes/ipADMIXTURE
BugReports: https://github.com/DarkEyes/ipADMIXTURE/issues
Language: en-US
License: GPL-3
Encoding: UTF-8
LazyData: true
Suggests: knitr, rmarkdown
VignetteBuilder: knitr
RoxygenNote: 7.3.2
NeedsCompilation: no
Packaged: 2025-05-06 08:50:24 UTC; zero
Author: Chainarong Amornbunchornvej [aut, cre]
Maintainer: Chainarong Amornbunchornvej <grandca@gmail.com>
Repository: CRAN
Date/Publication: 2025-05-06 09:50:01 UTC

A list of Q matrices of 20 simulated populations

Description

A dataset containing the admixture ratios of 1200 individuals from 20 simulated populations, where the number of ancestors ranges from 2 to 18. This dataset was obtained by running the LEA R package (Frichot, E., & François, O. (2015). LEA: An R package for landscape and ecological association studies. Methods in Ecology and Evolution, 6(8), 925-929.) on the 20-population simulated dataset published by Limpiti, T., et al. (2014). iNJclust: iterative neighbor-joining tree clustering framework for inferring population structure. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 11(5), 903-914.

Usage

UD1_Qmat

Format

A list of Q matrices of 1200 individuals from 20 populations, with the number of ancestors ranging from 2 to 18.

UD1_Qmat

It is a list of Q matrices containing the admixture ratios of 1200 individuals from the 20-population dataset. UD1_Qmat[[k]][i,j] is the admixture ratio of the jth ancestor for the ith individual in the Q matrix with k+1 ancestors.
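The indexing convention (list element k holds the Q matrix with k+1 ancestors) can be checked on a small stand-in. The sketch below builds a hypothetical toy list, toyQList, from random row-normalized matrices rather than the real UD1_Qmat data; the helper make_toy_Q is illustrative only and not part of the package.

```r
set.seed(1)

# Hypothetical helper: a random n x K matrix whose rows sum to 1,
# mimicking the shape of an admixture Q matrix (not real data).
make_toy_Q <- function(n, K) {
  m <- matrix(runif(n * K), nrow = n, ncol = K)
  m / rowSums(m)  # divide each row by its own sum
}

# Toy stand-in for UD1_Qmat: element k holds a Q matrix with k + 1 ancestors.
toyQList <- lapply(2:5, function(K) make_toy_Q(n = 6, K = K))

k <- 3
ncol(toyQList[[k]])   # k + 1 = 4 ancestors
toyQList[[k]][2, 4]   # admixture ratio of ancestor 4 for individual 2
```

With the real dataset, the same reasoning means that UD1_Qmat[[1]] is the K = 2 matrix and UD1_Qmat[[17]] is the K = 18 matrix.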

...


Labels of 20 simulated populations

Description

Labels of 20 simulated populations

Usage

UD1labels

Format

Labels of 20 populations.

UD1labels

It is a vector of labels of 1200 individuals. There are 20 populations.

...


biclustFunc function

Description

biclustFunc is a binary clustering function that splits a cluster into two sub-clusters using hierarchical clustering.

Usage

biclustFunc(Qmat, admixRatioThs = 0.5, method = "average")

Arguments

Qmat

is a Q matrix containing the admixture ratios of all individuals, where Qmat[i,j] is the admixture ratio of ancestor j for individual i.

admixRatioThs

is a threshold for homogeneity: a cluster whose maxDiffAdmixRatio is lower than admixRatioThs is treated as a homogeneous cluster.

method

is the agglomeration method passed to hclust for hierarchical clustering analysis. The default is "average".

Value

This function returns binary clustering results.

heteroFlag

is a flag indicating whether the given cluster is heterogeneous (i.e., has sub-clusters). It is TRUE if maxDiffAdmixRatio >= admixRatioThs.

clusterInx

is a vector of cluster assignments, where clusterInx[i] is the sub-cluster number of individual i.

meanDiffAdmixRatio

is a vector of magnitude differences of admixture ratios. It is computed by splitting the given cluster into two sub-clusters and taking the absolute difference between the mean admixture ratios of the two sub-clusters.

Qmat1

is the Q matrix of sub-cluster #1, obtained by splitting the given cluster into two sub-clusters. It contains the admixture ratios of the members of sub-cluster #1, where Qmat1[i,j] is the admixture ratio of ancestor j for individual i.

Qmat2

is the Q matrix of sub-cluster #2, obtained by splitting the given cluster into two sub-clusters. It contains the admixture ratios of the members of sub-cluster #2, where Qmat2[i,j] is the admixture ratio of ancestor j for individual i.

maxDiffAdmixRatio

is the maximum magnitude difference of admixture ratios for the given cluster, used to decide whether the cluster should be split into two sub-clusters.
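The return values above can be related to one another with a small re-creation of a single binary split. This is a minimal sketch that assumes Euclidean distances between admixture profiles and a two-cluster cut of the hclust tree. The function name bisplit_sketch is hypothetical, and the code is not the package's implementation of biclustFunc.

```r
# Hypothetical re-implementation for illustration; not the package source.
bisplit_sketch <- function(Qmat, admixRatioThs = 0.5, method = "average") {
  # Assumption: Euclidean distances between individuals' admixture profiles.
  tree <- stats::hclust(stats::dist(Qmat), method = method)
  clusterInx <- stats::cutree(tree, k = 2)

  Qmat1 <- Qmat[clusterInx == 1, , drop = FALSE]
  Qmat2 <- Qmat[clusterInx == 2, , drop = FALSE]

  # Absolute difference between the sub-clusters' mean admixture ratios.
  meanDiffAdmixRatio <- abs(colMeans(Qmat1) - colMeans(Qmat2))
  maxDiffAdmixRatio <- max(meanDiffAdmixRatio)

  list(heteroFlag = maxDiffAdmixRatio >= admixRatioThs,
       clusterInx = clusterInx,
       meanDiffAdmixRatio = meanDiffAdmixRatio,
       Qmat1 = Qmat1, Qmat2 = Qmat2,
       maxDiffAdmixRatio = maxDiffAdmixRatio)
}

# Toy data: two clearly separated groups of three individuals, K = 2.
Q <- rbind(c(0.9, 0.1), c(0.85, 0.15), c(0.95, 0.05),
           c(0.1, 0.9), c(0.2, 0.8), c(0.15, 0.85))
res <- bisplit_sketch(Q, admixRatioThs = 0.15)
res$maxDiffAdmixRatio   # 0.75
res$heteroFlag          # TRUE, so this cluster would be split further
```

The sub-cluster means are (0.9, 0.1) and (0.15, 0.85), so both entries of meanDiffAdmixRatio are 0.75, which exceeds the 0.15 threshold.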

Examples

# Running biclustFunc on Q matrix of 27 human population dataset where K = 12
obj <- biclustFunc(Qmat = ipADMIXTURE::human27pop_Qmat[[11]], admixRatioThs = 0.15)


getPhyloTree

Description

getPhyloTree is a function that returns a phylogenetic tree of clusters based on admixture analysis. The tree is a neighbor-joining tree constructed from a similarity matrix between clusters. Given multiple Q matrices with varying numbers of ancestors (K), the framework defines the similarity between clusters i and j as the minimum K at which the majority of members of the two clusters fall into different ancestor groups. This K reflects the minimum number of ancestors needed to separate clusters i and j when K clusters are assigned according to the maximum admixture ratio of each individual.
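The cluster-level similarity defined above can be sketched in a few lines. This assumes that each individual is assigned to its highest-ratio ancestor, that "majority" means the most frequent such ancestor within a cluster (ties broken by taking the first), and that QmatList[[k]] has k+1 ancestors. The helper names are hypothetical, and this is not the package's implementation of getPhyloTree.

```r
# Hypothetical helper: assign each individual to its highest-ratio ancestor.
ancestor_groups <- function(Qmat) max.col(Qmat, ties.method = "first")

# Most frequent ancestor group among a set of individuals (first one on ties).
majority_group <- function(groups) {
  as.integer(names(which.max(table(groups))))
}

# Smallest K at which the majority ancestor groups of clusters a and b differ.
min_split_K <- function(QmatList, indexClsVec, a, b) {
  for (k in seq_along(QmatList)) {
    g <- ancestor_groups(QmatList[[k]])
    if (majority_group(g[indexClsVec == a]) != majority_group(g[indexClsVec == b])) {
      return(as.integer(k + 1))  # QmatList[[k]] has k + 1 ancestors
    }
  }
  NA_integer_  # never separated within the available range of K
}

# Toy data: at K = 2 everyone is dominated by ancestor 1;
# at K = 3 cluster 2 switches to ancestor 3.
toyList <- list(
  rbind(c(0.7, 0.3), c(0.6, 0.4), c(0.55, 0.45), c(0.6, 0.4)),
  rbind(c(0.6, 0.2, 0.2), c(0.7, 0.1, 0.2), c(0.2, 0.2, 0.6), c(0.1, 0.2, 0.7))
)
cls <- c(1, 1, 2, 2)
min_split_K(toyList, cls, 1, 2)   # 3
```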

Usage

getPhyloTree(QmatList, indexClsVec)

Arguments

QmatList

is a list of Q matrices, where QmatList[[k]] is a Q matrix with k+1 ancestors.

indexClsVec

is a vector of clustering assignment where indexClsVec[i] is a cluster number of individual i.

Value

This function returns a neighbor-joining tree object together with the matrix minDiffAncestorClsMat, which is used as the similarity matrix.

tree

is a neighbor-joining tree (an object of class "phylo") computed by the ape::nj() function from a dissimilarity version of minDiffAncestorClsMat.

minDiffAncestorClsMat

is a minimum-ancestor-number matrix at the cluster level, where minDiffAncestorClsMat[i,j] is the minimum number of ancestors that places clusters i and j in different ancestor groups; with minDiffAncestorClsMat[i,j]-1 ancestors, the majority of members of i and j still belong to the same ancestor group.

minDiffAncestorMat

is a minimum-ancestor-number matrix at the individual level, where minDiffAncestorMat[i,j] is the minimum number of ancestors that places individuals i and j in different ancestor groups.
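An individual-level matrix of this kind can be sketched by scanning the Q matrices in order of increasing K and recording, for each pair of individuals, the first K at which their highest-ratio ancestors differ. This is a minimal sketch under that assumption; the function name min_diff_ancestor_mat is hypothetical, and how the package treats pairs that are never separated (NA here) may differ.

```r
# Hypothetical sketch of an individual-level minimum-ancestor-number matrix.
# Entry [i, j] is the smallest K at which individuals i and j have different
# highest-ratio ancestors; NA if they are never separated in the given range.
min_diff_ancestor_mat <- function(QmatList) {
  n <- nrow(QmatList[[1]])
  out <- matrix(NA_integer_, n, n)
  for (k in seq_along(QmatList)) {
    g <- max.col(QmatList[[k]], ties.method = "first")
    newly <- is.na(out) & outer(g, g, "!=")  # pairs separated for the first time
    out[newly] <- as.integer(k + 1)          # QmatList[[k]] has k + 1 ancestors
  }
  out
}

toyList <- list(
  rbind(c(0.7, 0.3), c(0.6, 0.4), c(0.55, 0.45), c(0.6, 0.4)),
  rbind(c(0.6, 0.2, 0.2), c(0.7, 0.1, 0.2), c(0.2, 0.2, 0.6), c(0.1, 0.2, 0.7))
)
m <- min_diff_ancestor_mat(toyList)
m[1, 3]   # 3: individuals 1 and 3 first differ at K = 3
```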

Examples

# Cluster individuals using the K = 12 Q matrix, then build the tree from all Q matrices (K = 2-12) of the 27 human population dataset.
h27pop_obj <- ipADMIXTURE(Qmat = ipADMIXTURE::human27pop_Qmat[[11]], admixRatioThs = 0.15)
out <- ipADMIXTURE::getPhyloTree(ipADMIXTURE::human27pop_Qmat, h27pop_obj$indexClsVec)
plot(out$tree)


A list of Q matrices of 27 human populations

Description

A dataset containing the admixture ratios of 544 individuals from 27 human populations, where the number of ancestors ranges from 2 to 12. This dataset was obtained by running the ADMIXTURE software (Zhou, H., et al. (2011). A quasi-Newton acceleration for high-dimensional optimization algorithms. Statistics and Computing, 21(2), 261-273.) on the 27-human-population dataset published by Xing, J., Watkins, W. S., et al. (2009). Fine-scaled human genetic structure revealed by SNP microarrays. Genome Research, 19(5), 815-825.

Usage

human27pop_Qmat

Format

A list of Q matrices of 544 individuals from 27 human populations, with the number of ancestors ranging from 2 to 12.

human27pop_Qmat

It is a list of Q matrices containing the admixture ratios of 544 individuals from the 27-population human dataset. human27pop_Qmat[[k]][i,j] is the admixture ratio of the jth ancestor for the ith individual in the Q matrix with k+1 ancestors.

...


Labels of 27 human populations

Description

Labels of 27 human populations

Usage

human27pop_labels

Format

Labels of 27 human populations.

human27pop_labels

It is a vector of labels of 544 individuals. There are 27 populations.

...


Iterative Pruning Population Admixture Inference Framework (ipADMIXTURE)

Description

A data clustering package based on admixture ratios (Q matrix) of population structure.

The framework uses an iterative pruning procedure that clusters data by repeatedly splitting a given population into sub-clusters until a stopping criterion is met, in the same manner as the ipPCA, iNJclust, and IPCAPS frameworks. The package also provides a function that constructs a neighbor-joining phylogenetic tree from a similarity matrix between clusters. Given multiple Q matrices with varying numbers of ancestors (K), the framework defines the similarity between clusters i and j as the minimum K at which the majority of members of the two clusters fall into different clusters. This K reflects the minimum number of ancestors needed to separate clusters i and j when individuals are assigned to K clusters according to their maximum admixture ratio.
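The iterative pruning idea can be sketched as a queue of clusters: each cluster is split in two, and the split is kept only while the cluster's maximum magnitude difference of admixture ratios reaches admixRatioThs. This is a minimal sketch that assumes Euclidean distances, a two-way hclust cut at every step, and breadth-first processing. The names split_stats and ip_cluster_sketch are hypothetical, the cluster numbering need not match the package's output, and this is not the package's implementation of ipADMIXTURE().

```r
# Hypothetical sketch of iterative pruning; not the package's implementation.
split_stats <- function(Qmat, method) {
  cl <- stats::cutree(stats::hclust(stats::dist(Qmat), method = method), k = 2)
  d <- abs(colMeans(Qmat[cl == 1, , drop = FALSE]) -
           colMeans(Qmat[cl == 2, , drop = FALSE]))
  list(cl = cl, maxDiff = max(d))
}

ip_cluster_sketch <- function(Qmat, admixRatioThs, method = "average") {
  indexClsVec <- integer(nrow(Qmat))
  nextId <- 0L
  queue <- list(seq_len(nrow(Qmat)))  # member indices of clusters to examine
  while (length(queue) > 0) {
    members <- queue[[1]]
    queue <- queue[-1]
    # A single individual cannot be split, so it is always a leaf.
    if (length(members) >= 2) {
      s <- split_stats(Qmat[members, , drop = FALSE], method)
      if (s$maxDiff >= admixRatioThs) {  # heterogeneous: split and revisit
        queue <- c(queue, list(members[s$cl == 1], members[s$cl == 2]))
        next
      }
    }
    nextId <- nextId + 1L  # homogeneous: record as a final cluster
    indexClsVec[members] <- nextId
  }
  indexClsVec
}

# Toy data: three pairs of individuals, each pair dominated by one ancestor.
Q <- rbind(c(0.9, 0.05, 0.05), c(0.85, 0.1, 0.05),
           c(0.05, 0.9, 0.05), c(0.1, 0.85, 0.05),
           c(0.05, 0.05, 0.9), c(0.05, 0.1, 0.85))
res <- ip_cluster_sketch(Q, admixRatioThs = 0.15)
```

In this toy example each pair ends up in its own cluster, because splitting a pair only yields a maximum difference of 0.05, which is below the 0.15 threshold.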

Usage

ipADMIXTURE(Qmat, admixRatioThs, method = "average")

Arguments

Qmat

is a Q matrix containing the admixture ratios of all individuals, where Qmat[i,j] is the admixture ratio of ancestor j for individual i.

admixRatioThs

is a threshold for homogeneity: a cluster whose maxDiffAdmixRatio is lower than admixRatioThs is treated as a homogeneous cluster.

method

is the agglomeration method passed to hclust for hierarchical clustering analysis. The default is "average".

Value

This function returns the clustering results as an object of the ipADMIXTURE class. The object contains the following items.

indexClsVec

is a vector of clustering assignment where indexClsVec[i] is a cluster number of individual i.

homoClusters

is a list of cluster objects, each containing the member indices, the cluster's maxDiffAdmixRatio, its ID, etc.

maxDiffAdmixRatioVec

is a vector of maxDiffAdmixRatios for all clusters.

Qmat

is the Q matrix containing the admixture ratios of all individuals, where Qmat[i,j] is the admixture ratio of ancestor j for individual i.

admixRatioThs

is the threshold for homogeneity: a cluster whose maxDiffAdmixRatio is lower than admixRatioThs is treated as a homogeneous cluster.

Author(s)

Chainarong Amornbunchornvej, chai@ieee.org

Examples

# Running ipADMIXTURE on Q matrix of 27 human population dataset where K = 12
h27pop_obj <- ipADMIXTURE(Qmat = ipADMIXTURE::human27pop_Qmat[[11]], admixRatioThs = 0.15)


plotAdmixClusters

Description

plotAdmixClusters is a function that plots admixture ratios, where the x-axis represents individuals, annotated with their cluster labels, and the y-axis represents admixture ratios.
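A plot of this kind can be approximated in base graphics as a stacked bar chart of each individual's admixture ratios, with individuals ordered by cluster. This is a minimal sketch under that assumption; the function name plot_admix_sketch is hypothetical, and plotAdmixClusters() may order, label, or color the bars differently.

```r
# Hypothetical sketch of a cluster-ordered admixture bar plot (base graphics);
# not the package's plotAdmixClusters() implementation.
plot_admix_sketch <- function(Qmat, indexClsVec) {
  ord <- order(indexClsVec)  # group individuals by cluster (stable order)
  graphics::barplot(t(Qmat[ord, , drop = FALSE]),  # one stacked bar per individual
                    space = 0, border = NA,
                    names.arg = as.character(indexClsVec[ord]),
                    xlab = "Individuals (cluster label)",
                    ylab = "Admixture ratio")
  invisible(ord)
}

Q <- rbind(c(0.9, 0.1), c(0.2, 0.8), c(0.85, 0.15), c(0.1, 0.9))
cls <- c(1, 2, 1, 2)
grDevices::pdf(NULL)            # draw to a null device in this example
ord <- plot_admix_sketch(Q, cls)
invisible(grDevices::dev.off())
```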

Usage

plotAdmixClusters(obj)

Arguments

obj

is an object of ipADMIXTURE class.

Examples

h27pop_obj <- ipADMIXTURE(Qmat = ipADMIXTURE::human27pop_Qmat[[11]], admixRatioThs = 0.15)
ipADMIXTURE::plotAdmixClusters(h27pop_obj)


plotClusterLeaves

Description

plotClusterLeaves is a function that plots clusters as a treemap. Each subsquare represents a cluster and shows the cluster label (ID), the number of members (N), and the maximum magnitude difference of admixture ratios (md). The size of each subsquare reflects the cluster's number of members relative to the other clusters, and its color represents the cluster's md value.
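The information shown in each subsquare can be assembled into a per-cluster table, which is the kind of input a treemap needs. This is a minimal sketch that assumes md is the maximum magnitude difference from a two-way hclust split of each cluster and that a singleton cluster has md = 0. The function name cluster_summary is hypothetical, and the treemap call appears only as a comment because it depends on the optional treemap package.

```r
# Hypothetical sketch: the per-cluster table (ID, N, md) a treemap would show.
cluster_summary <- function(Qmat, indexClsVec, method = "average") {
  ids <- sort(unique(indexClsVec))
  md <- sapply(ids, function(id) {
    sub <- Qmat[indexClsVec == id, , drop = FALSE]
    if (nrow(sub) < 2) return(0)  # assumption: a singleton has md = 0
    cl <- stats::cutree(stats::hclust(stats::dist(sub), method = method), k = 2)
    max(abs(colMeans(sub[cl == 1, , drop = FALSE]) -
            colMeans(sub[cl == 2, , drop = FALSE])))
  })
  N <- as.vector(table(factor(indexClsVec, levels = ids)))
  data.frame(ID = ids, N = N, md = md,
             label = sprintf("ID %s\nN = %d\nmd = %.2f", ids, N, md))
}

Q <- rbind(c(0.9, 0.1), c(0.8, 0.2), c(0.1, 0.9), c(0.2, 0.8), c(0.15, 0.85))
df <- cluster_summary(Q, c(1, 1, 2, 2, 2))
# With the treemap package installed, tiles sized by N and colored by md:
# treemap::treemap(df, index = "label", vSize = "N", vColor = "md", type = "value")
```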

Usage

plotClusterLeaves(obj)

Arguments

obj

is an object of ipADMIXTURE class.

Examples

h27pop_obj <- ipADMIXTURE(Qmat = ipADMIXTURE::human27pop_Qmat[[11]], admixRatioThs = 0.15)
ipADMIXTURE::plotClusterLeaves(h27pop_obj)


printClustersFromLabels

Description

printClustersFromLabels is a function that reports the clustering results in text mode, using the given labels of individuals.
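A text report of this kind can be sketched as a per-cluster count of the labels of its members. This is a minimal sketch under that assumption; the function name report_clusters and its output format are hypothetical, and printClustersFromLabels() may print different information.

```r
# Hypothetical text report: label frequencies within each cluster.
report_clusters <- function(indexClsVec, labels) {
  for (id in sort(unique(indexClsVec))) {
    counts <- table(labels[indexClsVec == id])  # label frequencies in cluster id
    cat(sprintf("Cluster %s (N = %d): %s\n", id, sum(counts),
                paste0(names(counts), " x", as.vector(counts), collapse = ", ")))
  }
  invisible(table(cluster = indexClsVec, label = labels))
}

cls <- c(1, 1, 2, 2, 2)
labs <- c("popA", "popA", "popB", "popB", "popA")
tab <- report_clusters(cls, labs)
# Cluster 1 (N = 2): popA x2
# Cluster 2 (N = 3): popA x1, popB x2
```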

Usage

printClustersFromLabels(obj, labels)

Arguments

obj

is an object of ipADMIXTURE class.

labels

is a vector of labels of all individuals.

Examples

h27pop_obj <- ipADMIXTURE(Qmat = ipADMIXTURE::human27pop_Qmat[[11]], admixRatioThs = 0.15)
ipADMIXTURE::printClustersFromLabels(h27pop_obj, ipADMIXTURE::human27pop_labels)