| Type: | Package |
| Title: | Phasing, Pedigree Reconstruction, Sire Imputation and Recombination Events Identification of Half-sib Families Using SNP Data |
| Version: | 3.0.0 |
| Date: | 2026-2-15 |
| Depends: | snowfall, R (≥ 3.5.0) |
| LinkingTo: | RcppArmadillo (≥ 0.4.300.8.0), Rcpp (≥ 0.11.2) |
| Imports: | Rcpp, gdata |
| Description: | Identification of recombination events, haplotype reconstruction, sire imputation and pedigree reconstruction using half-sib family SNP data. |
| License: | GPL-3 |
| NeedsCompilation: | yes |
| Repository: | CRAN |
| RoxygenNote: | 7.3.3 |
| Packaged: | 2026-02-17 10:19:52 UTC; mhf |
| Author: | Mohammad Ferdosi [aut, cre], Cedric Gondro [aut] |
| Maintainer: | Mohammad Ferdosi <mhferdosi@yahoo.com> |
| Date/Publication: | 2026-02-17 10:50:02 UTC |
Phasing, Pedigree Reconstruction, Sire Imputation and Recombination Events for Half-sib Families
Description
Identification of recombination events, haplotype reconstruction and sire imputation using half-sib family SNP data.
Details
| Package: | hsphase |
| Type: | Package |
| Version: | 3.0.0 |
| Date: | 2026-02-15 |
| License: | GPL-3 |
Main functions:
bmh: Block partitioning
ssp: Sire inference
aio: Phasing
imageplot: Image plot of the block structure
rpoh: Reconstruct pedigree based on opposing homozygotes
Auxiliary functions:
hss: Half-sib family splitter
cs: Chromosome splitter
para: Parallel data analysis
Author(s)
Mohammad H. Ferdosi mferdosi@une.udu.au, Cedric Gondro cgondro2@une.edu.au
Maintainer: Mohammad H. Ferdosi mhferdosi@yahoo.com
References
Ferdosi, M. H., Kinghorn, B. P., van der Werf, J. H., & Gondro, C. (2013).
Effect of genotype and pedigree error on detection of recombination events, sire imputation and haplotype inference using the hsphase algorithm.
In Proc. Assoc. Advmt. Anim. Breed. Genet (Vol. 20, pp. 546–549). AAABG; Napier, New Zealand.
Ferdosi, M. H., Kinghorn, B. P., van der Werf, J. H. J., & Gondro, C. (2014).
Detection of recombination events, haplotype reconstruction and imputation of sires using half-sib SNP genotypes.
Genetics Selection Evolution, 46(1), 11.
Ferdosi, M. H., Kinghorn, B. P., van der Werf, J. H. J., Lee, S. H., & Gondro, C. (2014).
hsphase: an R package for pedigree reconstruction, detection of recombination events, phasing and imputation of half-sib family groups.
BMC Bioinformatics, 15(1), 172.
Ferdosi, M. H., & Boerner, V. (2014).
A fast method for evaluating opposing homozygosity in large SNP data sets.
Livestock Science.
Sahoo S., Ferdosi M.H., van der Werf J.H.J., and de las Heras-Saldana S., et al. (2025) Proc. Assoc. Advmt. Anim. Breed. Genet. 26: 323
Examples
genotype <- matrix(c(
0,0,0,0,1,2,2,2,0,0,2,0,0,0,
2,2,2,2,1,0,0,0,2,2,2,2,2,2,
2,2,2,2,1,2,2,2,0,0,2,2,2,2,
2,2,2,2,0,0,0,0,2,2,2,2,2,2,
0,0,0,0,0,2,2,2,2,2,2,0,0,0
), ncol = 14, byrow = TRUE)
ssp(bmh(genotype), genotype)
aio(genotype)
imageplot(bmh(genotype), title = "ImagePlot example")
rplot(genotype, 1:14)
Calculate Genotypic Distances
Description
Calculates a symmetric matrix of distances (canberra) between genotypes, based on a given genotype matrix. Each row in the 'GenotypeMatrix' represents a genotype, and each column represents a marker.
Usage
.fastdist(GenotypeMatrix)
Arguments
GenotypeMatrix |
A matrix where each row represents a genotype and each column represents a marker. Genotypes should be coded as 0 for AA, 1 for AB, and 2 for BB, with 9 representing missing data. |
Value
Returns a symmetric matrix of distances (canberra) between the genotypes specified in the 'GenotypeMatrix'. Row and column names of the returned matrix correspond to the row names of the 'GenotypeMatrix'.
Examples
# Simulate genotype data for 40 individuals across 1000 SNPs
genotypes <- .simulateHalfsib(numInd = 5, numSNP = 1000, recbound = 0:6, type = "genotype")
# Calculate the distance matrix
dist_matrix <- hsphase::.fastdist(genotypes)
print(dist_matrix)
Fill gaps in a paternal strand vector (native routine wrapper)
Description
Internal wrapper around the native C routine fillGap. It fills
zero-valued gaps in a paternal-strand vector by propagating values from
neighbouring non-zero positions (see native implementation for exact rules).
Usage
.fillGap(paternalStrandBMH)
Arguments
paternalStrandBMH |
An integer (or numeric) vector representing the paternal strand state (e.g., output from block-building routines). The vector is coerced to integer before calling native code. |
Details
This function calls compiled code via .C:
.C("fillGap", ...).
Value
An integer vector of the same length as paternalStrandBMH
containing the gap-filled strand values.
Author(s)
mhf
See Also
Fix conflicting allele assignments on both haplotype strands
Description
Internal helper that scans a haplotype/strand matrix in *pairs of rows* (two rows per individual/segment) and detects loci where **both rows in a pair are non-zero**. Such positions are treated as conflicting assignments and are replaced with the sentinel value '100' on *both* rows of the pair.
Usage
.fixBothStrand(groups)
Arguments
groups |
A numeric matrix with an **even number of rows**. Rows are
interpreted in pairs: rows |
Details
This function is a thin R wrapper around the native routine
fixBothStrand implemented in C++ and called via .Call().
The sentinel value '100' is used downstream as an indicator of an unresolvable conflict between the two strands at that position.
Value
A numeric matrix with the same dimensions as groups. For each
row-pair and column:
if both entries are non-zero, both are set to '100';
otherwise, the original values are preserved.
See Also
aio, ssp, bmh for the main
hsphase workflow where strand/haplotype matrices are produced and refined.
Fix strand label rotation across consecutive block-structure columns
Description
Internal helper to enforce a consistent strand-label orientation across adjacent columns of a block-structure matrix.
Usage
.fixRotation(blockStructure)
Arguments
blockStructure |
A numeric matrix (typically individuals in rows, SNPs in columns) representing a block/strand structure. Values are expected to be small integers (commonly including '0', '1', '2' and possibly other internal codes). |
Details
The input typically encodes sire strand-of-origin labels per individual (rows) and marker/SNP (columns), where '0' indicates unknown and non-zero values indicate an assigned strand/state. The native algorithm compares each column to the previous one and, when a "contrast" (swap of strand labels) increases agreement, it relabels the next column to reduce apparent strand-rotation between columns.
This function is a thin R wrapper around the native routine
fixRotation implemented in C++ and called via .Call().
At each step, the C++ code computes an agreement score between column
i and column i+1 using only positions where both columns are
non-zero. It also computes the score after applying a contrast mapping to
column i+1 (conceptually swapping strand labels '1' and '2', leaving
'0' unchanged). If the contrasted version agrees more with column i,
the function relabels column i+1.
The relabeling performed by the native code is:
'1 -> 3'
'2 -> 1'
'3 -> 2'
leaving other values unchanged. (These codes are part of hsphase's internal block/strand encoding.)
Value
A numeric matrix with the same dimensions as blockStructure,
where some entries in column i+1 may be relabeled to improve
consistency with column i. The transformation is applied iteratively
from left to right across columns.
See Also
bmh, ssp, aio for creation
and downstream usage of block structures.
Build haplotype blocks from a BMH result matrix (native routine wrapper)
Description
Internal wrapper around the native C routine hblock. It transforms a
BMH result matrix into a block representation, with an optional maximum block
size constraint.
Usage
.hblock(bmhResult, MaxBlock = 400)
Arguments
bmhResult |
A numeric/integer matrix containing BMH results (block matching/haplotype-block intermediate output). Must be a matrix. |
MaxBlock |
Integer scalar. Maximum block size (default: 400). |
Details
This function transposes and flattens bmhResult before passing it to
compiled code via .C:
.C("hblock", ...).
Value
A matrix (same general shape as bmhResult) containing inferred
block structure. Row and column names are propagated from bmhResult
where available.
Cluster individuals by local IBD-like similarity in sliding windows
Description
Internal helper that applies a sliding-window clustering procedure across a
genotype matrix. For each window of markers, individuals are grouped using
an opposing-homozygote style criterion (controlled by maxOH) via a
native C++ routine.
Usage
.ibdCluster(genotype, cpus, windowsSize, maxOH)
Arguments
genotype |
A numeric matrix of genotypes with individuals in rows and markers/SNPs in columns. This function treats genotype code '1' (heterozygote) as missing by converting it to '9' prior to calling the native routine. |
cpus |
Integer scalar. Requested number of CPU threads. The underlying C++ implementation uses OpenMP; the actual number of threads may depend on how R and your compiler toolchain were built and configured. |
windowsSize |
Integer scalar giving the window size (in markers). Must
be between 1 and |
maxOH |
Integer scalar. Maximum allowed opposing-homozygote count (or a similar mismatch threshold) used by the native grouping algorithm within each window. |
Details
This function is a thin R wrapper around the native routine
ibdCluster implemented in C++ and called via .Call().
The function scans windows of consecutive markers. For each window starting
at marker i, it clusters individuals based on the submatrix
genotype[, i:(i+windowsSize-1)] after recoding heterozygotes ('1') to
missing ('9'). Computation is parallelized in C++ using OpenMP across window
start positions.
Value
A list (R "list" object) of length ncol(genotype) - (windowsSize - 1).
Each element corresponds to one sliding window position and contains an
integer vector of group assignments produced by the native implementation.
(Exact encoding of groups is defined by the underlying mhMat grouping
routine.)
Calculate minor allele frequency (MAF)
Description
Calculates the minor allele frequency (MAF) for a single SNP coded as: 0 = AA, 1 = AB, 2 = BB, and 9 = missing.
Usage
.maf(snp)
Arguments
snp |
A numeric vector of genotypes for one SNP. Values must be 0, 1, 2, or 9 (missing). |
Value
A single numeric value: the minor allele frequency (MAF).
Examples
snp_data <- c(0, 0, 1, 2, 2, 9)
.maf(snp_data)
Convert a one-row haplotype to a two-row haplotype
Description
Converts a haplotype matrix where each individual is represented by one row and alleles are stored in alternating columns (1st allele, 2nd allele, ...) into a two-row-per-individual representation.
Usage
.o2tH(haplotype)
Arguments
haplotype |
A haplotype object:
|
Details
Internally, any allele code of '2' is converted to '0' before conversion.
Value
An integer matrix in a two-row-per-individual format with
2 * nrow(haplotype) rows and ncol(haplotype) / 2 columns.
Row names are interleaved using the original individual names.
Phase half-sib paternal haplotype using blocks and sire haplotypes (no offspring genotype needed)
Description
Internal helper that constructs a half-sib paternal haplotype matrix using:
a block/strand-of-origin matrix (typically produced by
bmh)a 2-row phased sire haplotype matrix (typically produced by
ssp)
Usage
.phfnoGenotype(blockMatrix, sirePhasedMatrix)
Arguments
blockMatrix |
An integer/numeric matrix of block assignments with individuals in rows and markers in columns. Must contain only '0', '1', and '2', where '0' indicates unknown origin. |
sirePhasedMatrix |
An integer/numeric matrix with **two rows** (the sire
haplotypes) and the same number of columns as |
Details
For each marker (column) and individual (row), if the block code is:
'1': assign sire haplotype row 1 allele at that marker
'2': assign sire haplotype row 2 allele at that marker
'0': leave as missing ('9')
This function calls a native C routine (phaseNogenotype) via
.C().
The underlying C implementation initializes the entire result matrix to '9'
and then fills entries according to blockMatrix:
if
blockMatrix[j,i] == 1, thenresult[j,i] = sirePhasedMatrix[1,i]if
blockMatrix[j,i] == 2, thenresult[j,i] = sirePhasedMatrix[2,i]
Value
An integer matrix with the same dimensions as blockMatrix,
containing the inferred paternal haplotype allele for each individual and
marker. Values are '0'/'1' for alleles and '9' for missing/unknown (e.g. where
blockMatrix is '0').
Calus-style recursive clustering of individuals using an OH matrix
Description
Performs a recursive hierarchical clustering on an opposing-homozygotes (OH) matrix to split individuals into two groups at each step (Ward clustering), until within-group OH values fall below a threshold derived from allele frequencies estimated from the genotype matrix.
Usage
.prCalus(oh, genotype)
Arguments
oh |
A numeric matrix representing the opposing-homozygotes (OH) counts between individuals. Row and column names should be individual IDs. The matrix is expected to be square and symmetric. |
genotype |
A numeric genotype matrix of dimension |
Details
The function returns a two-column data frame containing individual IDs and an
assigned group code. Group codes are generated randomly (via rnorm())
and therefore are not stable across runs.
The threshold maxsnpnooh is computed from per-SNP minor allele
frequencies (.maf) and then reduced by 10%. The recursion proceeds as:
Compute pairwise distances from
ohusing.fastdistand convert to adistobject.Apply hierarchical clustering (
hclustwithmethod = "ward.D").Cut the dendrogram into
k = 2groups.For each group, compute the maximum within-group OH value; if it exceeds
maxsnpnoohand group size is > 2, recurse into that subgroup. Otherwise, write group assignments to a temporary file and stop recursion.
Value
A data.frame with columns:
- id
Individual ID (character).
- group
An integer-like group code (generated randomly; not reproducible).
Side effects
This function writes to and reads from a file named "temp.txt" in the
current working directory, and then deletes it.
See Also
Manual recursive clustering using an OH matrix and a fixed threshold
Description
Performs a recursive hierarchical clustering on an opposing-homozygotes (OH)
matrix using Ward clustering and splits clusters until the maximum within-
cluster OH value is below a user-supplied threshold (maxsnpnooh).
Usage
.prManual(oh, maxsnpnooh)
Arguments
oh |
A numeric matrix representing opposing-homozygotes (OH) counts between individuals. Row and column names should be individual IDs. The matrix is expected to be square and symmetric. |
maxsnpnooh |
Numeric scalar giving the maximum allowed within-group OH value. Groups with a larger maximum OH value (and size > 2) are split recursively. |
Details
The function returns a two-column data frame with individual IDs and a group
code. Group codes are generated randomly (via rnorm()) and therefore
are not stable across runs.
The recursion proceeds as follows:
Compute a pairwise distance object from
ohusing.fastdistand convert it to adistobject.Apply hierarchical clustering using
hclustwithmethod = "ward.D".Cut the dendrogram into two groups using
cutree.For each group, compute the maximum within-group OH value; if it exceeds
maxsnpnoohand the group has more than two individuals, recurse into that subgroup. Otherwise, write group assignments and stop.
Value
A data.frame with columns:
- id
Individual ID (character).
- group
An integer-like group code (generated randomly; not reproducible).
Side effects
This function writes to and reads from a file named "temp.txt" in the
current working directory, and then deletes it. It also prints
maxsnpnooh to the console.
See Also
Simple recursive clustering using an OH matrix with a linear threshold rule
Description
Performs a recursive hierarchical clustering on an opposing-homozygotes (OH)
matrix using Ward clustering. Clusters are split until the maximum within-
cluster OH value is below a threshold computed from the number of SNPs
(snpNooh) using a linear rule.
Usage
.prSimple(oh, snpNooh, intercept = 26.3415, coefficient = 77.3171)
Arguments
oh |
A numeric matrix representing opposing-homozygotes (OH) counts between individuals. Row and column names should be individual IDs. The matrix is expected to be square and symmetric. |
snpNooh |
Numeric scalar. Number of SNPs used for OH calculation (or a proxy for SNP density) used to derive the stopping threshold. |
intercept |
Numeric scalar. Intercept for the linear threshold rule. |
coefficient |
Numeric scalar. Slope for the linear threshold rule. |
Details
The threshold is computed as:
maxsnpnooh = (intercept + coefficient * snpNooh) - 15 * snpNooh
The function returns a two-column data frame with individual IDs and a group
code. Group codes are generated randomly (via rnorm()) and therefore
are not stable across runs.
The recursion proceeds as follows:
Compute a distance object from
ohusing.fastdistand convert it to adistobject.Apply hierarchical clustering using
hclustwithmethod = "ward.D".Cut the dendrogram into two groups using
cutree.For each group, compute the maximum within-group OH value; if it exceeds
maxsnpnoohand the group has more than two individuals, recurse into that subgroup. Otherwise, write group assignments and stop.
Value
A data.frame with columns:
- id
Individual ID (character).
- group
An integer-like group code (generated randomly; not reproducible).
Side effects
This function writes to and reads from a file named "temp.txt" in the
current working directory, and then deletes it.
See Also
Convert a two-row haplotype per individual to a one-row representation
Description
Converts a haplotype matrix where each individual is represented by two rows (allele 1 and allele 2) into a single-row-per-individual representation.
Usage
.ptr2por(haplotype)
Arguments
haplotype |
A matrix containing haplotypes with two rows per individual. |
Value
A matrix with one row per individual (allele 1 and allele 2 combined).
Reconstruct half-sib groups by recursive clustering with a recombination stop rule
Description
Internal helper used by rpoh (reconstruction pedigree of half-sib
families). This function recursively splits individuals into two clusters using
hierarchical clustering on a distance derived from the provided opposing
homozygote (OH) matrix, and then decides whether each cluster should be split
further by checking the maximum number of recombination events inferred within
that cluster.
Usage
.rpohhsphase(
genotypeMatrix,
oh,
forwardVectorSize = 30,
excludeFP = TRUE,
nsap = 3,
maxRec = 15
)
Arguments
genotypeMatrix |
Numeric genotype matrix (individuals in rows, SNPs in columns) coded as '0', '1', '2' (and typically '9' for missing), as used by hsphase. This matrix is subset recursively when splitting clusters. |
oh |
A square opposing-homozygote matrix for the same individuals as
|
forwardVectorSize |
Integer. Passed to |
excludeFP |
Logical. Passed to |
nsap |
Integer. Passed to |
maxRec |
Integer. Maximum allowed recombination count (within a cluster) before the cluster is recursively split again. |
Details
The recursive splitting stops for a cluster when the maximum recombination
count in that cluster is <= maxRec. Final group assignments are written
to a temporary file and then read back as a two-column data frame.
The algorithm:
Converts
ohto a distance object viaas.dist(.fastdist(oh))and performs hierarchical clustering (hclust, Ward method).Splits into
k = 2clusters viacutree.For each cluster with at least 4 individuals, computes recombination counts as
recombinations(bmh(subGenotype, ...))and uses the maximum recombination count as a stop/split criterion.If
max(recombinations) > maxRec, the cluster is split again recursively; otherwise, individuals in that cluster are assigned a new group label and written to a temporary file.
Value
A data.frame with two columns:
-
id: individual IDs -
group: an integer-like group label assigned by the recursive procedure
Implementation notes
This function uses a fixed temporary filename
"temp.txt"in the current working directory and deletes it at the end. This is not safe under parallel execution or if the working directory is not writable.Group labels are generated using
rnorm(), so results are not deterministic unless a seed is set and the recursion order remains identical.
See Also
rpoh, ohg, bmh,
recombinations, .fastdist
Simulate Half-Sibling Genotypes
Description
This function simulates genotypes for a set of half-siblings based on specified parameters, including the number of individuals, the number of SNPs, recombination boundaries, and the type of data to return. It generates a sire genotype, maternal half-sib genotypes, and combines these to simulate offspring genotypes, optionally returning phased genotypes based on recombination events.
Usage
.simulateHalfsib(
numInd = 40,
numSNP = 10000,
recbound = 0:6,
type = "genotype"
)
Arguments
numInd |
Integer, the number of half-siblings to simulate. |
numSNP |
Integer, the number of SNPs to simulate for each individual. |
recbound |
Numeric vector, specifying the range of possible recombination events to simulate. |
type |
Character string, specifying the type of data to return: "genotype" for genotypic data or any other string for phased genotypic data. |
Value
Depending on the type parameter, this function returns a matrix of simulated genotypic data
for half-siblings. If type is "genotype", it returns unphased genotypic data; otherwise, it returns phased genotypic data.
Examples
sim_genotypes <- .simulateHalfsib(numInd = 40, numSNP = 10000, recbound = 0:6, type = "genotype")
dim(sim_genotypes) # Should return 40 rows (individuals) and 100 columns (SNPs)
Add Switches
Description
Add switch points to haplotypes by swapping the two haplotype rows for an individual from each switch point to the end of the chromosome.
Usage
addSwitch(haplotypeMatrix, switchPoints, minLength)
Arguments
haplotypeMatrix |
|
switchPoints |
|
minLength |
|
Details
Important: Each switch point causes a swap of the two haplotype rows for that individual from the switch position to the end.
The switchPoints list must have one element per individual (not per haplotype row).
If an element is 0, no switch is applied for that individual.
If you rely on minLength to ignore nearby switches, verify your installed version enforces this rule.
Value
A matrix of the same dimension as haplotypeMatrix with switches applied.
See Also
groupMatSingle and fixSW
Examples
haplotype <- matrix(c(0, 0, 0, 0,
1, 1, 1, 1,
0, 0, 1, 1,
1, 1, 0, 0,
1, 1, 1, 1,
0, 0, 0, 0), byrow = TRUE, nrow = 6)
switchPoints <- list(firstInd = c(2), secondInd = c(1, 3), lastInd = 0)
addSwitch(haplotype, switchPoints, 0)
All-in-one Phasing
Description
Phasing of a single half-sib family group (single ordered chromosome).
Usage
aio(genotypeMatrix, bmh_forwardVectorSize = 30, bmh_excludeFP = TRUE,
bmh_nsap = 3, bmh_fillMissing = FALSE, output = "phase")
Arguments
genotypeMatrix |
|
bmh_forwardVectorSize |
|
bmh_excludeFP |
|
bmh_nsap |
|
bmh_fillMissing |
|
output |
|
Details
This function calls bmh, ssp, and phf.
Value
If output = "phase", returns a haplotype matrix with two rows per individual (first paternal, second maternal),
coded as 0 (allele A), 1 (allele B), and 9 (missing/unphased).
Otherwise returns a list with elements:
-
phasedHalfsibs -
sireHaplotype -
blockStructure
Note
Only this function needs to be called to phase a half-sib family. The genotype matrix must contain individuals from a single family and a single ordered chromosome.
See Also
Examples
genotype <- matrix(c( # Define a Half-sib Genotype Matrix
2,1,0, # Individual 1
2,0,0, # Individual 2
0,0,2 # Individual 3
), byrow = TRUE, ncol = 3) # There are 3 individulas with three SNPs
aio(genotype) # The genotypes must include only one half-sib family and one chromosome
Block Partitioning
Description
Identifies the block structure (chromosome segments) in a half-sib family that each individual inherited from its sire.
Usage
bmh(
GenotypeMatrix,
forwardVectorSize = 30,
excludeFP = TRUE,
nsap = 3,
fillMissing = FALSE
)
Arguments
GenotypeMatrix |
|
forwardVectorSize |
|
excludeFP |
|
nsap |
|
fillMissing |
|
Value
A matrix of block structure containing 1, 2, and 0.
Values 1 and 2 represent the two sire strands (arbitrary labeling within a chromosome),
and 0 indicates unknown origin.
Note
The genotype matrix must contain individuals from only one half-sib family and one ordered chromosome.
See Also
Examples
genotype <- matrix(c(
0,2,1,1,1,
2,0,1,2,2,
2,2,1,0,2,
2,2,1,1,1,
0,0,2,1,0
), ncol = 5, byrow = TRUE)
bmh(genotype)
Crossover Detection
Description
Detects possible crossover segments by comparing pairs of individuals in a half-sib family.
Usage
co(genotypeMatrix)
Arguments
genotypeMatrix |
|
Value
A matrix Returns a matrix with the number of crossover events for each site.
Examples
genotype <- matrix(c(
2,1,0,
2,0,2,
0,0,2
), byrow = TRUE, ncol = 3)
co(genotype)
Chromosome Splitter
Description
Splits the genotype list generated by hss into chromosomes based on a map file/data.frame and orders SNPs by chromosomal position.
Usage
cs(halfsib, mapPath, separator = " ")
Arguments
halfsib |
|
mapPath |
|
separator |
|
Details
The map file should include only the chromosomes that will be analyzed. For example, the Y and X chromosomes should be excluded (and others optionally). Names of each element in the list can be used for further categorization. The header must be "Name Chr Position".
Value
Returns a list of matrices, the number of elements in this list is the number of half-sib families multiplied by the number of chromosomes.
Examples
# Please run demo(hsphase)
Fixing Switch Errors
Description
Fix switch errors in haplotypes for a half-sib family.
Usage
fixSW(haplotype, ohMax = 0, windowsSize = 100, minLength = 100, cpus = 2)
Arguments
haplotype |
|
ohMax |
|
windowsSize |
|
minLength |
|
cpus |
|
Value
A haplotype matrix with switch errors corrected.
See Also
Examples
haplotype <- .simulateHalfsib(7, 2500, type = "haplotype")$phased
switches <- list(500,0,0,1200,c(1000,2000),500,1200)
haplotype2 <- addSwitch(haplotype, switches, 0)
gMat <- groupMatSingle(haplotype2, 100, 2, "haplotype")
imageplot(gMat, title = "Before fixing switches")
haplotype3 <- fixSW(haplotype2, 0, 100, 100)
gMat2 <- groupMatSingle(haplotype3, 100, 2, "haplotype")
imageplot(gMat2, title = "After fixing switches")
Example Genotype Data Set
Description
An example genotype matrix for the hsphase package.
Usage
data(genotypes)
Format
A genotype matrix with:
-
Columns: SNPs
-
Rows: animals/individuals
Genotype coding follows the package conventions (typically 0, 1, 2 and 9 for missing).
References
Sahoo S., Ferdosi M.H., van der Werf J.H.J., and de las Heras-Saldana S., et al. (2025) Proc. Assoc. Advmt. Anim. Breed. Genet. 26: 323
Grouping a Half-sib Family
Description
Group the genotype or haplotype of a half-sib family into partitions using opposing homozygotes.
Usage
groupMatSingle(haplotype, windowsSize, cpus = 2, input = "haplotype", oh = 0)
Arguments
haplotype |
|
windowsSize |
|
cpus |
|
input |
|
oh |
|
Value
A grouping matrix.
See Also
Examples
haplotype <- .simulateHalfsib(10, 5000, type = "haplotype")$phased
gMat <- groupMatSingle(haplotype, 100, 2, "haplotype")
imageplot(gMat)
Haplotype Blocks of Phased Data
Description
Creates a block-structure matrix for a half-sib family based on phased data of the sire and the half-sib family.
Usage
hbp(PhasedGenotypeMatrix, PhasedSireGenotype, strand = "auto")
Arguments
PhasedGenotypeMatrix |
|
PhasedSireGenotype |
|
strand |
|
Value
A matrix in which 3 or 4 indicates the SNP originates from, respectively, sire strand 1 or strand 2.
0 indicates the origin is unknown.
Note
The input matrices must contain individuals from a single half-sib family and a single ordered chromosome. The SNP order must match between inputs.
See Also
Examples
sire <- matrix(c(
0,0,0,0,0,1, # Haplotype one of the sire
0,1,1,1,1,0 # Haplotype two of the sire
), byrow = TRUE, ncol = 6)
haplotypeHalfsib <- matrix(c(
1,0,1,1,1,1, # Individual one, haplotype one
0,1,0,0,0,0, # Individual one, haplotype two
0,1,1,0,1,1, # Individual two, haplotype one
1,0,0,1,0,0 # Individual two, haplotype two
), byrow = TRUE, ncol = 6) # 0s and 1s are alelle a and b
hbp(haplotypeHalfsib, sire)
Heatmap of Half-sibs
Description
Creates a heatmap of a half-sib dataset using an opposing-homozygotes (OH) matrix, with optional sidebars showing inferred and/or real pedigree groupings.
Usage
hh(oh, inferredPedigree, realPedigree, pedOnly = TRUE)
Arguments
oh |
|
inferredPedigree |
|
realPedigree |
|
pedOnly |
|
Value
Returns a heatmap of the OH matrix with sidebars color-coded by sire groups from the inferred and original pedigrees (where provided).
Author(s)
The function uses colors generated by the getcol function in the made4 package (Aedin Culhane).
See Also
Examples
c1h1 <- .simulateHalfsib(numInd = 62, numSNP = 5000)
c1h2 <- .simulateHalfsib(numInd = 38, numSNP = 5000)
Genotype <- rbind(c1h1, c1h2)
oh <- ohg(Genotype)
hh(oh)
Half-sib Family Splitter
Description
Splits the dataset into half-sib family groups based on a pedigree.
Usage
hss(pedigree, genotype, minHS = 4, check = TRUE)
Arguments
pedigree |
|
genotype |
|
minHS |
|
check |
|
Details
Only half-sib groups that have more than 3 individuals will be returned.
Value
Returns a list of numeric matrices, each matrix is a half-sib family.
Note
Pedigree must have at least two columns with sample ids (Column 1) and sire ids (Column 2).
Examples
# Please run demo(hsphase)
Image Plot of Blocking Structure
Description
Create an image plot of the blocking structure.
Usage
imageplot(x, title = c(), rv = FALSE, ...)
Arguments
x |
|
title |
|
rv |
|
... |
Optional graphical parameters. |
Details
White indicates regions of unknown origin; red and blue correspond to the two sire strands.
Author(s)
This is a modified version of a function written by Chris Seidel, available at http://www.phaget4.org/R/image_matrix.html.
See Also
Examples
genotype <- matrix(c(
0,2,1,1,1,
2,0,1,2,2,
2,2,1,0,2,
2,2,1,1,1,
0,0,2,1,0
), ncol = 5, byrow = TRUE)
imageplot(bmh(genotype))
Impute of Low Density SNP Marker to High Density (Paternal Strand)
Description
Impute the paternal strand from low density to high density utilising high density sire haplotype.
Usage
impute(halfsib_genotype_ld, sire_hd, bmh_forwardVectorSize = 30,
bmh_excludeFP = TRUE, bmh_nsap = 3)
Arguments
halfsib_genotype_ld |
|
sire_hd |
|
bmh_forwardVectorSize |
|
bmh_excludeFP |
|
bmh_nsap |
|
Value
Return an imputed half-sib matrix.
See Also
Example Map File
Description
An example map dataset for the hsphase package.
Usage
data(map)
Format
A data.frame with the following columns:
- Name
SNP identifier
- Chr
Chromosome
- Position
SNP position in base pairs
References
Sahoo S., Ferdosi M.H., van der Werf J.H.J., and de las Heras-Saldana S., et al. (2025) Proc. Assoc. Advmt. Anim. Breed. Genet. 26: 323
Opposing Homozygote Detection
Description
Counts, for each animal, the number of loci where it contributes opposing homozygotes in sites that imply heterozygosity in the sire.
Usage
ohd(genotypeMatrix, unique_check = FALSE, SNPs = 6000)
Arguments
genotypeMatrix |
|
unique_check |
|
SNPs |
|
Value
A numeric vector with the number of heterozygous sites that each sample caused.
Note
This function can be used to identify pedigree errors; i.e., outliers with unusually high values.
Author(s)
This method was suggested by Bruce Tier <btier@une.edu.au> to identify pedigree errors.
Examples
genotype <- matrix(c(
2,1,0,
2,0,0,
0,0,2
), byrow = TRUE, ncol = 3)
ohd(genotype)
Matrix of Opposing Homozygotes
Description
Creates a matrix of pairwise opposing-homozygote (OH) counts from a genotype matrix.
Usage
ohg(genotypeMatrix)
Arguments
genotypeMatrix |
|
Value
Returns a square matrix (sample \times sample) of pairwise counts of opposing homozygotes.
(Some versions may return this matrix inside a named list element.)
Note
This function can be slow for large datasets.
Author(s)
Ferdosi, M. H., & Boerner, V. (2014). A fast method for evaluating opposing homozygosity in large SNP data sets. Livestock Science.
See Also
Examples
genotype <- matrix(c(
2,1,0,
2,0,0,
0,0,2
), byrow = TRUE, ncol = 3)
ohg(genotype)
Opposing Homozygotes Plot
Description
Plot the sorted vectorized matrix of Opposing Homozygotes.
Usage
ohplot(oh, genotype, pedigree, check = FALSE)
Arguments
oh |
|
genotype |
|
pedigree |
|
check |
|
Details
The cut off line shows the edge of most different groups.
See Also
Examples
set.seed(100)
chr <- list()
sire <- list()
set.seed(1)
chr <- list()
for(i in 1:5)
{
chr[[i]] <- .simulateHalfsib(numInd = 20, numSNP = 5000, recbound = 1:10)
sire[[i]] <- ssp(bmh(chr[[i]]), chr[[i]])
sire[[i]] <- sire[[i]][1,] + sire[[i]][2,]
sire[[i]][sire[[i]] == 18] <- 9
}
Genotype <- do.call(rbind, chr)
rownames(Genotype) <- 6:(nrow(Genotype) + 5)
sire <- do.call(rbind, sire)
rownames(sire) <- 1:5
Genotype <- rbind(sire, Genotype)
oh <- ohg(Genotype) # creating the Opposing Homozygote matrix
pedigree <- as.matrix(data.frame(c(1:5, 6:(nrow(Genotype))),
rep = c(rep(0,5), rep(1:5, rep(20,5)))))
ohplot(oh, Genotype, pedigree, check = TRUE)
Parallel Analysis of Data
Description
This function uses the list of matrices (the output of cs) and runs one of the options, on each element of the list, in parallel.
Usage
para(halfsibs, cpus = 1, option = "bmh", type = "SOCK", bmh_forwardVectorSize = 30,
bmh_excludeFP = TRUE, bmh_nsap = 3, bmh_fillMissing = FALSE,pmMethod = "constant")
Arguments
halfsibs |
|
cpus |
|
option |
|
type |
|
bmh_forwardVectorSize |
|
bmh_excludeFP |
|
bmh_nsap |
|
bmh_fillMissing |
|
pmMethod |
|
Details
Type of analysis can be bmh, ssp, aio, pm, or rec (refer to pm, rplot and vignette for more information about rec).
Value
Returns a list of matrices with the results (formats specific to the option selected).
Examples
# Please run demo(hsphase)
Example Pedigree
Description
An example pedigree dataset for the hsphase package.
Usage
data(pedigree)
Format
A data.frame with the following columns:
- First column
Half-sib (offspring) IDs
- Second column
Sire IDs
References
Sahoo S., Ferdosi M.H., van der Werf J.H.J., and de las Heras-Saldana S., et al. (2025) Proc. Assoc. Advmt. Anim. Breed. Genet. 26: 323
Fix Pedigree Errors
Description
Tries to link the inferred pedigree from rpoh with sire IDs in the original pedigree and fix pedigree errors.
Usage
pedigreeNaming(inferredPedigree, realPedigree)
Arguments
inferredPedigree |
|
realPedigree |
|
Details
This function calls bmh and recombinations to count the number of recombinations in each half-sib group.
Value
Returns the inferred pedigree with the best match to sire names used in the original pedigree file.
See Also
Examples
# Please run demo(hsphase)
Half-sib Family Phasing
Description
Phases a half-sib family using the block structure and an imputed sire haplotype matrix.
Usage
phf(GenotypeMatrix, blockMatrix, sirePhasedMatrix)
Arguments
GenotypeMatrix |
|
blockMatrix |
|
sirePhasedMatrix |
|
Value
Returns a matrix containing the phased parental haplotypes of the half-sibs (two rows per individual).
Alleles are coded as 0 (A), 1 (B), and 9 (missing/unphased).
Note
The genotype matrix must contain individuals from one half-sib family and one ordered chromosome.
This function is used by aio for complete phasing of a half-sib group.
See Also
Examples
genotype <- matrix(c(
2,1,0,
2,0,0,
0,0,2
), byrow = TRUE, ncol = 3)
block <- bmh(genotype)
phf(genotype, block, ssp(block, genotype))
Probability Matrix
Description
Creates a recombination (probability) matrix based on the blocking structure.
Usage
pm(blockMatrix, method = "constant")
Arguments
blockMatrix |
|
method |
|
Details
This function identifies recombination between two consecutive sites and marks recombination sites with 1.
If there are unknown sites between two blocks, it marks these sites with:
-
1for the"constant"method, or -
1 / mfor the"relative"method, wheremis the number of unknown sites.
Examples
genotype <- matrix(c(
0,2,0,1,0,
2,0,1,2,2,
2,2,1,0,2,
2,2,1,1,1,
0,0,2,1,0
), ncol = 5, byrow = TRUE)
block <- bmh(genotype)
pm(block)
Parent-Offspring Group Constructor
Description
Assign offspring to parents based on an opposing-homozygotes (OH) matrix.
Usage
pogc(oh, genotypeError)
Arguments
oh |
|
genotypeError |
|
Value
A data.frame with two columns:
animal ID
assigned parent ID
See Also
Examples
set.seed(1)
chr <- list()
sire <- list()
for(i in 1:5)
{
chr[[i]] <- .simulateHalfsib(numInd = 20, numSNP = 5000, recbound = 1:10)
sire[[i]] <- ssp(bmh(chr[[i]]), chr[[i]])
sire[[i]] <- sire[[i]][1,] + sire[[i]][2,]
sire[[i]][sire[[i]] == 18] <- 9
}
Genotype <- do.call(rbind, chr)
rownames(Genotype) <- 6:(nrow(Genotype) + 5)
sire <- do.call(rbind, sire)
rownames(sire) <- 1:5
Genotype <- rbind(sire, Genotype)
oh <- ohg(Genotype)
pogc(oh, 5)
Read and Check the Genotype File
Description
Reads a genotype file and optionally checks it for common formatting/data issues.
Usage
readGenotype(genotypePath, separatorGenotype = " ", check = TRUE)
Arguments
genotypePath |
|
separatorGenotype |
|
check |
|
Value
A genotype matrix.
Note
Please refer to the vignette for more information.
Recombination Number
Description
Counts the number of recombinations for each individual based on the block structure.
Usage
recombinations(blockMatrix)
Arguments
blockMatrix |
|
Value
A numeric vector of recombination counts with length equal to the number of individuals (rows) in blockMatrix.
See Also
Examples
genotype <- matrix(c(
2,1,0,0,
2,0,2,2,
0,0,2,2,
0,2,0,0
), byrow = TRUE, ncol = 4)
recombinations(bmh(genotype))
Recombination Plot
Description
Creates a plot showing the sum of recombination events across a half-sib family.
Usage
rplot(x, distance, start = 1, end = ncol(x), maximum = 100,
overwrite = FALSE, method = "constant")
Arguments
x |
|
distance |
|
start |
|
end |
|
maximum |
|
overwrite |
|
method |
|
Examples
genotype <- matrix(c(
0,2,0,1,0,
2,0,1,2,2,
2,2,1,0,2,
2,2,1,1,1,
0,0,2,1,0
), ncol = 5, byrow = TRUE)
rplot(genotype, c(1,2,3,4,8))
Reconstruct Pedigree Based on Matrix of Opposing Homozygotes
Description
Reconstructs a half-sib pedigree based on a matrix of opposing homozygotes.
Usage
rpoh(genotypeMatrix, oh, forwardVectorSize = 30, excludeFP = TRUE, nsap = 3,
maxRec = 15, intercept = 26.3415, coefficient = 77.3171, snpnooh, method, maxsnpnooh)
Arguments
genotypeMatrix |
|
oh |
|
forwardVectorSize |
|
excludeFP |
|
nsap |
|
maxRec |
|
intercept |
|
coefficient |
|
snpnooh |
|
method |
|
maxsnpnooh |
|
Details
Four methods simple, recombinations, calus and manual can be
utilized to reconstruct the pedigree.
The following examples show the arguments require for each method.
pedigree1 <- rpoh(oh = oh, snpnooh = 732, method = "simple")
pedigree2 <- rpoh(genotypeMatrix = genotypeChr1, oh = ohg(genotype), maxRec = 10 , method = "recombinations")
pedigree3 <- rpoh(genotypeMatrix = genotype, oh = oh, method = "calus")
pedigree4 <- rpoh(oh = oh, maxsnpnooh = 31662, method = "manual")
Value
Returns a data frame with two columns, the first column is animals' ID and the second column is sire identifiers (randomly generated).
Note
Method can be recombinations, simple, calus or manual. Please refer to vignette for more information.
The sire genotype should be removed before using this function utilizing pogc function.
See Also
bmh and recombinations
Examples
# Please run demo(hsphase)
Sire Imputation and Phasing
Description
Infers (imputes) and phases the sire haplotypes based on the block structure matrix and homozygous sites of the half-sib genotype matrix.
Usage
ssp(blockMatrix, genotypeMatrix)
Arguments
blockMatrix |
|
genotypeMatrix |
|
Value
A matrix with two rows (one per sire haplotype) and columns corresponding to SNPs in genotype order.
Alleles are coded as 0 (A) and 1 (B). Alleles that could not be imputed are coded as 9.
See Also
Examples
genotype <- matrix(c(
0,2,1,1,1,
2,0,1,2,2,
2,2,1,0,2,
2,2,1,1,1,
0,0,2,1,0
), ncol = 5, byrow = TRUE)
ssp(bmh(genotype), genotype)
Switch Detector
Description
Detect switch errors in the haplotypes of a half-sib family.
Usage
switchDetector(groupMatrix)
Arguments
groupMatrix |
|
Value
A list of integer vectors. The list length equals the number of individuals.
Each vector contains the locations of detected switch errors for that individual.
See Also
Examples
haplotype <- .simulateHalfsib(8, 3000, type = "haplotype")$phased
switches <- list(2500,0,0,1200,c(1000,2000),500,2000,0)
haplotype2 <- addSwitch(haplotype, switches, 0)
gMat <- groupMatSingle(haplotype2, 100, 2, "haplotype")
switchDetector(gMat)