| Title: | Autocorrelated Conditioned Latin Hypercube Sampling |
| Version: | 1.0.1 |
| Description: | Implementation of the autocorrelated conditioned Latin Hypercube Sampling (acLHS) algorithm for 1D (time-series) and 2D (spatial) data. The acLHS algorithm is an extension of the conditioned Latin Hypercube Sampling (cLHS) algorithm that allows sampled data to have correlative and statistical features similar to those of the original data. Only a properly formatted dataframe needs to be provided to yield subsample indices from the primary function. For more details about the cLHS algorithm, see Minasny and McBratney (2006) <doi:10.1016/j.cageo.2005.12.009>. For acLHS, see Le and Vargas (2024) <doi:10.1016/j.cageo.2024.105539>. |
| License: | MIT + file LICENSE |
| URL: | https://github.com/vargaslab/acLHS |
| BugReports: | https://github.com/vargaslab/acLHS/issues |
| Depends: | R (≥ 3.5) |
| Imports: | DEoptim (≥ 2.2.8), geoR (≥ 1.9.6), graphics (≥ 4.5.1), stats (≥ 4.5.1), utils (≥ 4.5.1) |
| Suggests: | testthat (≥ 3.0.0) |
| Config/testthat/edition: | 3 |
| Encoding: | UTF-8 |
| LazyData: | true |
| RoxygenNote: | 7.3.3 |
| NeedsCompilation: | no |
| Packaged: | 2025-11-01 01:20:15 UTC; Gabe |
| Author: | Van Huong Le |
| Maintainer: | Gabriel Laboy <glaboy1@asu.edu> |
| Repository: | CRAN |
| Date/Publication: | 2025-11-05 20:10:02 UTC |
Get subsample indices using the acLHS algorithm.
Description
This function extracts a desired number of subsample indices from a dataframe using the acLHS algorithm. The function works for either 1D or 2D data, and it assumes that the last two columns of the data are the independent and dependent variables, respectively. The optimal subsamples are determined using the DEoptim package, which introduces nondeterminism through randomization. If you need consistent results, set a seed before running the function.
Usage
aclhs(
df,
num_samples,
weights,
iter = 1000,
vario_params = aclhs.vario_params(),
export_file = NULL
)
Arguments
df |
A dataframe with three columns of data |
num_samples |
The number of desired subsamples |
weights |
A vector of three weights for each objective function |
iter |
The maximum number of iterations to perform when searching for optimized indices |
vario_params |
A list of parameters to use when computing Variograms |
export_file |
The name of a CSV to export subsampled rows to |
Value
A numeric vector of subsample indices of the original data
Examples
## acLHS sampling example
data(ex_data_2D)
input2D <- ex_data_2D
# Set Variogram parameters
v_params <- aclhs.vario_params(num_lags=10, dir=0, tol=90, min_pairs=1)
## Set weights for each objective function, respectively
w <- c(10, 1000, 0.001)
## Run the sampling algorithm
aclhs_samples <- aclhs(df=input2D, num_samples=50, weights=w, iter=100,
vario_params=v_params,
export_file=tempfile(fileext=".csv"))
## Subsample original data
df_sampled <- input2D[aclhs_samples,]
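The Description above notes that results vary between runs unless a seed is set. The following is a minimal base-R sketch of that idea. It assumes a hypothetical stand-in sampler, `draw_indices()`, which simply draws random row indices; it is not the acLHS algorithm and does not call the package. Since the Description recommends setting a seed, the same pattern of calling `set.seed()` immediately before each run should apply to `aclhs()` as well.

```r
# Hypothetical stand-in for a stochastic sampler (NOT the acLHS algorithm):
# it draws `num_samples` row indices at random from `n_rows` rows.
draw_indices <- function(n_rows, num_samples) {
  sort(sample.int(n_rows, num_samples))
}

# Fix the random number generator state before the first run
set.seed(123)
first_run <- draw_indices(903, 50)

# Reset to the same state before the second run
set.seed(123)
second_run <- draw_indices(903, 50)

# Both runs started from the same RNG state, so the indices match
identical(first_run, second_run)
```

Without the second `set.seed(123)` call, the two runs would generally return different indices.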
Computes correlations for the original and acLHS-sampled data.
Description
Computes the Pearson, Spearman, and Kendall correlations between the independent and dependent variables, for both the original data and the acLHS-sampled data. Correlation values are rounded to the third decimal place.
Usage
aclhs.get_correlations(df, aclhs_samples)
Arguments
df |
The original data in dataframe format |
aclhs_samples |
The acLHS-derived sample indices |
Value
A dataframe with the correlation values of the original and acLHS-sampled data
Examples
## Get the data of interest and get the acLHS sample indices
data(ex_data_2D)
input2D <- ex_data_2D
aclhs_sam <- aclhs(df=input2D, num_samples=50, weights=c(1,1,1), iter=100)
## Compute the correlations
correlations <- aclhs.get_correlations(df=input2D, aclhs_samples=aclhs_sam)
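To illustrate the kind of comparison described above, here is a minimal sketch that uses only `stats::cor()` on synthetic data. The data, the placeholder indices `idx`, and the layout of the `comparison` dataframe are all assumptions made for illustration; the dataframe actually returned by `aclhs.get_correlations()` may be organized differently.

```r
# Synthetic stand-in data: an id column, then the independent (x) and
# dependent (y) variables as the last two columns
set.seed(1)
x <- rnorm(200)
y <- 0.6 * x + rnorm(200, sd = 0.5)
toy <- data.frame(id = seq_along(x), x = x, y = y)

# Placeholder subsample indices (random here, NOT acLHS-derived)
idx <- sample.int(nrow(toy), 40)

# Compute the three correlation coefficients, rounded to 3 decimals
methods <- c("pearson", "spearman", "kendall")
corr_for <- function(d) {
  sapply(methods, function(m) round(cor(d$x, d$y, method = m), 3))
}

# Side-by-side comparison of the full data and the subsample
comparison <- data.frame(
  method   = methods,
  original = corr_for(toy),
  sampled  = corr_for(toy[idx, ]),
  row.names = NULL
)
comparison
```

The closer the `sampled` column is to the `original` column, the better the subsample preserves the relationship between the two variables.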
Set parameters for plotting.
Description
Sets various parameters for plotting, including the plot title, axis labels, plot dimensions and resolution, and whether to add a legend to the plot. By default, a legend will not be created; to add one, pass the location where the legend should be placed on the plot (e.g., "topright").
Usage
aclhs.plot_params(
file_name,
plot_title = "",
xlab = "",
ylab = "",
width = 1000,
height = 1000,
res = 150,
legend = NULL
)
Arguments
file_name |
The name of the file to store the plot in (should end with '.png') |
plot_title |
The title of the plot (default is blank) |
xlab |
The label for the x axis of the plot (default is blank) |
ylab |
The label for the y axis of the plot (default is blank) |
width |
The width of the plot (default is 1000) |
height |
The height of the plot (default is 1000) |
res |
The resolution of the plot (default is 150) |
legend |
The location of the legend on the plot (default is NULL) |
Value
A list of the set plotting parameters
Examples
## Set the parameters
p_params <- aclhs.plot_params(file_name=tempfile(fileext=".png"),
plot_title=expression(bold("Sample Distribution")),
xlab=expression(bold("X [km]")),
ylab=expression(bold("Y [km]")),
legend="topright")
## Access one of the set parameters
p_params$plot_title
Plots the acLHS samples distribution.
Description
Plots the acLHS sample distribution for either 1D or 2D data. acLHS samples will be overlaid over the original data points in blue.
Usage
aclhs.plot_sampling_distribution(df, aclhs_samples, plot_params)
Arguments
df |
The original data in dataframe format |
aclhs_samples |
The acLHS-derived sample indices |
plot_params |
The plotting parameters to use |
Value
No return value, called for side effects
Examples
## Get the data of interest and get the acLHS sample indices
data(ex_data_2D)
input2D <- ex_data_2D
aclhs_sam <- aclhs(df=input2D, num_samples=50, weights=c(1,1,1), iter=100)
## Set plotting parameters
p_params <- aclhs.plot_params(file_name=tempfile(fileext=".png"),
xlab=expression(bold("X [km]")),
ylab=expression(bold("Y [km]")))
## Create plot
aclhs.plot_sampling_distribution(df=input2D, aclhs_samples=aclhs_sam,
plot_params=p_params)
Plot the scatterplot of the acLHS subsamples.
Description
Plots the acLHS-sampled points of independent and dependent variables of the data as a scatterplot over the original points.
Usage
aclhs.plot_scatterplot(df, aclhs_samples, plot_params)
Arguments
df |
The original data in dataframe format |
aclhs_samples |
The acLHS-derived sample indices |
plot_params |
The plotting parameters to use |
Value
No return value, called for side effects
Examples
## Get the data of interest and get the acLHS sample indices
data(ex_data_2D)
input2D <- ex_data_2D
aclhs_sam <- aclhs(df=input2D, num_samples=50, weights=c(1,1,1), iter=100)
## Set plotting parameters
p_params <- aclhs.plot_params(file_name=tempfile(fileext=".png"),
xlab=expression(bold("Temperature")),
ylab=expression(bold("CO2 Efflux")))
## Create plot
aclhs.plot_scatterplot(df=input2D, aclhs_samples=aclhs_sam,
plot_params=p_params)
Plot the univariate PDF for a column of acLHS-derived samples.
Description
Plots the univariate PDF of the acLHS-sampled points over the univariate PDF of the original data. The PDF can be plotted for either the dependent or the independent variable of the original data.
Usage
aclhs.plot_univariate_pdf(df, aclhs_samples, col, plot_params)
Arguments
df |
The original data in dataframe format |
aclhs_samples |
The acLHS-derived sample indices |
col |
The column of data to plot |
plot_params |
The plotting parameters to use |
Value
No return value, called for side effects
Examples
## Get the data of interest and get the acLHS sample indices
data(ex_data_2D)
input2D <- ex_data_2D
aclhs_sam <- aclhs(df=input2D, num_samples=50, weights=c(1,1,1), iter=100)
## Set plotting parameters
p_params <- aclhs.plot_params(file_name=tempfile(fileext=".png"),
xlab=expression(bold("Temperature [Celsius]")),
ylab=expression(bold("Fn(Temperature)")))
## Create plot
aclhs.plot_univariate_pdf(df=input2D, aclhs_samples=aclhs_sam, col=3,
plot_params=p_params)
Plot the Variogram comparison of the acLHS subsamples.
Description
Plots the acLHS-sampled Variogram against the Variogram of the original data. A best-fit curve of the original Variogram is added for clearer comparison.
Usage
aclhs.plot_variogram_comparison(df, aclhs_samples, vario_params, plot_params)
Arguments
df |
The original dataframe |
aclhs_samples |
The acLHS-derived sample indices |
vario_params |
The parameters to set for computing a Variogram |
plot_params |
The plotting parameters to use |
Value
No return value, called for side effects
Examples
## Get the data of interest and get the acLHS sample indices
data(ex_data_2D)
input2D <- ex_data_2D
v_params <- aclhs.vario_params(num_lags=10, dir=0, tol=90, min_pairs=1)
aclhs_sam <- aclhs(df=input2D, num_samples=50, weights=c(1,1,1),
iter=100, vario_params=v_params)
## Set plotting parameters
p_params <- aclhs.plot_params(file_name=tempfile(fileext=".png"),
xlab=expression(bold("Distance [km]")),
ylab=expression(bold("Semivariance")))
## Create plot
aclhs.plot_variogram_comparison(df=input2D, aclhs_samples=aclhs_sam,
vario_params=v_params, plot_params=p_params)
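For readers unfamiliar with the semivariance shown on the y-axis of such a plot, the classical empirical estimator at lag h is half the mean squared difference between all pairs of values separated by h. The sketch below implements that estimator for an evenly spaced 1D series in base R. The function name `empirical_variogram_1d()` is hypothetical, and this is not the routine the package uses internally (the package imports geoR); it ignores direction, tolerance, and minimum-pair settings.

```r
# Classical empirical semivariogram for an evenly spaced 1D series.
# Illustrative only: a hypothetical helper, not the package's internal code.
empirical_variogram_1d <- function(z, num_lags) {
  n <- length(z)
  stopifnot(num_lags < n)
  gamma <- numeric(num_lags)
  for (h in seq_len(num_lags)) {
    # Differences between all pairs of points that are h steps apart
    d <- z[(h + 1):n] - z[1:(n - h)]
    # Half the mean squared difference
    gamma[h] <- sum(d^2) / (2 * length(d))
  }
  data.frame(lag = seq_len(num_lags), semivariance = gamma)
}

# A linear trend: lag-1 pairs differ by 1, lag-2 pairs differ by 2
vg <- empirical_variogram_1d(c(1, 2, 3, 4), num_lags = 2)
vg$semivariance  # 0.5 and 2
```

Comparing such a curve for the full data and for the subsample is the idea behind the Variogram comparison plot.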
Set parameters for computing a Variogram.
Description
Sets specific parameters for computing Variograms within the acLHS 1D or 2D function calls. Note that the lag value computed for Variograms will always be the 'minimum' of the independent data (i.e., for 1D minimum time between points and for 2D minimum distance between points).
Usage
aclhs.vario_params(num_lags = 8, dir = 0, tol = 90, min_pairs = 1)
Arguments
num_lags |
The number of lags |
dir |
The direction |
tol |
The tolerance |
min_pairs |
The minimum number of pairs |
Value
A list of the set Variogram parameters
Examples
## Store the parameters into a variable
v_params <- aclhs.vario_params(num_lags=10, dir=0, tol=90, min_pairs=1)
## Access one of the set parameters
v_params$num_lags
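The Description above states that the lag value is the minimum spacing of the independent data: the minimum time between points in 1D and the minimum distance between points in 2D. One way such a value could be computed in base R is sketched below. The toy data and variable names are assumptions, and the package may compute the value differently.

```r
# 1D: smallest time step between consecutive observations
time <- c(1, 2, 4, 7, 8)
min_lag_1d <- min(diff(sort(time)))

# 2D: smallest pairwise Euclidean distance between coordinates
coords <- data.frame(X = c(0, 3, 0), Y = c(0, 4, 1))
min_lag_2d <- min(dist(coords))

min_lag_1d  # 1
min_lag_2d  # 1
```

With `num_lags = 8`, lags built from such a minimum spacing would span eight multiples of it.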
Daily CO2 Efflux Measurements within a Temperate Forest
Description
A dataset containing daily CO2 efflux levels as the dependent variable and temperature as the independent variable, measured within a temperate forest over a full year.
Usage
ex_data_1D
Format
ex_data_1D
A data frame with 365 rows and 3 columns:
- Time
The day of the year
- Temp
The temperature
- CO2
The carbon dioxide efflux
Examples
data(ex_data_1D)
Spatial Distribution of Soil CO2 Efflux for CONUS
Description
A dataset containing spatially distributed CO2 efflux levels as the dependent variable and temperature as the independent variable within CONUS.
Usage
ex_data_2D
Format
ex_data_2D
A data frame with 903 rows and 4 columns:
- X
The longitude
- Y
The latitude
- rTemp
The temperature
- rRS
The carbon dioxide efflux
Examples
data(ex_data_2D)
Computes a score from three objective functions.
Description
Computes a score from the sum of three objective functions multiplied by their respective weights. The score is used to determine the best set of indices subsampled by the acLHS algorithm, where lower is better.
Usage
score_samples(
var_samples,
df,
num_samples,
quantile_ind,
corrs,
min_val,
vario_dep,
vario_params,
weights
)
Arguments
var_samples |
Subsampled indices to test |
df |
A dataframe with three columns of data |
num_samples |
The number of subsamples |
quantile_ind |
The quantile of the independent variable in |
corrs |
A vector of three correlations of the two variables in |
min_val |
The minimum time or distance between two points in |
vario_dep |
The computed Variogram of the data |
vario_params |
The parameters to set for computing a Variogram |
weights |
A vector of three weights for each objective function |
Value
Returns the summed score of the weighted objective functions
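The weighted sum described above can be sketched in a few lines of base R. The helper `weighted_score()` and the objective values below are hypothetical; the sketch only shows how three objective values and three weights combine into a single score, not how the package evaluates each objective function.

```r
# Combine three objective values into one score (lower is better).
# Hypothetical helper for illustration; not the package's score_samples().
weighted_score <- function(objectives, weights) {
  stopifnot(length(objectives) == 3, length(weights) == 3)
  sum(weights * objectives)
}

# Weights taken from the aclhs() example in this manual
w <- c(10, 1000, 0.001)

# Made-up objective values for two candidate sets of indices
candidate_a <- c(0.2, 0.001, 50)
candidate_b <- c(0.1, 0.004, 40)

score_a <- weighted_score(candidate_a, w)  # 2 + 1 + 0.05 = 3.05
score_b <- weighted_score(candidate_b, w)  # 1 + 4 + 0.04 = 5.04

# The candidate with the lower score is preferred
score_a < score_b
```

Because the objectives can sit on very different scales, the weights determine how much each one influences which candidate wins. Here candidate B is better on the first and third objectives, but the large second weight makes candidate A the preferred set.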