Soil quality is defined as “the capacity of a specific kind of soil to function within natural or managed ecosystem boundaries, to sustain plant and animal productivity, maintain or enhance water and air quality, and support human health and habitation” (Doran and Parkin 1994).
The Soil Quality Index (SQI) provides a single
numeric score (0–1) that integrates multiple soil physical, chemical,
and biological indicators. SQIpro provides six
established methods for computing SQI, a flexible variable-scoring
framework, and publication-quality visualisation tools, all in a single
CRAN-compliant R package.
| Term | Definition |
|---|---|
| Minimum Data Set (MDS) | The smallest subset of soil variables that adequately characterises soil quality (Andrews, Karlen, and Cambardella 2004) |
| Scoring function | A transformation that converts raw variable values to a 0–1 score |
| SQI | Weighted or unweighted combination of variable scores |
| Optimum | The value or range of a variable associated with best soil function |
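
To make the terms above concrete, here is a minimal base-R sketch of three scoring functions and an unweighted SQI. The helper names (`score_more`, `score_less`, `score_opt`), the linear shapes, and the example values are assumptions for illustration only; SQIpro's own scoring functions may differ in detail.

```r
# Hypothetical helpers for illustration; these are not SQIpro functions.

# Higher is better: min-max scaling to 0-1.
score_more <- function(x) {
  (x - min(x, na.rm = TRUE)) / (max(x, na.rm = TRUE) - min(x, na.rm = TRUE))
}

# Lower is better: the mirror image of score_more().
score_less <- function(x) 1 - score_more(x)

# Optimum range: 1 inside [opt_low, opt_high], falling linearly to 0
# at the observed minimum and maximum.
score_opt <- function(x, opt_low, opt_high) {
  lo <- min(x, na.rm = TRUE)
  hi <- max(x, na.rm = TRUE)
  out <- rep(1, length(x))
  below <- !is.na(x) & x < opt_low
  above <- !is.na(x) & x > opt_high
  out[below] <- (x[below] - lo) / (opt_low - lo)
  out[above] <- (hi - x[above]) / (hi - opt_high)
  out[is.na(x)] <- NA
  out
}

# Made-up values for four samples.
oc <- c(0.5, 1.5, 2.5, 4.5)   # organic carbon (%)
bd <- c(1.0, 1.2, 1.4, 1.6)   # bulk density (g/cm3)
ph <- c(5.0, 6.2, 6.8, 8.0)   # soil pH

scores <- cbind(
  OC = score_more(oc),
  BD = score_less(bd),
  pH = score_opt(ph, opt_low = 6, opt_high = 7)
)

# Unweighted SQI: the plain mean of the variable scores per sample.
sqi <- rowMeans(scores)
sqi
```

A weighted SQI would replace `rowMeans()` with a weighted sum whose weights add up to 1.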
# From CRAN (once published)
install.packages("SQIpro")
# Development version from GitHub
# install.packages("remotes")
remotes::install_github("yourname/SQIpro")

SQIpro ships with soil_data, a hypothetical
dataset of 100 soil samples across 5 land-use systems and 2 depths,
containing 12 soil indicators.
data(soil_data)
dim(soil_data)
#> [1] 100 14
head(soil_data)
#> LandUse Depth pH EC BD CEC OC MBC PMN Clay WHC
#> 1 Natural_Forest Surface_0_15cm 6.44 0.169 1.08 31.6 3.26 416.7 38.5 22.6 56.1
#> 2 Natural_Forest Surface_0_15cm 6.46 0.208 0.93 32.5 3.37 419.3 44.7 23.8 50.2
#> 3 Natural_Forest Surface_0_15cm 6.00 0.240 1.10 27.0 2.57 429.7 40.2 24.8 53.0
#> 4 Natural_Forest Surface_0_15cm 6.38 0.138 1.09 30.9 4.45 506.9 36.3 28.2 57.0
#> 5 Natural_Forest Surface_0_15cm 6.25 0.169 0.92 22.1 3.36 357.8 42.0 24.6 58.9
#> 6 Natural_Forest Surface_0_15cm 6.16 0.241 1.00 32.8 4.42 499.8 28.8 31.4 55.9
#> DEH AP TN
#> 1 115.0 18.3 0.310
#> 2 85.0 14.2 0.351
#> 3 94.4 14.2 0.335
#> 4 127.0 15.9 0.281
#> 5 126.7 21.4 0.296
#> 6 118.0 21.6 0.297

table(soil_data$LandUse, soil_data$Depth)
#>
#> Subsurface_15_30cm Surface_0_15cm
#> Agroforestry 10 10
#> Cropland 10 10
#> Degraded_Land 10 10
#> Grassland 10 10
#> Natural_Forest 10 10

Always validate before analysis:
result <- validate_data(
soil_data,
group_cols = c("LandUse", "Depth")
)
#>
#> === SQIpro Data Validation ===
#> Data: 100 rows x 14 columns
#> Group columns : LandUse, Depth
#> Numeric variables: 12
#> Groups detected : 10
#>
#> Result: PASS - data ready for SQI computation
result$n_per_group
#> # A tibble: 10 × 3
#> LandUse Depth n
#> <chr> <chr> <int>
#> 1 Agroforestry Subsurface_15_30cm 10
#> 2 Agroforestry Surface_0_15cm 10
#> 3 Cropland Subsurface_15_30cm 10
#> 4 Cropland Surface_0_15cm 10
#> 5 Degraded_Land Subsurface_15_30cm 10
#> 6 Degraded_Land Surface_0_15cm 10
#> 7 Grassland Subsurface_15_30cm 10
#> 8 Grassland Surface_0_15cm 10
#> 9 Natural_Forest Subsurface_15_30cm 10
#> 10 Natural_Forest Surface_0_15cm 10

Defining how each variable is scored is the most important step. Each variable receives a scoring type:

- "more": higher is better (e.g., organic carbon, microbial biomass)
- "less": lower is better (e.g., bulk density, electrical conductivity)
- "opt": a specific optimum range (e.g., pH 6.0–7.0)
- "trap": trapezoidal with explicit zero boundaries

cfg <- make_config(
variable = c("pH", "EC", "BD", "OC", "MBC",
"PMN", "Clay", "WHC", "DEH", "AP", "TN"),
type = c("opt", "less", "less", "more", "more",
"more", "opt", "more", "more", "more","more"),
opt_low = c(6.0, NA, NA, NA, NA,
NA, 20, NA, NA, NA, NA),
opt_high = c(7.0, NA, NA, NA, NA,
NA, 35, NA, NA, NA, NA),
description = c(
"Soil pH (H2O 1:2.5)",
"Electrical Conductivity (dS/m)",
"Bulk Density (g/cm3)",
"Organic Carbon (%)",
"Microbial Biomass Carbon (mg/kg)",
"Potentially Mineralizable N (mg/kg)",
"Clay content (%)",
"Water Holding Capacity (%)",
"Dehydrogenase Activity (ug TPF/g/day)",
"Available Phosphorus (mg/kg)",
"Total Nitrogen (%)"
)
)
print(cfg)
#> SQIpro Variable Configuration
#> 11 variable(s) defined
#>
#> variable type opt_low opt_high min_val max_val weight
#> pH opt 6 7 NA NA 1
#> EC less NA NA NA NA 1
#> BD less NA NA NA NA 1
#> OC more NA NA NA NA 1
#> MBC more NA NA NA NA 1
#> PMN more NA NA NA NA 1
#> Clay opt 20 35 NA NA 1
#> WHC more NA NA NA NA 1
#> DEH more NA NA NA NA 1
#> AP more NA NA NA NA 1
#> TN more NA NA NA NA 1
#> description
#> Soil pH (H2O 1:2.5)
#> Electrical Conductivity (dS/m)
#> Bulk Density (g/cm3)
#> Organic Carbon (%)
#> Microbial Biomass Carbon (mg/kg)
#> Potentially Mineralizable N (mg/kg)
#> Clay content (%)
#> Water Holding Capacity (%)
#> Dehydrogenase Activity (ug TPF/g/day)
#> Available Phosphorus (mg/kg)
#> Total Nitrogen (%)

Before proceeding, always visualise your scoring curves to confirm biological plausibility:
plot_scoring_curves(soil_data, cfg,
group_cols = c("LandUse", "Depth"))

scored <- score_all(soil_data, cfg,
group_cols = c("LandUse", "Depth"))
head(scored[, c("LandUse", "Depth", "pH", "OC", "MBC")])
#> LandUse Depth pH OC MBC
#> 1 Natural_Forest Surface_0_15cm 1 0.7232558 0.8148224
#> 2 Natural_Forest Surface_0_15cm 1 0.7488372 0.8201601
#> 3 Natural_Forest Surface_0_15cm 1 0.5627907 0.8415110
#> 4 Natural_Forest Surface_0_15cm 1 1.0000000 1.0000000
#> 5 Natural_Forest Surface_0_15cm 1 0.7465116 0.6939027
#> 6 Natural_Forest Surface_0_15cm 1 0.9930233 0.9854239

mds <- select_mds(scored,
group_cols = c("LandUse", "Depth"),
load_threshold = 0.6)
#>
#> === MDS Selection Summary ===
#> Components retained (eigenvalue > 1): 2
#> Total variance explained: 78.9%
#> MDS variables selected : 2
#> MDS: MBC, AP
mds$mds_vars
#> [1] "MBC" "AP"

Equal-weight additive index (Doran and Parkin 1994):
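
As a rough illustration of the aggregation, the sketch below takes the plain mean of the indicator scores for each observation, averages by group, and rescales the group means to 0–1. The toy data frame and the rescaling step are assumptions; the Raw_score and SQI_linear columns printed after the package call are consistent with this kind of min-max rescaling, but that is an inference from the printed numbers, not a description of the package internals.

```r
# Toy scored data (made-up 0-1 scores), not the package's soil_data.
toy <- data.frame(
  LandUse = rep(c("Forest", "Cropland", "Degraded"), each = 2),
  MBC     = c(0.9, 0.8, 0.5, 0.6, 0.1, 0.2),
  AP      = c(0.7, 0.6, 0.6, 0.5, 0.1, 0.0)
)

# Equal weights: each observation's raw score is the mean of its scores.
toy$raw <- rowMeans(toy[, c("MBC", "AP")])

# Average the raw score within each group.
grp <- aggregate(raw ~ LandUse, data = toy, FUN = mean)

# Assumed step: rescale the group means to 0-1 across groups.
grp$sqi <- (grp$raw - min(grp$raw)) / (max(grp$raw) - min(grp$raw))
grp[order(-grp$sqi), ]
```

The package call for this method is: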
res_lin <- sqi_linear(scored, cfg,
group_cols = c("LandUse", "Depth"),
mds_vars = mds$mds_vars)
print(res_lin)
#> # A tibble: 10 × 4
#> LandUse Depth Raw_score SQI_linear
#> <chr> <chr> <dbl> <dbl>
#> 1 Natural_Forest Surface_0_15cm 0.611 1
#> 2 Cropland Surface_0_15cm 0.549 0.880
#> 3 Agroforestry Surface_0_15cm 0.531 0.847
#> 4 Natural_Forest Subsurface_15_30cm 0.452 0.695
#> 5 Cropland Subsurface_15_30cm 0.451 0.692
#> 6 Grassland Surface_0_15cm 0.398 0.592
#> 7 Agroforestry Subsurface_15_30cm 0.390 0.576
#> 8 Grassland Subsurface_15_30cm 0.303 0.409
#> 9 Degraded_Land Surface_0_15cm 0.0982 0.0148
#> 10 Degraded_Land Subsurface_15_30cm 0.0905 0

Variables weighted by stepwise regression coefficients (Masto et al. 2008):
res_reg <- sqi_regression(scored, cfg,
dep_var = "OC",
group_cols = c("LandUse", "Depth"),
mds_vars = mds$mds_vars)
print(res_reg)
#> # A tibble: 10 × 3
#> LandUse Depth SQI_regression
#> <chr> <chr> <dbl>
#> 1 Natural_Forest Surface_0_15cm 1
#> 2 Agroforestry Surface_0_15cm 0.716
#> 3 Natural_Forest Subsurface_15_30cm 0.660
#> 4 Grassland Surface_0_15cm 0.651
#> 5 Agroforestry Subsurface_15_30cm 0.412
#> 6 Grassland Subsurface_15_30cm 0.377
#> 7 Cropland Surface_0_15cm 0.350
#> 8 Cropland Subsurface_15_30cm 0.206
#> 9 Degraded_Land Surface_0_15cm 0.05
#> 10 Degraded_Land Subsurface_15_30cm 0

Variables weighted by variance explained (Andrews, Karlen, and Cambardella 2004; Bastida et al. 2008):
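
A minimal sketch of variance-explained weighting, assuming each variable takes the weight of the principal component on which it loads most strongly. The random toy data and this assignment rule are assumptions for illustration; `sqi_pca()` may select and weight variables differently.

```r
set.seed(1)

# Toy 0-1 scores for three indicators (random values, illustration only).
toy <- data.frame(MBC = runif(30), AP = runif(30), TN = runif(30))

pca <- prcomp(toy, center = TRUE, scale. = TRUE)

# Proportion of total variance explained by each component.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

# Assumed rule: assign each variable to the component with its largest
# absolute loading, and give it that component's share of variance.
pc_of_var <- apply(abs(pca$rotation), 1, which.max)
w <- var_explained[pc_of_var]
w <- w / sum(w)                      # normalise weights to sum to 1
names(w) <- rownames(pca$rotation)

# Weighted index per observation.
sqi <- as.vector(as.matrix(toy) %*% w)
round(w, 3)
```

The package call for this method is: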
res_pca <- sqi_pca(scored, cfg,
group_cols = c("LandUse", "Depth"),
mds = mds)
print(res_pca)
#> # A tibble: 10 × 3
#> LandUse Depth SQI_pca
#> <chr> <chr> <dbl>
#> 1 Natural_Forest Surface_0_15cm 1
#> 2 Agroforestry Surface_0_15cm 0.745
#> 3 Natural_Forest Subsurface_15_30cm 0.668
#> 4 Grassland Surface_0_15cm 0.638
#> 5 Cropland Surface_0_15cm 0.465
#> 6 Agroforestry Subsurface_15_30cm 0.448
#> 7 Grassland Subsurface_15_30cm 0.384
#> 8 Cropland Subsurface_15_30cm 0.311
#> 9 Degraded_Land Surface_0_15cm 0.0424
#> 10 Degraded_Land Subsurface_15_30cm 0

Arithmetic or geometric mean of fuzzy membership scores:
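
The difference between the two means can be seen on a pair of made-up membership scores. This is only a sketch of the arithmetic, not the package code.

```r
# Made-up membership scores for one sample.
scores <- c(MBC = 0.81, AP = 0.36)

# Arithmetic mean: a low score can be offset by a high one.
arith <- mean(scores)

# Geometric mean: penalises imbalance, and is 0 if any score is 0.
geom <- exp(mean(log(scores)))

c(arithmetic = arith, geometric = geom)
```

With `operator = "mean"`, the SQI_fuzzy values printed after the call below match the SQI_linear values shown earlier.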
res_fuz <- sqi_fuzzy(scored, cfg,
group_cols = c("LandUse", "Depth"),
mds_vars = mds$mds_vars,
operator = "mean")
print(res_fuz)
#> # A tibble: 10 × 3
#> LandUse Depth SQI_fuzzy
#> <chr> <chr> <dbl>
#> 1 Natural_Forest Surface_0_15cm 1
#> 2 Cropland Surface_0_15cm 0.880
#> 3 Agroforestry Surface_0_15cm 0.847
#> 4 Natural_Forest Subsurface_15_30cm 0.695
#> 5 Cropland Subsurface_15_30cm 0.692
#> 6 Grassland Surface_0_15cm 0.592
#> 7 Agroforestry Subsurface_15_30cm 0.576
#> 8 Grassland Subsurface_15_30cm 0.409
#> 9 Degraded_Land Surface_0_15cm 0.0148
#> 10 Degraded_Land Subsurface_15_30cm 0

Objective weights derived from Shannon entropy (Shannon 1948):
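
The standard entropy weight method can be sketched in a few lines of base R. The helper name `entropy_weights` and the toy matrix are hypothetical, and `sqi_entropy()` may normalise or aggregate differently.

```r
# Hypothetical helper for illustration; not SQIpro's implementation.
entropy_weights <- function(scores) {
  scores <- as.matrix(scores)
  n <- nrow(scores)
  # Share of each observation in its column total.
  p <- sweep(scores, 2, colSums(scores), "/")
  # Shannon entropy per indicator, scaled to 0-1; 0 * log(0) is taken as 0.
  plogp <- ifelse(p > 0, p * log(p), 0)
  e <- -colSums(plogp) / log(n)
  # Indicators with lower entropy (more contrast) receive more weight.
  d <- 1 - e
  d / sum(d)
}

# Made-up 0-1 scores: MBC varies strongly, AP hardly at all.
toy <- cbind(
  MBC = c(0.90, 0.50, 0.10, 0.05),
  AP  = c(0.60, 0.50, 0.50, 0.40)
)

w <- entropy_weights(toy)
sqi <- as.vector(toy %*% w)   # weighted index per observation
round(w, 3)
```

In this toy example MBC receives most of the weight because it discriminates more between samples. The package call for this method is: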
res_ent <- sqi_entropy(scored, cfg,
group_cols = c("LandUse", "Depth"),
mds_vars = mds$mds_vars)
print(res_ent)
#> # A tibble: 10 × 3
#> LandUse Depth SQI_entropy
#> <chr> <chr> <dbl>
#> 1 Natural_Forest Surface_0_15cm 1
#> 2 Agroforestry Surface_0_15cm 0.81
#> 3 Cropland Surface_0_15cm 0.729
#> 4 Natural_Forest Subsurface_15_30cm 0.685
#> 5 Grassland Surface_0_15cm 0.608
#> 6 Cropland Subsurface_15_30cm 0.554
#> 7 Agroforestry Subsurface_15_30cm 0.529
#> 8 Grassland Subsurface_15_30cm 0.400
#> 9 Degraded_Land Surface_0_15cm 0.0248
#> 10 Degraded_Land Subsurface_15_30cm 0
cat("\nEntropy weights:\n")
#>
#> Entropy weights:
print(attr(res_ent, "entropy_weights"))
#> MBC AP
#> 0.6011 0.3989

Multi-criteria ranking by ideal solution distance (Hwang and Yoon 1981):
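
A minimal sketch of the textbook TOPSIS closeness coefficient, assuming all inputs are 0–1 scores where higher is better. The helper name `topsis_closeness`, the equal default weights, and the toy matrix are assumptions. In the package output that follows, no group reaches 1, which suggests `sqi_topsis()` aggregates differently from this sketch.

```r
# Hypothetical helper for illustration; not SQIpro's implementation.
topsis_closeness <- function(scores,
                             weights = rep(1 / ncol(scores), ncol(scores))) {
  x <- as.matrix(scores)
  # Vector-normalise each column, then apply the weights.
  norm <- sweep(x, 2, sqrt(colSums(x^2)), "/")
  v <- sweep(norm, 2, weights, "*")
  # Higher is better for every column, so the ideal point is the column
  # maximum and the anti-ideal point is the column minimum.
  ideal <- apply(v, 2, max)
  anti  <- apply(v, 2, min)
  d_plus  <- sqrt(rowSums(sweep(v, 2, ideal, "-")^2))
  d_minus <- sqrt(rowSums(sweep(v, 2, anti, "-")^2))
  # Closeness: 1 at the ideal point, 0 at the anti-ideal point.
  d_minus / (d_plus + d_minus)
}

# Made-up group-level scores.
toy <- rbind(
  Forest   = c(0.9, 0.7),
  Cropland = c(0.5, 0.6),
  Degraded = c(0.1, 0.1)
)
colnames(toy) <- c("MBC", "AP")

closeness <- topsis_closeness(toy)
sort(closeness, decreasing = TRUE)
```

The package call for this method is: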
res_top <- sqi_topsis(scored, cfg,
group_cols = c("LandUse", "Depth"),
mds_vars = mds$mds_vars)
print(res_top)
#> # A tibble: 10 × 3
#> LandUse Depth SQI_topsis
#> <chr> <chr> <dbl>
#> 1 Natural_Forest Surface_0_15cm 0.661
#> 2 Agroforestry Surface_0_15cm 0.613
#> 3 Cropland Surface_0_15cm 0.590
#> 4 Natural_Forest Subsurface_15_30cm 0.512
#> 5 Cropland Subsurface_15_30cm 0.497
#> 6 Grassland Surface_0_15cm 0.453
#> 7 Agroforestry Subsurface_15_30cm 0.423
#> 8 Grassland Subsurface_15_30cm 0.314
#> 9 Degraded_Land Surface_0_15cm 0.0374
#> 10 Degraded_Land Subsurface_15_30cm 0.0253

comparison <- sqi_compare(scored, cfg,
group_cols = c("LandUse", "Depth"),
dep_var = "OC",
mds = mds)
print(comparison)
#> # A tibble: 10 × 10
#> LandUse Depth SQI_linear SQI_pca SQI_fuzzy SQI_entropy SQI_topsis
#> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 Natural_Forest Surface_0… 1 1 1 1 0.661
#> 2 Agroforestry Surface_0… 0.847 0.745 0.847 0.81 0.613
#> 3 Natural_Forest Subsurfac… 0.695 0.668 0.695 0.685 0.512
#> 4 Cropland Surface_0… 0.880 0.465 0.880 0.729 0.590
#> 5 Grassland Surface_0… 0.592 0.638 0.592 0.608 0.453
#> 6 Agroforestry Subsurfac… 0.576 0.448 0.576 0.529 0.423
#> 7 Cropland Subsurfac… 0.692 0.311 0.692 0.554 0.497
#> 8 Grassland Subsurfac… 0.409 0.384 0.409 0.400 0.314
#> 9 Degraded_Land Surface_0… 0.0148 0.0424 0.0148 0.0248 0.0374
#> 10 Degraded_Land Subsurfac… 0 0 0 0 0.0253
#> # ℹ 3 more variables: SQI_regression <dbl>, Mean_SQI <dbl>, Rank <int>

plot_scores(scored, cfg,
group_cols = c("LandUse", "Depth"),
group_by = "LandUse",
facet_by = "Depth")

plot_sqi(res_lin,
sqi_col = "SQI_linear",
group_col = "LandUse",
fill_col = "Depth")

# Requires the 'fmsb' package: install.packages("fmsb")
plot_radar(scored, cfg,
group_col = "LandUse",
group_cols = c("LandUse", "Depth"))

plot_pca_biplot(mds, scored, group_col = "LandUse")

# Compute per-observation index
scored$SQI_obs <- rowMeans(scored[, mds$mds_vars], na.rm = TRUE)
aov_res <- sqi_anova(scored,
sqi_col = "SQI_obs",
group_col = "LandUse")
cat("ANOVA significant:", aov_res$significant, "\n")
#> ANOVA significant: TRUE
print(aov_res$compact_letters)
#> Group Letter
#> 1 Agroforestry a
#> 2 Cropland a
#> 3 Degraded_Land b
#> 4 Grassland c
#> 5 Natural_Forest a

sa <- sqi_sensitivity(scored, cfg,
group_cols = c("LandUse", "Depth"),
method = "linear",
mds_vars = mds$mds_vars)
print(sa)
#> variable mean_change sd_change relative_importance
#> 1 MBC 0.1830 0.1569 1.0000
#> 2 AP 0.1354 0.1046 0.7399

plot_sensitivity(sa)

raw data
│
▼
validate_data() ← check structure, missing values, groups
│
▼
make_config() ← define scoring type per variable
│
▼
plot_scoring_curves() ← verify biological plausibility
│
▼
score_all() ← transform all variables to 0–1
│
▼
select_mds() ← PCA-based minimum data set selection
│
├──► sqi_linear()
├──► sqi_regression()
├──► sqi_pca()
├──► sqi_fuzzy()
├──► sqi_entropy()
└──► sqi_topsis()
│
▼
sqi_compare() ← unified results table + ranking
│
▼
Visualisation ← plot_sqi(), plot_scores(), plot_radar()
│
▼
Statistics ← sqi_anova(), sqi_sensitivity()