Getting Started with SQIpro: Comprehensive Soil Quality Index

Your Name

2026-04-13

1 Introduction

Soil quality is defined as “the capacity of a specific kind of soil to function within natural or managed ecosystem boundaries, to sustain plant and animal productivity, maintain or enhance water and air quality, and support human health and habitation” (Doran and Parkin 1994).

The Soil Quality Index (SQI) condenses multiple soil physical, chemical, and biological indicators into a single numeric score between 0 and 1. SQIpro implements six established methods for computing SQI, a flexible variable scoring framework, and publication-quality visualisation tools, all in a single CRAN-compliant R package.

1.1 Key Concepts

Term                     Definition
Minimum Data Set (MDS)   The smallest subset of soil variables that adequately characterises soil quality (Andrews, Karlen, and Cambardella 2004)
Scoring function         A transformation that converts raw variable values to a 0–1 score
SQI                      Weighted or unweighted combination of variable scores
Optimum                  The value or range of a variable associated with best soil function

2 Installation

# From CRAN (once published)
install.packages("SQIpro")

# Development version from GitHub
# install.packages("remotes")
remotes::install_github("yourname/SQIpro")

3 The Example Dataset

SQIpro ships with soil_data, a hypothetical dataset of 100 soil samples: 5 land-use systems × 2 depths, with 10 samples in each combination. It has 14 columns: the two grouping columns LandUse and Depth, plus 12 soil indicators.

data(soil_data)
dim(soil_data)
#> [1] 100  14
head(soil_data)
#>          LandUse          Depth   pH    EC   BD  CEC   OC   MBC  PMN Clay  WHC
#> 1 Natural_Forest Surface_0_15cm 6.44 0.169 1.08 31.6 3.26 416.7 38.5 22.6 56.1
#> 2 Natural_Forest Surface_0_15cm 6.46 0.208 0.93 32.5 3.37 419.3 44.7 23.8 50.2
#> 3 Natural_Forest Surface_0_15cm 6.00 0.240 1.10 27.0 2.57 429.7 40.2 24.8 53.0
#> 4 Natural_Forest Surface_0_15cm 6.38 0.138 1.09 30.9 4.45 506.9 36.3 28.2 57.0
#> 5 Natural_Forest Surface_0_15cm 6.25 0.169 0.92 22.1 3.36 357.8 42.0 24.6 58.9
#> 6 Natural_Forest Surface_0_15cm 6.16 0.241 1.00 32.8 4.42 499.8 28.8 31.4 55.9
#>     DEH   AP    TN
#> 1 115.0 18.3 0.310
#> 2  85.0 14.2 0.351
#> 3  94.4 14.2 0.335
#> 4 127.0 15.9 0.281
#> 5 126.7 21.4 0.296
#> 6 118.0 21.6 0.297
table(soil_data$LandUse, soil_data$Depth)
#>                 
#>                  Subsurface_15_30cm Surface_0_15cm
#>   Agroforestry                   10             10
#>   Cropland                       10             10
#>   Degraded_Land                  10             10
#>   Grassland                      10             10
#>   Natural_Forest                 10             10

4 Step 1: Validate Your Data

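Validate the data before any analysis. validate_data() reports the data dimensions, the grouping columns, the number of numeric variables, and the number of groups; the returned object also holds the per-group sample sizes in n_per_group: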

result <- validate_data(
  soil_data,
  group_cols = c("LandUse", "Depth")
)
#> 
#> === SQIpro Data Validation ===
#>   Data: 100 rows x 14 columns
#>   Group columns   : LandUse, Depth
#>   Numeric variables: 12
#>   Groups detected : 10
#> 
#> Result: PASS - data ready for SQI computation
result$n_per_group
#> # A tibble: 10 × 3
#>    LandUse        Depth                  n
#>    <chr>          <chr>              <int>
#>  1 Agroforestry   Subsurface_15_30cm    10
#>  2 Agroforestry   Surface_0_15cm        10
#>  3 Cropland       Subsurface_15_30cm    10
#>  4 Cropland       Surface_0_15cm        10
#>  5 Degraded_Land  Subsurface_15_30cm    10
#>  6 Degraded_Land  Surface_0_15cm        10
#>  7 Grassland      Subsurface_15_30cm    10
#>  8 Grassland      Surface_0_15cm        10
#>  9 Natural_Forest Subsurface_15_30cm    10
#> 10 Natural_Forest Surface_0_15cm        10

5 Step 2: Define Variable Configuration

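This is the most important step. Each variable is assigned a scoring type: "more" (higher values score higher), "less" (lower values score higher), or "opt" (values inside the range opt_low to opt_high score best). Only "opt" variables need opt_low and opt_high; all other variables take NA. The configuration below covers 11 of the 12 indicators in soil_data; CEC is not included.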

cfg <- make_config(
  variable    = c("pH",   "EC",   "BD",   "OC",   "MBC",
                  "PMN",  "Clay", "WHC",  "DEH",  "AP",  "TN"),
  type        = c("opt",  "less", "less", "more", "more",
                  "more", "opt",  "more", "more", "more","more"),
  opt_low     = c(6.0,   NA,     NA,     NA,     NA,
                  NA,     20,     NA,     NA,     NA,    NA),
  opt_high    = c(7.0,   NA,     NA,     NA,     NA,
                  NA,     35,     NA,     NA,     NA,    NA),
  description = c(
    "Soil pH (H2O 1:2.5)",
    "Electrical Conductivity (dS/m)",
    "Bulk Density (g/cm3)",
    "Organic Carbon (%)",
    "Microbial Biomass Carbon (mg/kg)",
    "Potentially Mineralizable N (mg/kg)",
    "Clay content (%)",
    "Water Holding Capacity (%)",
    "Dehydrogenase Activity (ug TPF/g/day)",
    "Available Phosphorus (mg/kg)",
    "Total Nitrogen (%)"
  )
)
print(cfg)
#> SQIpro Variable Configuration
#>   11 variable(s) defined
#> 
#>  variable type opt_low opt_high min_val max_val weight
#>        pH  opt       6        7      NA      NA      1
#>        EC less      NA       NA      NA      NA      1
#>        BD less      NA       NA      NA      NA      1
#>        OC more      NA       NA      NA      NA      1
#>       MBC more      NA       NA      NA      NA      1
#>       PMN more      NA       NA      NA      NA      1
#>      Clay  opt      20       35      NA      NA      1
#>       WHC more      NA       NA      NA      NA      1
#>       DEH more      NA       NA      NA      NA      1
#>        AP more      NA       NA      NA      NA      1
#>        TN more      NA       NA      NA      NA      1
#>                            description
#>                    Soil pH (H2O 1:2.5)
#>         Electrical Conductivity (dS/m)
#>                   Bulk Density (g/cm3)
#>                     Organic Carbon (%)
#>       Microbial Biomass Carbon (mg/kg)
#>    Potentially Mineralizable N (mg/kg)
#>                       Clay content (%)
#>             Water Holding Capacity (%)
#>  Dehydrogenase Activity (ug TPF/g/day)
#>           Available Phosphorus (mg/kg)
#>                     Total Nitrogen (%)
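As a rough illustration of what an "opt" entry expresses, the sketch below gives a score of 1 inside [opt_low, opt_high] and lets the score fall off linearly towards the observed minimum and maximum. The helper name score_opt and the linear fall-off are assumptions made for this example, not SQIpro's documented behaviour, so the package's own curve may have a different shape.

```r
# Hypothetical helper, not part of SQIpro: trapezoidal "optimum range" score.
score_opt <- function(x, opt_low, opt_high,
                      min_val = min(x, na.rm = TRUE),
                      max_val = max(x, na.rm = TRUE)) {
  score <- rep(1, length(x))                      # inside the range -> 1
  below <- !is.na(x) & x < opt_low
  above <- !is.na(x) & x > opt_high
  # Linear rise from min_val (score 0) up to opt_low (score 1)
  score[below] <- (x[below] - min_val) / (opt_low - min_val)
  # Linear fall from opt_high (score 1) down to max_val (score 0)
  score[above] <- (max_val - x[above]) / (max_val - opt_high)
  score[is.na(x)] <- NA
  pmin(pmax(score, 0), 1)                         # clamp to [0, 1]
}

ph <- c(4.5, 5.25, 6.0, 6.44, 7.0, 7.5, 8.0)      # illustrative pH values
round(score_opt(ph, opt_low = 6.0, opt_high = 7.0), 2)
#> [1] 0.0 0.5 1.0 1.0 1.0 0.5 0.0
```

Under these assumptions every pH between 6.0 and 7.0 scores 1, while values outside the range lose score in proportion to their distance from it.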

5.1 Verify Scoring Curves

Before proceeding, always visualise your scoring curves to confirm biological plausibility:

plot_scoring_curves(soil_data, cfg,
                    group_cols = c("LandUse", "Depth"))


6 Step 3: Score All Variables

scored <- score_all(soil_data, cfg,
                    group_cols = c("LandUse", "Depth"))
head(scored[, c("LandUse", "Depth", "pH", "OC", "MBC")])
#>          LandUse          Depth pH        OC       MBC
#> 1 Natural_Forest Surface_0_15cm  1 0.7232558 0.8148224
#> 2 Natural_Forest Surface_0_15cm  1 0.7488372 0.8201601
#> 3 Natural_Forest Surface_0_15cm  1 0.5627907 0.8415110
#> 4 Natural_Forest Surface_0_15cm  1 1.0000000 1.0000000
#> 5 Natural_Forest Surface_0_15cm  1 0.7465116 0.6939027
#> 6 Natural_Forest Surface_0_15cm  1 0.9930233 0.9854239
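The OC column above is consistent with a linear "more is better" transformation, (x - min) / (max - min). The sketch below reproduces the six printed scores from the raw OC values shown in the dataset preview. The bounds 0.15 and 4.45 are back-solved from that output rather than documented, so treat them as an inference; score_more is a hypothetical helper name.

```r
# Raw OC (%) for the first six rows, copied from head(soil_data)
oc_raw <- c(3.26, 3.37, 2.57, 4.45, 3.36, 4.42)

# Linear "more is better" score; the bounds are inferred from the printed
# scores, not taken from SQIpro documentation.
score_more <- function(x, min_val, max_val) {
  pmin(pmax((x - min_val) / (max_val - min_val), 0), 1)
}

oc_score <- score_more(oc_raw, min_val = 0.15, max_val = 4.45)
round(oc_score, 7)
#> [1] 0.7232558 0.7488372 0.5627907 1.0000000 0.7465116 0.9930233
```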

7 Step 4: Select the Minimum Data Set (MDS)

mds <- select_mds(scored,
                  group_cols     = c("LandUse", "Depth"),
                  load_threshold = 0.6)
#> 
#> === MDS Selection Summary ===
#>   Components retained (eigenvalue > 1): 2
#>   Total variance explained: 78.9%
#>   MDS variables selected  : 2
#>   MDS: MBC, AP
mds$mds_vars
#> [1] "MBC" "AP"
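The summary mentions two ingredients: components with eigenvalue > 1 and a loading threshold. A minimal base-R sketch of that kind of screening is shown below on the built-in mtcars data. It is not SQIpro's internal algorithm; the choice of correlation-scale loadings and the rule "keep any variable at or above the threshold" are assumptions for illustration.

```r
# Illustrative only: base-R PCA screening on a built-in dataset (mtcars),
# not SQIpro's internal algorithm.
X   <- mtcars[, c("mpg", "disp", "hp", "drat", "wt", "qsec")]
pca <- prcomp(X, center = TRUE, scale. = TRUE)

eigenvalues <- pca$sdev^2
keep        <- which(eigenvalues > 1)             # Kaiser criterion

# Correlation-scale loadings: eigenvector * sqrt(eigenvalue)
loadings <- sweep(pca$rotation[, keep, drop = FALSE], 2,
                  pca$sdev[keep], `*`)

# Candidate variables: absolute loading at or above the threshold on any kept PC
load_threshold <- 0.6
candidates <- rownames(loadings)[apply(abs(loadings) >= load_threshold, 1, any)]

round(eigenvalues, 2)
candidates
```

A full MDS procedure usually goes further, for example by dropping redundant variables that are highly correlated with each other, which this sketch does not attempt.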

8 Step 5: Compute SQI Using All Six Methods

8.1 Linear Scoring

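Equal-weight additive index (Doran and Parkin 1994). The output reports the raw additive score alongside SQI_linear, which in this run ranges from 0 for the lowest-scoring group to 1 for the highest: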

res_lin <- sqi_linear(scored, cfg,
                      group_cols = c("LandUse", "Depth"),
                      mds_vars   = mds$mds_vars)
print(res_lin)
#> # A tibble: 10 × 4
#>    LandUse        Depth              Raw_score SQI_linear
#>    <chr>          <chr>                  <dbl>      <dbl>
#>  1 Natural_Forest Surface_0_15cm        0.611      1     
#>  2 Cropland       Surface_0_15cm        0.549      0.880 
#>  3 Agroforestry   Surface_0_15cm        0.531      0.847 
#>  4 Natural_Forest Subsurface_15_30cm    0.452      0.695 
#>  5 Cropland       Subsurface_15_30cm    0.451      0.692 
#>  6 Grassland      Surface_0_15cm        0.398      0.592 
#>  7 Agroforestry   Subsurface_15_30cm    0.390      0.576 
#>  8 Grassland      Subsurface_15_30cm    0.303      0.409 
#>  9 Degraded_Land  Surface_0_15cm        0.0982     0.0148
#> 10 Degraded_Land  Subsurface_15_30cm    0.0905     0
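The relationship between the two columns appears to be a min-max rescaling of Raw_score across groups. The check below uses only the numbers printed above; that SQIpro computes SQI_linear this way is an inference from the output, not a documented fact.

```r
# Raw_score and SQI_linear copied from the printed table above
raw <- c(0.611, 0.549, 0.531, 0.452, 0.451, 0.398, 0.390, 0.303, 0.0982, 0.0905)
sqi <- c(1, 0.880, 0.847, 0.695, 0.692, 0.592, 0.576, 0.409, 0.0148, 0)

# Min-max rescaling across groups: best group -> 1, worst group -> 0
rescaled <- (raw - min(raw)) / (max(raw) - min(raw))

# Largest discrepancy; small values are explained by rounding in the printout
max(abs(rescaled - sqi))
```

One consequence of such a rescaling is that the index is relative: a value of 0 marks the lowest-scoring group in this dataset, not a soil with no function at all.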

8.2 Regression-Based

Variables weighted by stepwise regression coefficients (Masto et al. 2008):

res_reg <- sqi_regression(scored, cfg,
                           dep_var    = "OC",
                           group_cols = c("LandUse", "Depth"),
                           mds_vars   = mds$mds_vars)
print(res_reg)
#> # A tibble: 10 × 3
#>    LandUse        Depth              SQI_regression
#>    <chr>          <chr>                       <dbl>
#>  1 Natural_Forest Surface_0_15cm              1    
#>  2 Agroforestry   Surface_0_15cm              0.716
#>  3 Natural_Forest Subsurface_15_30cm          0.660
#>  4 Grassland      Surface_0_15cm              0.651
#>  5 Agroforestry   Subsurface_15_30cm          0.412
#>  6 Grassland      Subsurface_15_30cm          0.377
#>  7 Cropland       Surface_0_15cm              0.350
#>  8 Cropland       Subsurface_15_30cm          0.206
#>  9 Degraded_Land  Surface_0_15cm              0.05 
#> 10 Degraded_Land  Subsurface_15_30cm          0
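To make the idea of regression-derived weights concrete, here is a small sketch on simulated scores. It regresses a target (standing in for dep_var = "OC") on two indicator scores, prunes with base R's step(), and normalises the absolute coefficients into weights. The simulated data and the normalisation rule are assumptions; sqi_regression() may derive its weights differently.

```r
# Simulated scores, purely illustrative (not soil_data)
set.seed(1)
n   <- 60
MBC <- runif(n)
AP  <- runif(n)
OC  <- 0.7 * MBC + 0.3 * AP + rnorm(n, sd = 0.05)   # target depends on both

fit <- lm(OC ~ MBC + AP)
fit <- step(fit, trace = 0)        # stepwise pruning by AIC (base R)

b <- abs(coef(fit)[-1])            # drop the intercept
w <- b / sum(b)                    # normalise so the weights sum to 1

# Weighted index per observation, using only the retained variables
sqi_obs <- as.vector(cbind(MBC, AP)[, names(w), drop = FALSE] %*% w)
round(w, 2)
```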

8.3 PCA-Based

Variables weighted by variance explained (Andrews, Karlen, and Cambardella 2004; Bastida et al. 2008):

res_pca <- sqi_pca(scored, cfg,
                   group_cols = c("LandUse", "Depth"),
                   mds        = mds)
print(res_pca)
#> # A tibble: 10 × 3
#>    LandUse        Depth              SQI_pca
#>    <chr>          <chr>                <dbl>
#>  1 Natural_Forest Surface_0_15cm      1     
#>  2 Agroforestry   Surface_0_15cm      0.745 
#>  3 Natural_Forest Subsurface_15_30cm  0.668 
#>  4 Grassland      Surface_0_15cm      0.638 
#>  5 Cropland       Surface_0_15cm      0.465 
#>  6 Agroforestry   Subsurface_15_30cm  0.448 
#>  7 Grassland      Subsurface_15_30cm  0.384 
#>  8 Cropland       Subsurface_15_30cm  0.311 
#>  9 Degraded_Land  Surface_0_15cm      0.0424
#> 10 Degraded_Land  Subsurface_15_30cm  0
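Weighting by variance explained can be sketched as follows: each MDS variable takes the variance share of the component it represents, and the shares are normalised to sum to 1. All numbers below are hypothetical and are not read from the mds object; the pairing of variables with components is likewise an assumption.

```r
# Hypothetical inputs for illustration only (not taken from the mds object)
var_explained <- c(PC1 = 0.50, PC2 = 0.30)   # variance share per retained PC
var_to_pc     <- c(MBC = "PC1", AP = "PC2")  # PC each MDS variable represents

w <- var_explained[var_to_pc]
w <- setNames(w / sum(w), names(var_to_pc))  # MBC = 0.625, AP = 0.375

scores <- c(MBC = 0.80, AP = 0.40)           # example 0-1 scores for one group
sqi    <- sum(w * scores[names(w)])
sqi
#> [1] 0.65
```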

8.4 Fuzzy Logic

Arithmetic or geometric mean of fuzzy membership scores. With operator = "mean" (the arithmetic mean), the result below is identical to the linear index:

res_fuz <- sqi_fuzzy(scored, cfg,
                     group_cols = c("LandUse", "Depth"),
                     mds_vars   = mds$mds_vars,
                     operator   = "mean")
print(res_fuz)
#> # A tibble: 10 × 3
#>    LandUse        Depth              SQI_fuzzy
#>    <chr>          <chr>                  <dbl>
#>  1 Natural_Forest Surface_0_15cm        1     
#>  2 Cropland       Surface_0_15cm        0.880 
#>  3 Agroforestry   Surface_0_15cm        0.847 
#>  4 Natural_Forest Subsurface_15_30cm    0.695 
#>  5 Cropland       Subsurface_15_30cm    0.692 
#>  6 Grassland      Surface_0_15cm        0.592 
#>  7 Agroforestry   Subsurface_15_30cm    0.576 
#>  8 Grassland      Subsurface_15_30cm    0.409 
#>  9 Degraded_Land  Surface_0_15cm        0.0148
#> 10 Degraded_Land  Subsurface_15_30cm    0
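The two aggregation rules named above behave differently when one indicator scores poorly. The sketch below uses made-up membership scores and plain base R; it does not call SQIpro.

```r
# Example membership scores for one group (illustrative values)
s <- c(0.9, 0.8, 0.1)

arith <- mean(s)                 # compensatory: high scores offset the low one
geom  <- exp(mean(log(s)))       # penalises the low score; needs all s > 0

round(c(arithmetic = arith, geometric = geom), 3)
```

The geometric mean is never larger than the arithmetic mean, so it is the stricter choice when a single limiting indicator should pull the index down.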

8.5 Entropy Weighting

Objective weights derived from Shannon entropy (Shannon 1948):

res_ent <- sqi_entropy(scored, cfg,
                       group_cols = c("LandUse", "Depth"),
                       mds_vars   = mds$mds_vars)
print(res_ent)
#> # A tibble: 10 × 3
#>    LandUse        Depth              SQI_entropy
#>    <chr>          <chr>                    <dbl>
#>  1 Natural_Forest Surface_0_15cm          1     
#>  2 Agroforestry   Surface_0_15cm          0.81  
#>  3 Cropland       Surface_0_15cm          0.729 
#>  4 Natural_Forest Subsurface_15_30cm      0.685 
#>  5 Grassland      Surface_0_15cm          0.608 
#>  6 Cropland       Subsurface_15_30cm      0.554 
#>  7 Agroforestry   Subsurface_15_30cm      0.529 
#>  8 Grassland      Subsurface_15_30cm      0.400 
#>  9 Degraded_Land  Surface_0_15cm          0.0248
#> 10 Degraded_Land  Subsurface_15_30cm      0
cat("\nEntropy weights:\n")
#> 
#> Entropy weights:
print(attr(res_ent, "entropy_weights"))
#>    MBC     AP 
#> 0.6011 0.3989
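For readers unfamiliar with entropy weighting, the sketch below follows the textbook entropy weight method: indicators whose scores differ more between groups carry more information and receive larger weights. The score matrix is made up and the function name entropy_weights is hypothetical; sqi_entropy() may differ in details such as normalisation.

```r
# Illustrative score matrix: rows = groups, columns = indicators (made-up values)
S <- cbind(MBC = c(0.95, 0.70, 0.40, 0.05),
           AP  = c(0.60, 0.55, 0.50, 0.45))

entropy_weights <- function(S) {
  P <- sweep(S, 2, colSums(S), `/`)            # column-wise proportions
  plogp <- ifelse(P > 0, P * log(P), 0)        # treat 0 * log(0) as 0
  e <- -colSums(plogp) / log(nrow(S))          # normalised entropy in [0, 1]
  d <- 1 - e                                   # degree of divergence
  d / sum(d)                                   # weights sum to 1
}

w <- entropy_weights(S)
round(w, 3)
```

In this toy matrix MBC varies strongly across groups while AP is nearly constant, so MBC receives the larger weight.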

8.6 TOPSIS

Multi-criteria ranking by ideal solution distance (Hwang and Yoon 1981):

res_top <- sqi_topsis(scored, cfg,
                      group_cols = c("LandUse", "Depth"),
                      mds_vars   = mds$mds_vars)
print(res_top)
#> # A tibble: 10 × 3
#>    LandUse        Depth              SQI_topsis
#>    <chr>          <chr>                   <dbl>
#>  1 Natural_Forest Surface_0_15cm         0.661 
#>  2 Agroforestry   Surface_0_15cm         0.613 
#>  3 Cropland       Surface_0_15cm         0.590 
#>  4 Natural_Forest Subsurface_15_30cm     0.512 
#>  5 Cropland       Subsurface_15_30cm     0.497 
#>  6 Grassland      Surface_0_15cm         0.453 
#>  7 Agroforestry   Subsurface_15_30cm     0.423 
#>  8 Grassland      Subsurface_15_30cm     0.314 
#>  9 Degraded_Land  Surface_0_15cm         0.0374
#> 10 Degraded_Land  Subsurface_15_30cm     0.0253
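The ideal-solution distance can be illustrated with the standard TOPSIS steps: normalise, weight, locate the best and worst values per indicator, and compute each group's relative closeness to the best. The matrix, the equal weights, and the function name topsis are assumptions for this sketch, which is not SQIpro's code.

```r
# Illustrative decision matrix: rows = groups, columns = 0-1 scores (made-up values)
S <- rbind(A = c(MBC = 0.9, AP = 0.8),
           B = c(MBC = 0.6, AP = 0.7),
           C = c(MBC = 0.1, AP = 0.2))
w <- c(MBC = 0.5, AP = 0.5)                       # equal weights, an assumption

topsis <- function(S, w) {
  R <- sweep(S, 2, sqrt(colSums(S^2)), `/`)       # vector normalisation
  V <- sweep(R, 2, w, `*`)                        # weighted normalised matrix
  best  <- apply(V, 2, max)                       # ideal solution (higher is better)
  worst <- apply(V, 2, min)                       # anti-ideal solution
  d_best  <- sqrt(rowSums(sweep(V, 2, best)^2))   # distance to ideal
  d_worst <- sqrt(rowSums(sweep(V, 2, worst)^2))  # distance to anti-ideal
  d_worst / (d_best + d_worst)                    # closeness coefficient in [0, 1]
}

cc <- topsis(S, w)
round(cc, 3)
```

Here group A is best on both indicators and therefore reaches a closeness of 1, while C is worst on both and scores 0.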

8.7 Compare All Methods

comparison <- sqi_compare(scored, cfg,
                           group_cols = c("LandUse", "Depth"),
                           dep_var    = "OC",
                           mds        = mds)
print(comparison)
#> # A tibble: 10 × 10
#>    LandUse        Depth      SQI_linear SQI_pca SQI_fuzzy SQI_entropy SQI_topsis
#>    <chr>          <chr>           <dbl>   <dbl>     <dbl>       <dbl>      <dbl>
#>  1 Natural_Forest Surface_0…     1       1         1           1          0.661 
#>  2 Agroforestry   Surface_0…     0.847   0.745     0.847       0.81       0.613 
#>  3 Natural_Forest Subsurfac…     0.695   0.668     0.695       0.685      0.512 
#>  4 Cropland       Surface_0…     0.880   0.465     0.880       0.729      0.590 
#>  5 Grassland      Surface_0…     0.592   0.638     0.592       0.608      0.453 
#>  6 Agroforestry   Subsurfac…     0.576   0.448     0.576       0.529      0.423 
#>  7 Cropland       Subsurfac…     0.692   0.311     0.692       0.554      0.497 
#>  8 Grassland      Subsurfac…     0.409   0.384     0.409       0.400      0.314 
#>  9 Degraded_Land  Surface_0…     0.0148  0.0424    0.0148      0.0248     0.0374
#> 10 Degraded_Land  Subsurfac…     0       0         0           0          0.0253
#> # ℹ 3 more variables: SQI_regression <dbl>, Mean_SQI <dbl>, Rank <int>
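One way to summarise how far the methods agree is a rank correlation between their columns. The sketch below copies the visible SQI columns from the table above (SQI_fuzzy is omitted because it equals SQI_linear here) and computes Spearman correlations with base R; this step is an addition for illustration, not an SQIpro function.

```r
# SQI values copied from the visible columns of the comparison table
cmp <- data.frame(
  SQI_linear  = c(1, 0.847, 0.695, 0.880, 0.592, 0.576, 0.692, 0.409, 0.0148, 0),
  SQI_pca     = c(1, 0.745, 0.668, 0.465, 0.638, 0.448, 0.311, 0.384, 0.0424, 0),
  SQI_entropy = c(1, 0.81, 0.685, 0.729, 0.608, 0.529, 0.554, 0.400, 0.0248, 0),
  SQI_topsis  = c(0.661, 0.613, 0.512, 0.590, 0.453, 0.423, 0.497, 0.314,
                  0.0374, 0.0253)
)

# Spearman rank correlation: 1 means two methods rank the groups identically
rho <- cor(cmp, method = "spearman")
round(rho, 2)
```

All methods place Natural_Forest surface soil first and Degraded_Land last; the main disagreement in these columns concerns Cropland, which the PCA-based index ranks lower than the linear index does.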

9 Step 6: Visualisation

9.1 Score Heatmap

plot_scores(scored, cfg,
            group_cols = c("LandUse", "Depth"),
            group_by   = "LandUse",
            facet_by   = "Depth")

9.2 SQI Bar Chart

plot_sqi(res_lin,
         sqi_col   = "SQI_linear",
         group_col = "LandUse",
         fill_col  = "Depth")

9.3 Radar Profile

# Requires the 'fmsb' package: install.packages("fmsb")
plot_radar(scored, cfg,
           group_col  = "LandUse",
           group_cols = c("LandUse", "Depth"))

9.4 PCA Biplot

plot_pca_biplot(mds, scored, group_col = "LandUse")


10 Step 7: Statistical Analysis

10.1 ANOVA

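# Compute a per-observation index as the mean of the MDS variable scores
# (drop = FALSE keeps a data frame even if the MDS holds a single variable)
scored$SQI_obs <- rowMeans(scored[, mds$mds_vars, drop = FALSE], na.rm = TRUE)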

aov_res <- sqi_anova(scored,
                     sqi_col   = "SQI_obs",
                     group_col = "LandUse")
cat("ANOVA significant:", aov_res$significant, "\n")
#> ANOVA significant: TRUE
print(aov_res$compact_letters)
#>            Group Letter
#> 1   Agroforestry      a
#> 2       Cropland      a
#> 3  Degraded_Land      b
#> 4      Grassland      c
#> 5 Natural_Forest      a
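Groups that share a letter are not significantly different from each other. For orientation, the sketch below runs a one-way ANOVA followed by Tukey's HSD in base R on simulated data. Whether sqi_anova() uses Tukey's HSD internally is not stated here, so this is a generic illustration of the workflow behind such letters, with made-up group names and values.

```r
# Simulated per-observation SQI for three groups (illustrative, not soil_data)
set.seed(42)
d <- data.frame(
  LandUse = factor(rep(c("Forest", "Cropland", "Degraded"), each = 10)),
  SQI_obs = c(rnorm(10, 0.80, 0.05), rnorm(10, 0.60, 0.05), rnorm(10, 0.15, 0.05))
)

fit   <- aov(SQI_obs ~ LandUse, data = d)
p_val <- summary(fit)[[1]][["Pr(>F)"]][1]      # overall F-test p-value
tukey <- TukeyHSD(fit)$LandUse                 # pairwise differences, adjusted p

p_val < 0.05
round(tukey[, c("diff", "p adj")], 4)
```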

10.2 Sensitivity Analysis

sa <- sqi_sensitivity(scored, cfg,
                       group_cols = c("LandUse", "Depth"),
                       method     = "linear",
                       mds_vars   = mds$mds_vars)
print(sa)
#>   variable mean_change sd_change relative_importance
#> 1      MBC      0.1830    0.1569              1.0000
#> 2       AP      0.1354    0.1046              0.7399
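The relative_importance column is consistent with dividing each mean_change by the largest mean_change, so the most influential variable is scaled to 1. The check below uses only the printed values; the formula is inferred from the output.

```r
# mean_change values copied from the printed sensitivity table
mean_change <- c(MBC = 0.1830, AP = 0.1354)

# Scale by the largest change so the most influential variable equals 1
relative_importance <- mean_change / max(mean_change)
round(relative_importance, 4)
#>    MBC     AP 
#> 1.0000 0.7399
```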

10.3 Sensitivity Tornado Chart

plot_sensitivity(sa)


References

Andrews, S. S., D. L. Karlen, and C. A. Cambardella. 2004. “The Soil Management Assessment Framework: A Quantitative Soil Quality Evaluation Method.” Soil Science Society of America Journal 68 (6): 1945–62. https://doi.org/10.2136/sssaj2004.1945.
Bastida, F., A. Zsolnay, T. Hernández, and C. García. 2008. “Past, Present and Future of Soil Quality Indices: A Biological Perspective.” Geoderma 147 (3–4): 159–71. https://doi.org/10.1016/j.geoderma.2008.08.007.
Doran, J. W., and T. B. Parkin. 1994. “Defining and Assessing Soil Quality.” In Defining Soil Quality for a Sustainable Environment, edited by J. W. Doran, D. C. Coleman, D. F. Bezdicek, and B. A. Stewart, 1–21. SSSA Special Publication 35. Madison, WI: Soil Science Society of America. https://doi.org/10.2136/sssaspecpub35.c1.
Hwang, C. L., and K. Yoon. 1981. Multiple Attribute Decision Making: Methods and Applications. Berlin: Springer. https://doi.org/10.1007/978-3-642-48318-9.
Masto, R. E., P. K. Chhonkar, D. Singh, and A. K. Patra. 2008. “Alternative Soil Quality Indices for Evaluating the Effect of Intensive Cropping, Fertilisation and Manuring for 31 Years in the Semi-Arid Soils of India.” Environmental Monitoring and Assessment 136: 419–35. https://doi.org/10.1007/s10661-007-9697-z.
Shannon, C. E. 1948. “A Mathematical Theory of Communication.” Bell System Technical Journal 27 (3): 379–423. https://doi.org/10.1002/j.1538-7305.1948.tb01338.x.