---
title: "Cross-system classification: WRB 2022, SiBCS 5, USDA Soil Taxonomy"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Cross-system classification: WRB 2022, SiBCS 5, USDA Soil Taxonomy}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment  = "#>"
)
library(soilKey)
```

`soilKey` ships three independent classification keys -- WRB 2022 (Module 1), SiBCS 5th edition (Module 6), and USDA Soil Taxonomy 13th edition (Module 5). Every key consumes the same `PedonRecord`, so a profile can be classified through all three in a single pass. This vignette demonstrates the alignment on canonical fixtures and shows where the systems agree, disagree, and complement each other.

# 1. The same Ferralsol through three keys

The canonical Ferralsol fixture is a clay-rich Brazilian profile with low cation exchange capacity (CEC) and low base saturation (BS).

```{r classify-three}
pr <- make_ferralsol_canonical()

w <- classify_wrb2022(pr, on_missing = "silent")
s <- classify_sibcs (pr, on_missing = "silent")
u <- classify_usda  (pr, on_missing = "silent")

data.frame(
  System  = c("WRB 2022", "SiBCS 5", "USDA"),
  Class   = c(w$rsg_or_order, s$rsg_or_order, u$rsg_or_order),
  Full    = c(w$name, s$name, u$name)
)
```

The three systems converge on the same conceptual unit:

* **WRB**  : Ferralsol with the canonical Ch 6 qualifiers.
* **SiBCS**: Latossolo Vermelho (red Latossolo).
* **USDA** : Oxisol.

This three-way alignment is the textbook correspondence: WRB Ferralsol ↔ SiBCS Latossolo ↔ USDA Oxisol.

# 2. Cross-system table on the canonical fixtures

A subset of fixtures across the three systems:

```{r cross-table}
fxs <- list(
  Ferralsol  = make_ferralsol_canonical(),
  Acrisol    = make_acrisol_canonical(),
  Lixisol    = make_lixisol_canonical(),
  Luvisol    = make_luvisol_canonical(),
  Nitisol    = make_nitisol_canonical(),
  Vertisol   = make_vertisol_canonical(),
  Andosol    = make_andosol_canonical(),
  Histosol   = make_histosol_canonical(),
  Podzol     = make_podzol_canonical(),
  Cambisol   = make_cambisol_canonical(),
  Gleysol    = make_gleysol_canonical(),
  Plinthosol = make_plinthosol_canonical()
)

tab <- do.call(rbind, lapply(names(fxs), function(nm) {
  pr <- fxs[[nm]]
  data.frame(
    Fixture = nm,
    WRB     = classify_wrb2022(pr, on_missing = "silent")$rsg_or_order,
    SiBCS   = classify_sibcs (pr, on_missing = "silent")$rsg_or_order,
    USDA    = classify_usda  (pr, on_missing = "silent")$rsg_or_order
  )
}))
knitr::kable(tab)
```

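The `WRB`, `SiBCS`, and `USDA` columns should reproduce the canonical correspondences below (the reference table omits the Cambisol, Gleysol, and Plinthosol fixtures):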

| WRB        | SiBCS        | USDA        |
|------------|--------------|-------------|
| Ferralsol  | Latossolo    | Oxisol      |
| Acrisol    | Argissolo    | Ultisol     |
| Lixisol    | Argissolo    | Alfisol     |
| Luvisol    | Argissolo    | Alfisol     |
| Nitisol    | Nitossolo    | Alfisol/Ultisol |
| Vertisol   | Vertissolo   | Vertisol    |
| Andosol    | Cambissolo / specific | Andisol |
| Histosol   | Organossolo  | Histosol    |
| Podzol     | Espodossolo  | Spodosol    |

# 3. Where the systems diverge

The same profile can land in classes that do not correspond across systems, because each key gates its top-level classes (WRB Reference Soil Groups, SiBCS and USDA orders) on slightly different criteria. The most important divergences:

**Argic horizon chemistry**: SiBCS lumps Acrisol/Lixisol/Alisol/Luvisol under *Argissolos*, while WRB splits them on two axes: clay activity, i.e. CEC per kg of clay (Acrisol/Lixisol = low-activity clay, Alisol/Luvisol = high-activity clay), and base saturation (Acrisol/Alisol = low BS, Lixisol/Luvisol = high BS). The USDA equivalent split is Ultisol (low BS) vs Alfisol (high BS).
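
The two-axis split can be sketched in a few lines of base R. `argic_split()` is a hypothetical helper, not a `soilKey` function, and the default limits (24 cmol(+)/kg clay, 50 % BS) are simplifying assumptions: the real keys attach depth ranges and analytical methods to these limits, and Soil Taxonomy uses its own base-saturation limit. The `SiBCS` column simply follows the lumping described above.

```{r argic-split-sketch}
# Hypothetical helper (not a soilKey function): place profiles that have an
# argic horizon on the clay-activity x base-saturation grid.
# The limits are illustrative defaults, not the full diagnostic criteria.
argic_split <- function(cec_clay, bs_pct,
                        cec_limit = 24,    # cmol(+)/kg clay, assumed
                        bs_limit  = 50) {  # percent, assumed
  high_activity <- cec_clay >= cec_limit
  high_bs       <- bs_pct   >= bs_limit

  wrb <- ifelse(high_activity,
                ifelse(high_bs, "Luvisol", "Alisol"),
                ifelse(high_bs, "Lixisol", "Acrisol"))
  usda <- ifelse(high_bs, "Alfisol", "Ultisol")

  data.frame(cec_clay, bs_pct, WRB = wrb, SiBCS = "Argissolo", USDA = usda)
}

# One made-up profile per quadrant of the grid.
quadrants <- argic_split(cec_clay = c(12, 12, 30, 30),
                         bs_pct   = c(20, 70, 20, 70))
quadrants
```

Four WRB names collapse to two USDA orders and one SiBCS order, which is why a disagreement on these profiles usually traces back to one of the two input values.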

**Andic vs cambic priority**: A volcanic ash soil with a weak Bw horizon can land in WRB Andosol but in SiBCS Cambissolo if the andic criteria narrowly fail. USDA Andisols are keyed on andic soil properties that closely parallel the WRB criterion.
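
A narrow failure is easiest to see with numbers. The sketch below assumes a simplified three-part andic test (oxalate-extractable Al + ½ Fe of at least 2 %, bulk density of at most 0.9 g/cm³, phosphate retention of at least 85 %) and ignores thickness and depth requirements. `is_andic()` and the layer values are hypothetical and are not taken from `soilKey` or its fixtures.

```{r andic-margin-sketch}
# Hypothetical helper (not a soilKey function): simplified andic-properties
# test for one or more layers. All three limits are assumed, simplified values.
is_andic <- function(al_ox_pct, fe_ox_pct, bulk_density, p_retention_pct) {
  (al_ox_pct + 0.5 * fe_ox_pct) >= 2.0 &  # oxalate Al + 1/2 Fe, percent
    bulk_density    <= 0.9 &              # g/cm3
    p_retention_pct >= 85                 # phosphate retention, percent
}

# Two made-up layers: one comfortably andic, one missing a single limit.
layers <- data.frame(
  layer           = c("clearly andic", "narrow miss"),
  al_ox_pct       = c(2.4, 1.5),
  fe_ox_pct       = c(1.2, 0.8),
  bulk_density    = c(0.75, 0.88),
  p_retention_pct = c(92, 86)
)
layers$andic <- with(
  layers,
  is_andic(al_ox_pct, fe_ox_pct, bulk_density, p_retention_pct)
)
layers[, c("layer", "andic")]
```

The second layer passes the bulk-density and phosphate-retention limits but reaches only 1.9 % Al + ½ Fe, so a key that requires all three conditions moves on to the next gate.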

**Plinthic / petric variants**: WRB Plinthosols, USDA Plinthudults / Plinthohumults, SiBCS Plintossolos -- all rely on the same plinthite criterion but apply different gating order in the key.
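
The effect of gating order can be shown with a toy key in which the first passing gate wins. Everything below is hypothetical: `run_key()` is not `soilKey`'s engine, the 15 % plinthite limit is an assumed placeholder, and the class names are generic labels rather than real taxa.

```{r gating-order-sketch}
# Toy key runner (hypothetical, not soilKey's engine): walk the gates in
# order and return the name of the first one whose test passes.
run_key <- function(profile, gates) {
  for (nm in names(gates)) {
    if (isTRUE(gates[[nm]](profile))) return(nm)
  }
  NA_character_
}

# Two shared diagnostics, written as predicates on a plain list.
has_plinthite <- function(p) p$plinthite_pct >= 15  # assumed limit
has_argic     <- function(p) p$argic

# Same gates, different order.
key_plinthic_first <- list(plinthic_class = has_plinthite,
                           argic_class    = has_argic)
key_argic_first    <- list(argic_class    = has_argic,
                           plinthic_class = has_plinthite)

# A made-up profile that satisfies both diagnostics.
profile <- list(plinthite_pct = 20, argic = TRUE)

first_result  <- run_key(profile, key_plinthic_first)
second_result <- run_key(profile, key_argic_first)
c(plinthic_first = first_result, argic_first = second_result)
```

Both keys evaluate identical predicates on identical data, yet the profile exits at a different gate in each, which is the mechanism behind this kind of divergence.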

# 4. Recovering the qualifier-level correspondence

For the same profile, each system provides additional discriminators:

```{r ferralsol-three-detail}
pr <- make_ferralsol_canonical()
w  <- classify_wrb2022(pr, on_missing = "silent")
s  <- classify_sibcs (pr, on_missing = "silent")
u  <- classify_usda  (pr, on_missing = "silent")

cat("WRB principal qualifiers:    ",
    paste(w$qualifiers$principal,     collapse = ", "), "\n")
cat("WRB supplementary qualifiers:",
    paste(w$qualifiers$supplementary, collapse = ", "), "\n")
# rsg_or_order holds only the top-level class; the full name carries the
# lower categorical levels where the key resolves them.
cat("SiBCS full name:             ", s$name,           "\n")
cat("USDA full name:              ", u$name,           "\n")
```

The WRB qualifier ladder *(Geric, Ferric, Rhodic, Chromic + Clayic, Humic, Dystric, Ochric, Rubic)* is the most expressive: principal and supplementary qualifiers together capture CEC, iron, colour, texture, organic carbon, and base saturation in a single name. SiBCS conveys comparable information through its lower categorical levels, the 2nd-level *subordem* (e.g. Latossolos Vermelhos) and the 3rd-level *grande grupo* (e.g. Distroférricos), which are encoded separately. USDA's information density is concentrated in the great group / subgroup level (currently scaffolded for v1.0).
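
WRB names place principal qualifiers before the RSG and supplementary qualifiers in parentheses after it. A minimal sketch of that assembly follows, assuming the qualifier vectors are already in display order. `format_wrb_name()` is a hypothetical helper, and how `soilKey` builds `w$name` internally is not shown here.

```{r wrb-name-sketch}
# Hypothetical formatter (not a soilKey function): assemble a WRB-style name
# from the RSG and its two qualifier sets. Qualifiers are assumed to arrive
# already sorted in the order they should be printed.
format_wrb_name <- function(rsg,
                            principal     = character(),
                            supplementary = character()) {
  name <- paste(c(principal, rsg), collapse = " ")
  if (length(supplementary) > 0) {
    name <- paste0(name, " (", paste(supplementary, collapse = ", "), ")")
  }
  name
}

wrb_name <- format_wrb_name(
  rsg           = "Ferralsol",
  principal     = c("Geric", "Ferric", "Rhodic"),
  supplementary = c("Clayic", "Humic", "Dystric")
)
wrb_name
```

Keeping the two qualifier sets as separate vectors until the final step makes it straightforward to compare them one by one against the SiBCS and USDA lower levels.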

# 5. Validating the SiBCS ↔ WRB alignment

`soilKey` runs the SiBCS key on the same canonical fixtures used for WRB. The fixture-level correspondence is asserted by the test suite:

```{r sibcs-mapping}
sibcs_expectations <- c(
  Ferralsol  = "Latossolos",
  Acrisol    = "Argissolos",
  Lixisol    = "Argissolos",
  Luvisol    = "Argissolos",
  Nitisol    = "Nitossolos",
  Vertisol   = "Vertissolos",
  Andosol    = "Cambissolos",   # Cambissolo Háplico Tb (Andic-leaning)
  Histosol   = "Organossolos",
  Podzol     = "Espodossolos",
  Plinthosol = "Plintossolos"
)

actual <- vapply(names(sibcs_expectations), function(nm) {
  fx <- get(paste0("make_", tolower(nm), "_canonical"))()
  classify_sibcs(fx, on_missing = "silent")$rsg_or_order
}, character(1))

data.frame(
  fixture       = names(sibcs_expectations),
  expected      = unname(sibcs_expectations),
  actual        = actual,
  match         = actual == sibcs_expectations,
  row.names     = NULL   # `actual` is named; avoid repeating fixture names as row names
)
```

# 6. Use cases for cross-system classification

* **Brazilian field surveys** -- producers and extension services use SiBCS, while international literature uses WRB. The same `PedonRecord` resolved through both keys gives the bilingual name without re-entering the data.

* **Global benchmarks** -- WoSIS profiles carry WRB names; some legacy datasets use Soil Taxonomy. The cross-system table makes both corpora analysable side by side.

* **Concept stress-testing** -- when WRB and SiBCS disagree on the same profile, the cause is almost always a single threshold (CEC/clay, BS, andic). Inspecting the disagreement is a fast way to find data-entry errors or to identify ambiguous profiles that deserve a closer look.
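
The stress-testing use case can be sketched as a lookup against the expected correspondences. `flag_disagreements()` is a hypothetical helper, not a `soilKey` function, and the three rows below are made-up results rather than output from the package's keys.

```{r disagreement-sketch}
# Hypothetical helper (not a soilKey function): return the rows whose SiBCS
# result differs from the class expected for their WRB result. Rows whose
# WRB class is missing from the lookup are flagged as well.
flag_disagreements <- function(tab, expected_sibcs) {
  tab$Expected <- unname(expected_sibcs[as.character(tab$WRB)])
  tab$Agrees   <- !is.na(tab$Expected) &
    as.character(tab$SiBCS) == tab$Expected
  tab[!tab$Agrees, , drop = FALSE]
}

# Lookup in the style of the correspondence table (WRB RSG -> SiBCS order).
expected_sibcs <- c(
  Ferralsol = "Latossolo",
  Acrisol   = "Argissolo",
  Nitisol   = "Nitossolo"
)

# Made-up classification results for three profiles.
results <- data.frame(
  Fixture = c("profile_a", "profile_b", "profile_c"),
  WRB     = c("Ferralsol", "Acrisol",   "Nitisol"),
  SiBCS   = c("Latossolo", "Argissolo", "Argissolo")
)

flagged <- flag_disagreements(results, expected_sibcs)
flagged
```

Only `profile_c` is returned, so the follow-up check can concentrate on the threshold that separates the two candidate classes for that profile.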

The next vignette (`v04_vlm_extraction`) shows how the `PedonRecord` itself can be assembled from PDFs and field photos via vision-language extraction, so the cross-system pass can run on freshly-described profiles without manual data entry.