Archaeological palimpsests — deposits in which material from multiple occupation phases is superimposed and partially intermixed — represent one of the most persistent analytical challenges in field archaeology. The palimpsestr R package implements the Stratigraphic Entanglement Field (SEF) framework, a probabilistic method for decomposing palimpsests into latent depositional phases by jointly modelling four evidence domains: spatial proximity, vertical distribution, chronological overlap, and cultural similarity. The package provides a diagonal-covariance Gaussian mixture model fitted via Expectation-Maximisation, augmented with taphonomic weighting and stratigraphic entanglement penalties. Three interpretable diagnostics — the Stratigraphic Entanglement Index (SEI), Excavation Stratigraphic Energy (ESE), and Palimpsest Dissolution Index (PDI) — allow practitioners to assess deposit coherence, detect intrusive finds, and evaluate overall phase separability. This article describes the methodology and demonstrates a complete analytical workflow on a realistic multi-period Roman villa dataset.
Archaeological palimpsests arise when successive episodes of human occupation leave material traces within the same sedimentary matrix, producing deposits in which artefacts, ecofacts, and features from chronologically distinct phases co-occur in close spatial and vertical proximity. The phenomenon is ubiquitous: from deeply stratified Near Eastern tells to shallow open-air Palaeolithic sites where millennia of activity are compressed into a few centimetres of deposit. Disentangling the depositional history of such contexts is essential for any behavioural, technological, or settlement-pattern inference, yet the analytical toolkit available to excavators has remained surprisingly limited.
The Harris Matrix (Harris, 1989) provides a rigorous formalism for recording observed stratigraphic relationships, but its relationships are deterministic and binary — a unit either overlies another or it does not. It cannot express the gradual, probabilistic mixing that characterises many palimpsests. Conversely, spatial clustering methods such as k-means (Kintigh and Ammerman, 1982) operate on coordinate data alone, ignoring vertical stratigraphy, chronological evidence from radiocarbon or ceramic seriation, and cultural-typological information. Neither approach offers a unified probabilistic framework, and neither produces quantitative diagnostics for deposit integrity or mixing intensity.
palimpsestr addresses these limitations by integrating four evidence domains — horizontal coordinates, vertical elevation, chronological range, and cultural class — into a single model that produces soft (probabilistic) phase assignments together with three diagnostic statistics. The Stratigraphic Entanglement Index (SEI) quantifies pairwise depositional coherence; Excavation Stratigraphic Energy (ESE) captures local disruption; and the Palimpsest Dissolution Index (PDI) summarises global phase separability. All three are interpretable by field practitioners and can be exported to GIS for spatial interrogation.
The SEI quantifies how strongly two finds \(i\) and \(j\) are “entangled” — that is, how much evidence supports their co-occurrence within the same depositional event. It is defined as:
\[SEI_{ij} = \frac{w_s}{d_{xy}(i,j) + \epsilon} + \frac{w_z}{\max(|z_i - z_j|, z_{floor})} + w_t \cdot O_t(i,j) + w_c \cdot \mathbb{1}[c_i = c_j]\]
where:
Each SEI component is normalised to [0, 1] by dividing by its within-dataset maximum, making weights directly interpretable as relative importance. The SEI matrix is row-normalised and used both as a regularisation penalty during model fitting and as the basis for the ESE diagnostic.
Phase decomposition proceeds via a Gaussian mixture model fitted by Expectation-Maximisation (EM). The feature matrix for each find is constructed by concatenating: scaled horizontal coordinates \((x, y)\), vertical coordinate \((z)\), temporal midpoint and temporal span of the date range, and a one-hot encoding of the cultural class variable. This multi-domain feature representation ensures that the clustering draws on all available evidence simultaneously.
The algorithm is initialised with K-means, and the resulting hard assignments are converted to softmax probabilities. The EM procedure then iterates between an E-step (computing posterior phase membership probabilities given current parameters) and an M-step (updating mixture weights, means, and diagonal covariance matrices given current memberships). Two modifications distinguish SEF from a standard Gaussian mixture:
Convergence is assessed by monitoring the log-likelihood; the algorithm terminates when the relative change falls below a tolerance threshold or a maximum number of iterations is reached.
Three statistics summarise different aspects of the decomposition:
Excavation Stratigraphic Energy (ESE): for each find, ESE measures its dissimilarity from its spatial neighbours in terms of phase assignment. A high ESE value indicates that a find is surrounded by items assigned to different phases — a signature of depositional disruption, bioturbation, or post-depositional displacement.
Palimpsest Dissolution Index (PDI): a global statistic defined as \(PDI = 1 - \bar{H} / \log(K)\), where \(\bar{H}\) is the mean Shannon entropy of the phase-probability vectors across all finds and \(K\) is the number of phases. PDI ranges from 0 (complete mixing; every find is equally likely to belong to any phase) to 1 (perfect separation; every find is assigned to exactly one phase with certainty). Values above 0.7 generally indicate well-resolved phases.
Entropy: the Shannon entropy of each find’s phase-probability vector, \(H_i = -\sum_k p_{ik} \log p_{ik}\). Low entropy means confident assignment; high entropy signals ambiguity, typically at phase boundaries or within heavily mixed zones.
Intrusion score: a composite metric combining high entropy, high ESE, and low SEI connectivity. Finds exceeding a threshold on this composite are flagged as candidate intrusions — items that are spatially or stratigraphically anomalous relative to their assigned phase.
We demonstrate the complete analytical workflow using
villa_romana, a realistic multi-period Roman villa dataset
included in the package. It simulates 300 finds from a villa with 4
occupation phases across 18 stratigraphic units, incorporating realistic
post-depositional disturbances.
library(palimpsestr)
data(villa_romana)
cat("Finds:", nrow(villa_romana), "\n")
#> Finds: 300
cat("Stratigraphic units:", length(unique(villa_romana$context)), "\n")
#> Stratigraphic units: 18
cat("Material classes:", length(unique(villa_romana$class)), "\n")
#> Material classes: 13
cat("Date range:", min(villa_romana$date_min), "to", max(villa_romana$date_max), "\n")
#> Date range: -269 to 628The four phases represent:
| Phase | Period | Chronology |
|---|---|---|
| 1 | Republican | 2nd–1st c. BCE |
| 2 | Early Imperial | 1st–2nd c. CE |
| 3 | Late Imperial | 3rd–4th c. CE |
| 4 | Late Antique | 5th–6th c. CE |
The dataset includes realistic disturbances: bioturbation (8% vertical displacement), construction cuts (5% stratigraphic intrusions), and residual pottery (3% old material in younger contexts).
cat("Material classes:\n")
#> Material classes:
sort(table(villa_romana$class), decreasing = TRUE)
#>
#> ceramic_coarse bone ceramic_amphorae metal_iron
#> 85 52 28 25
#> ceramic_sigillata glass lithic ceramic_campana
#> 21 19 19 16
#> ceramic_sigillata_d metal_bronze coin marble
#> 14 8 6 4
#> tile
#> 3str(villa_romana)
#> 'data.frame': 300 obs. of 10 variables:
#> $ id : chr "VR_0001" "VR_0002" "VR_0003" "VR_0004" ...
#> $ x : num 45.9 42.8 39.4 38.7 47 ...
#> $ y : num 49.2 38 54.3 53 37.3 ...
#> $ z : num 3.56 3.69 3.64 3.81 4.22 ...
#> $ context : chr "US_103" "US_102" "US_104" "US_103" ...
#> $ date_min : num -172 -172 -173 -153 -242 -198 -168 -212 -165 -156 ...
#> $ date_max : num -100 -67 -104 -37 -145 -102 -110 -126 -125 -95 ...
#> $ class : chr "bone" "ceramic_coarse" "ceramic_coarse" "bone" ...
#> $ taf_score : num 0.063 0.299 0.299 0.103 0.079 0.338 0.271 0.087 0.419 0.048 ...
#> $ true_phase: int 1 1 1 1 1 1 1 1 1 1 ...When the number of phases is unknown, compare_k() fits
models across a range of \(K\) values
and reports BIC, ICL, and PDI for each. The optimal \(K\) minimises BIC while maximising PDI.
ck <- compare_k(
villa_romana,
k_values = 2:7,
tafonomy = "taf_score",
context = "context",
seed = 42
)
print(ck)
#> k pdi mean_entropy mean_local_sei mean_energy loglik bic
#> 1 2 1.0000000 0.000000e+00 207.028 10.30843 8996.519 -17576.66
#> 2 3 1.0000000 0.000000e+00 207.028 10.30843 12637.253 -24647.09
#> 3 4 0.9701078 4.143940e-02 207.028 10.30843 13982.562 -27126.67
#> 4 5 1.0000000 0.000000e+00 207.028 10.30843 15580.391 -30111.29
#> 5 6 0.9999949 9.076127e-06 207.028 10.30843 14334.154 -27407.77
#> 6 7 0.9999953 9.066056e-06 207.028 10.30843 14598.013 -27724.45
#> icl tot_withinss pseudo_bic
#> 1 -17576.66 1297.8397 644.7383
#> 2 -24647.09 1101.2130 698.1198
#> 3 -27151.53 984.1808 767.0805
#> 4 -30111.29 852.3565 826.6071
#> 5 -27407.78 769.4746 898.5861
#> 6 -27724.46 734.8830 987.4553Based on the selection diagnostics and archaeological knowledge of the site (4 occupation periods), we fit with K = 4 and multiple random initialisations.
fit <- fit_sef(
villa_romana,
k = 4,
tafonomy = "taf_score",
context = "context",
n_init = 10,
seed = 42
)
print(fit)
#> <sef_fit>
#> Observations: 300
#> Estimated phases: 4
#> Mean entropy: 0.000
#> Mean local SEI: 207.028
#> Mean energy: 10.308
#> LogLik: 14635.455
#> BIC: -28432.453
#> ICL: -28432.453
#> PDI: 1.000
#> Converged: yes
#> Initialisations: 10summary(fit)
#> $n
#> [1] 300
#>
#> $k
#> [1] 4
#>
#> $pdi
#> [1] 1
#>
#> $mean_entropy
#> [1] 0
#>
#> $mean_energy
#> [1] 10.30843
#>
#> $loglik
#> [1] 14635.45
#>
#> $bic
#> [1] -28432.45
#>
#> $phase_count
#>
#> 1 2 3 4
#> 113 43 72 72The phase-field map reveals the spatial structure of the decomposition. The model assigns each find to one of four chronologically coherent phases despite the presence of post-depositional disturbance.
Entropy mapping is particularly useful for identifying the spatial extent of mixing. Zones of elevated entropy correspond to areas where multiple depositional phases are interleaved and cannot be cleanly separated.
The energy map highlights local disruptions. Clusters of high-ESE finds may indicate bioturbation channels, pit cuts, or other post-depositional disturbances.
The vertical profile confirms the expected stratigraphic ordering: earlier phases occupy deeper strata, later phases are shallower.
The detect_intrusions() function computes a composite
intrusion probability for each find, combining entropy, ESE, and SEI
connectivity into a single score.
di <- detect_intrusions(fit)
suspects <- di[di$intrusion_prob > 0.5, ]
cat("Suspected intrusions:", nrow(suspects), "out of", nrow(villa_romana), "finds",
sprintf("(%.1f%%)\n", 100 * nrow(suspects) / nrow(villa_romana)))
#> Suspected intrusions: 19 out of 300 finds (6.3%)if (nrow(suspects) > 0) {
# Merge with source data and phase assignments
phase_tab <- as_phase_table(fit)
susp_data <- merge(suspects, villa_romana[, c("id", "context", "class")], by = "id")
susp_data <- merge(susp_data, phase_tab[, c("id", "dominant_phase")], by = "id")
susp_data <- susp_data[order(susp_data$intrusion_prob, decreasing = TRUE), ]
susp_show <- susp_data[, c("id", "context", "class", "dominant_phase", "intrusion_prob")]
names(susp_show) <- c("Find", "US", "Class", "Phase", "Intrusion Prob.")
knitr::kable(head(susp_show, 15), row.names = FALSE, digits = 3,
caption = "Top suspected intrusions ranked by intrusion probability.")
}| Find | US | Class | Phase | Intrusion Prob. |
|---|---|---|---|---|
| VR_0295 | US_403 | tile | 1 | 0.624 |
| VR_0280 | US_403 | lithic | 2 | 0.611 |
| VR_0266 | US_404 | lithic | 2 | 0.592 |
| VR_0184 | US_304 | ceramic_amphorae | 3 | 0.584 |
| VR_0292 | US_404 | lithic | 2 | 0.580 |
| VR_0260 | US_402 | lithic | 2 | 0.559 |
| VR_0255 | US_402 | lithic | 2 | 0.544 |
| VR_0031 | US_101 | metal_iron | 1 | 0.533 |
| VR_0297 | US_401 | bone | 4 | 0.529 |
| VR_0005 | US_102 | bone | 4 | 0.522 |
| VR_0015 | US_101 | bone | 4 | 0.521 |
| VR_0258 | US_403 | metal_iron | 1 | 0.519 |
| VR_0287 | US_403 | ceramic_sigillata_d | 4 | 0.517 |
| VR_0261 | US_402 | lithic | 2 | 0.512 |
| VR_0220 | US_304 | ceramic_sigillata_d | 4 | 0.511 |
Flagged finds warrant closer examination: they may represent genuinely intrusive material (e.g., items displaced by animal burrowing or construction cuts), or they may sit at genuine phase boundaries where assignment is inherently ambiguous.
The us_summary_table() function aggregates diagnostics
per stratigraphic unit, reporting dominant phase, purity, mean entropy,
energy, and intrusion count.
us_tab <- us_summary_table(fit)
knitr::kable(us_tab, row.names = FALSE, digits = 3,
caption = "Per-US diagnostics: dominant phase, purity, mean entropy, mean ESE, and number of intrusions.")| context | n_finds | dominant_phase | purity | mean_entropy | mean_energy | mean_local_sei | n_intrusions |
|---|---|---|---|---|---|---|---|
| US_101 | 20 | 2 | 0.400 | 0 | 10.436 | 182.756 | 2 |
| US_102 | 16 | 2 | 0.688 | 0 | 10.490 | 165.276 | 1 |
| US_103 | 13 | 4 | 0.462 | 0 | 10.124 | 181.638 | 0 |
| US_104 | 14 | 2 | 0.429 | 0 | 10.322 | 170.486 | 1 |
| US_201 | 18 | 3 | 0.444 | 0 | 9.687 | 220.272 | 0 |
| US_202 | 12 | 3 | 0.583 | 0 | 9.621 | 218.802 | 0 |
| US_203 | 21 | 3 | 0.571 | 0 | 9.961 | 220.554 | 0 |
| US_204 | 18 | 3 | 0.556 | 0 | 9.433 | 226.365 | 0 |
| US_205 | 19 | 3 | 0.579 | 0 | 9.352 | 219.409 | 0 |
| US_301 | 19 | 1 | 0.421 | 0 | 10.649 | 226.346 | 0 |
| US_302 | 19 | 1 | 0.421 | 0 | 10.541 | 232.135 | 0 |
| US_303 | 20 | 1 | 0.600 | 0 | 10.394 | 233.207 | 0 |
| US_304 | 19 | 4 | 0.421 | 0 | 10.414 | 221.838 | 2 |
| US_305 | 18 | 1 | 0.667 | 0 | 10.404 | 227.608 | 0 |
| US_401 | 14 | 1 | 0.714 | 0 | 11.026 | 198.514 | 2 |
| US_402 | 19 | 1 | 0.474 | 0 | 10.869 | 186.837 | 3 |
| US_403 | 11 | 1 | 0.545 | 0 | 11.456 | 161.082 | 5 |
| US_404 | 10 | 1 | 0.500 | 0 | 11.084 | 180.713 | 3 |
Units with purity > 90% indicate consistent phase assignment within the context. Lower purity and higher mean entropy correspond to disturbed contexts where bioturbation or construction cuts have introduced material from adjacent phases.
Shows how often finds from phase \(i\) sit directly above finds from phase \(j\), revealing spatial adjacency between phases:
tm <- phase_transition_matrix(fit)
print(tm)
#> phase1 phase2 phase3 phase4
#> phase1 37 14 30 32
#> phase2 16 14 6 7
#> phase3 28 5 22 16
#> phase4 31 10 14 17High off-diagonal values indicate zones of contact between phases. This is expected at the boundary between successive occupation periods.
When true phase assignments are known (e.g., from simulation or independently dated contexts), palimpsestr provides two validation metrics:
ari <- adjusted_rand_index(fit, villa_romana$true_phase)
cat("Adjusted Rand Index:", round(ari, 3), "\n")
#> Adjusted Rand Index: 0.133confusion_matrix(fit, villa_romana$true_phase)
#> true
#> estimated 3 1 2 4
#> 1 44 15 24 30
#> 2 1 31 3 8
#> 3 16 2 52 2
#> 4 29 17 16 10The SEI combines four evidence domains (spatial, vertical, temporal, cultural) with configurable weights. To assess how weight choices affect the results, we compare three configurations:
configs <- list(
equal = c(ws = 1, wz = 1, wt = 1, wc = 1),
spatial = c(ws = 2, wz = 1, wt = 0.5, wc = 0.5),
temporal = c(ws = 0.5, wz = 0.5, wt = 2, wc = 1)
)
sens <- do.call(rbind, lapply(names(configs), function(nm) {
w <- configs[[nm]]
f <- fit_sef(villa_romana, k = 4, weights = w, seed = 42,
tafonomy = "taf_score", context = "context")
data.frame(
config = nm,
pdi = pdi(f),
ari = adjusted_rand_index(f, villa_romana$true_phase),
mean_entropy = mean(f$entropy, na.rm = TRUE),
stringsAsFactors = FALSE
)
}))
knitr::kable(sens, digits = 4,
caption = "Sensitivity of PDI, ARI, and mean entropy to weight configuration.")| config | pdi | ari | mean_entropy |
|---|---|---|---|
| equal | 0.9701 | 0.2279 | 0.0414 |
| spatial | 0.9701 | 0.2279 | 0.0414 |
| temporal | 0.9701 | 0.2279 | 0.0414 |
The results show that the model is reasonably robust to weight
variation, though emphasising the most informative domain for a given
site can improve performance. The optimize_weights()
function provides a data-driven approach to weight selection via
cross-validated log-likelihood.
Uncertainty on key statistics can be quantified via bootstrap resampling:
bs <- bootstrap_sef(fit, n_boot = 50, true_labels = villa_romana$true_phase, verbose = FALSE)
knitr::kable(bs, digits = 4,
caption = "Bootstrap confidence intervals (50 replicates) for key model statistics.")| statistic | estimate | lower | upper | se |
|---|---|---|---|---|
| pdi | 1.0000 | 0.9937 | 1.0000 | 0.0017 |
| mean_entropy | 0.0000 | 0.0000 | 0.0088 | 0.0023 |
| mean_energy | 10.3084 | 9.4868 | 10.5417 | 0.3201 |
| loglik | 14635.4547 | 13261.3458 | 14842.8517 | 422.1786 |
| ari | 0.1330 | 0.0424 | 0.1892 | 0.0392 |
palimpsestr can incorporate stratigraphic information from a Harris
Matrix as a penalty during model fitting. The
harris_from_contexts() function auto-generates a penalty
matrix from the mean depth of finds in each context. Alternatively,
read_harris() imports an external Harris Matrix from a CSV
edge list.
H <- harris_from_contexts(villa_romana, z_col = "z", context_col = "context")
cat("Harris penalty matrix:", nrow(H), "x", ncol(H), "\n")
#> Harris penalty matrix: 300 x 300fit_h <- fit_sef(villa_romana, k = 4, harris = H,
tafonomy = "taf_score", context = "context",
n_init = 5, seed = 42)
cat("PDI without Harris:", round(pdi(fit), 4), "\n")
#> PDI without Harris: 1
cat("PDI with Harris: ", round(pdi(fit_h), 4), "\n")
#> PDI with Harris: 1
cat("ARI without Harris:", round(adjusted_rand_index(fit, villa_romana$true_phase), 4), "\n")
#> ARI without Harris: 0.133
cat("ARI with Harris: ", round(adjusted_rand_index(fit_h, villa_romana$true_phase), 4), "\n")
#> ARI with Harris: 0.133validate_phases_harris(fit)
#> upper lower upper_phase lower_phase consistent
#> 1 US_104 US_102 2 2 TRUE
#> 2 US_102 US_101 2 2 TRUE
#> 3 US_101 US_103 2 4 TRUE
#> 4 US_103 US_205 4 3 FALSE
#> 5 US_205 US_201 3 3 TRUE
#> 6 US_201 US_203 3 3 TRUE
#> 7 US_203 US_202 3 3 TRUE
#> 8 US_202 US_204 3 3 TRUE
#> 9 US_204 US_303 3 1 FALSE
#> 10 US_303 US_301 1 1 TRUE
#> 11 US_301 US_304 1 4 TRUE
#> 12 US_304 US_302 4 1 FALSE
#> 13 US_302 US_305 1 1 TRUE
#> 14 US_305 US_401 1 1 TRUE
#> 15 US_401 US_402 1 1 TRUE
#> 16 US_402 US_404 1 1 TRUE
#> 17 US_404 US_403 1 1 TRUEThe consistent column flags whether the dominant phase
ordering respects stratigraphic depth. FALSE indicates an
inversion that may warrant investigation.
The report_sef() function generates a plain-language
summary of the analysis, suitable for inclusion in excavation reports.
It is available in English and Italian.
report_sef(fit, lang = "en")
#> # SEF Analysis Report
#>
#> ## Overview
#>
#> The Stratigraphic Entanglement Field model with **K = 4 phases** was fitted to a dataset of **300 finds** distributed across **18 stratigraphic contexts**.
#>
#> The Palimpsest Dissolution Index (PDI) is **1.000**, indicating **excellent phase separation**. Mean entropy is 0.0000 and mean ESE is 10.308.
#>
#> Model diagnostics: LogLik = 14635.5, BIC = -28432.5.
#>
#> ## Phase Composition
#>
#> ### Phase 1 (113 finds, 37.7%)
#>
#> - **Dating**: average 280 CE (range: -236 to 628)
#> - **Material classes**: ceramic_coarse (85), metal_iron (25), tile (3)
#> - **Mean entropy**: 0.0000; **Mean energy**: 10.475
#> - **Contexts**: US_101, US_102, US_103, US_104, US_201, US_202, ... (18 total)
#> - **Assessment**: Entropy is very low, suggesting this phase is internally coherent and well-defined.
#>
#> ### Phase 2 (43 finds, 14.3%)
#>
#> - **Dating**: average 5 CE (range: -223 to 586)
#> - **Material classes**: ceramic_campana (16), lithic (19), metal_bronze (8)
#> - **Mean entropy**: 0.0000; **Mean energy**: 10.578
#> - **Contexts**: US_101, US_102, US_103, US_104, US_202, US_205, ... (11 total)
#> - **Assessment**: Entropy is very low, suggesting this phase is internally coherent and well-defined.
#>
#> ### Phase 3 (72 finds, 24.0%)
#>
#> - **Dating**: average 163 CE (range: -193 to 609)
#> - **Material classes**: ceramic_amphorae (28), ceramic_sigillata (21), glass (19), marble (4)
#> - **Mean entropy**: 0.0000; **Mean energy**: 9.843
#> - **Contexts**: US_104, US_201, US_202, US_203, US_204, US_205, ... (13 total)
#> - **Assessment**: Entropy is very low, suggesting this phase is internally coherent and well-defined.
#>
#> ### Phase 4 (72 finds, 24.0%)
#>
#> - **Dating**: average 202 CE (range: -269 to 618)
#> - **Material classes**: bone (52), ceramic_sigillata_d (14), coin (6)
#> - **Mean entropy**: 0.0000; **Mean energy**: 10.352
#> - **Contexts**: US_101, US_102, US_103, US_104, US_201, US_202, ... (18 total)
#> - **Assessment**: Entropy is very low, suggesting this phase is internally coherent and well-defined.
#>
#> ## Intrusion Detection
#>
#> **19 finds** (6.3%) have an intrusion probability above 0.5, flagging them as potentially displaced or redeposited.
#>
#> Top suspects:
#>
#> | ID | Context | Intrusion Prob. |
#> |---|---|---|
#> | VR_0295 | US_403 | 0.624 |
#> | VR_0280 | US_403 | 0.611 |
#> | VR_0266 | US_404 | 0.592 |
#> | VR_0184 | US_304 | 0.584 |
#> | VR_0292 | US_404 | 0.580 |
#> | VR_0260 | US_402 | 0.559 |
#> | VR_0255 | US_402 | 0.544 |
#> | VR_0031 | US_101 | 0.533 |
#> | VR_0297 | US_401 | 0.529 |
#> | VR_0005 | US_102 | 0.522 |
#>
#> ## Stratigraphic Unit Purity
#>
#> Of **18 stratigraphic units**: **0 pure** (>90%), **1 mixed** (70-90%), **17 palimpsest** (<70%).
#>
#> Units classified as palimpsest require special attention, as material from multiple depositional episodes co-occurs within them.
#>
#> ## Recommendations
#>
#> The deposit shows good phase separation. The model can be used with confidence for phase attribution. Focus review on flagged intrusions and boundary zones.
#>
#> ---
#> *Report generated by palimpsestr 0.10.0*report_sef(fit, lang = "it")
#> # Rapporto Analisi SEF
#>
#> ## Panoramica
#>
#> Il modello Stratigraphic Entanglement Field con **K = 4 fasi** e' stato applicato a un dataset di **300 reperti** distribuiti in **18 contesti stratigrafici**.
#>
#> Il Palimpsest Dissolution Index (PDI) e' **1.000**, indicando una separazione **eccellente** delle fasi. Entropia media: 0.0000. Energia media: 10.308.
#>
#> Diagnostiche: LogLik = 14635.5, BIC = -28432.5.
#>
#> ## Composizione delle Fasi
#>
#> ### Fase 1 (113 reperti, 37.7%)
#>
#> - **Datazione media**: 280 d.C.
#> - **Classi materiali**: ceramic_coarse (85), metal_iron (25), tile (3)
#> - **Entropia media**: 0.0000; **Energia media**: 10.475
#> - **US**: US_101, US_102, US_103, US_104, US_201, US_202, ... (18 totali)
#>
#> ### Fase 2 (43 reperti, 14.3%)
#>
#> - **Datazione media**: 5 d.C.
#> - **Classi materiali**: ceramic_campana (16), lithic (19), metal_bronze (8)
#> - **Entropia media**: 0.0000; **Energia media**: 10.578
#> - **US**: US_101, US_102, US_103, US_104, US_202, US_205, ... (11 totali)
#>
#> ### Fase 3 (72 reperti, 24.0%)
#>
#> - **Datazione media**: 163 d.C.
#> - **Classi materiali**: ceramic_amphorae (28), ceramic_sigillata (21), glass (19), marble (4)
#> - **Entropia media**: 0.0000; **Energia media**: 9.843
#> - **US**: US_104, US_201, US_202, US_203, US_204, US_205, ... (13 totali)
#>
#> ### Fase 4 (72 reperti, 24.0%)
#>
#> - **Datazione media**: 202 d.C.
#> - **Classi materiali**: bone (52), ceramic_sigillata_d (14), coin (6)
#> - **Entropia media**: 0.0000; **Energia media**: 10.352
#> - **US**: US_101, US_102, US_103, US_104, US_201, US_202, ... (18 totali)
#>
#> ## Intrusioni
#>
#> **19 reperti** (6.3%) presentano probabilita' di intrusione superiore a 0.5.
#>
#> ## Purezza delle US
#>
#> Su **18 US**: **0 pure** (>90%), **1 miste** (70-90%), **17 palinsesto** (<70%).
#>
#> ---
#> *Rapporto generato da palimpsestr 0.10.0*All results can be exported to CSV for integration with external databases and GIS:
This creates 4 files:
villa_romana_phases.csv — phase assignments and
diagnostics per findvilla_romana_intrusions.csv — intrusion
probabilitiesvilla_romana_us_summary.csv — per-US aggregated
statisticsvilla_romana_model_summary.csv — model-level metrics
(n, k, PDI, BIC, etc.)For integration with GIS workflows, palimpsestr can export results as
sf spatial objects:
When plotly is available, any gg_* plot
can be converted to an interactive visualization using
as_plotly():
| Aspect | Harris Matrix | k-means | palimpsestr (SEF) |
|---|---|---|---|
| Evidence domains | Stratigraphy only | Spatial only | Spatial + vertical + temporal + cultural |
| Assignment type | Deterministic | Hard | Probabilistic (soft) |
| Mixing detection | No | No | Yes (entropy, ESE) |
| Intrusion detection | Manual | No | Automatic |
| Scalability | Manual | Good | Good |
| Interpretability | High | Low | High (PDI, reports) |
The SEF framework does not replace the Harris Matrix. Rather, it extends it. The Harris Matrix remains indispensable for documenting observed physical relationships between depositional units. What palimpsestr adds is a quantitative, probabilistic layer of analysis that operates on find-level data within those units — precisely the scale at which palimpsest formation is most problematic.
The EM uses diagonal Gaussians, assuming conditional independence between features within each phase. This is a deliberate parsimony choice: for \(p\) features and \(k\) phases, diagonal covariance requires \(k(2p+1)-1\) parameters versus \(k(p(p+3)/2+1)-1\) for full covariance. With typical archaeological datasets (n = 50–500, p = 7–10), full covariance risks overfitting.
Each SEI component is normalised to [0, 1] by dividing by its within-dataset maximum. This makes weights interpretable but means absolute SEI values are not comparable across datasets. Cross-site comparisons should use derived statistics (PDI, mean entropy) which are scale-invariant.
The PDI is a descriptive measure, not a formal test statistic. The interpretive guideline (PDI > 0.7 = well-resolved) is empirically derived from simulation studies and should not be treated as a formal decision boundary. Bootstrap confidence intervals provide uncertainty quantification.
The intrusion probability score is a heuristic combining rescaled entropy, ESE, and inverse local SEI. The 0.5 threshold is an exploratory guideline. Users should examine flagged observations individually using domain knowledge.
Phase numbers are arbitrary in mixture models (label switching). Use
reorder_phases() to ensure phase 1 = deepest (oldest)
stratum. The bootstrap procedure applies phase reordering to each
replicate.
The cv_sef() and optimize_weights()
functions standardise test data using the training set’s center and
scale, ensuring consistent feature scales across folds.
The SEI and ESE matrices are \(O(n^2)\) in memory. For datasets exceeding
~2000 finds, use sei_sparse() or the max_dist
parameter in sei_matrix() to limit computation to spatial
neighbours.
The palimpsestr package offers a unified probabilistic framework for decomposing archaeological palimpsests, addressing a long-standing gap between qualitative stratigraphic reasoning and quantitative spatial analysis. By jointly modelling spatial proximity, vertical distribution, chronological overlap, and cultural similarity, the Stratigraphic Entanglement Field approach produces soft phase assignments that acknowledge — rather than suppress — the inherent uncertainty of mixed deposits. The three diagnostic statistics (SEI, ESE, PDI) provide interpretable, actionable summaries at the find, deposit, and assemblage levels respectively.
The package is available on GitHub at https://github.com/enzococca/palimpsestr.