cdlsim-simulations

library(cdlsim)

The USDA Cropland Data Layer

One of the primary tools used in the United States to monitor agricultural land is the National Agricultural Statistics Service (NASS) Cropland Data Layer (CDL), an annual map created by the United States Department of Agriculture (USDA) from satellite images. While the CDL is highly accurate, the data product is still prone to misclassify certain land use types. These challenges can affect the reliability of any metrics calculated using the data. To address this, we have developed cdlsim, which simulates land use changes to quantify the sensitivity of land use metrics to variability in the input data. We demonstrate the general uses of our package through two examples: a simple example using a mock landscape and a real world example in the Bear Lake watershed in Utah.

Package Functions

The two major purposes of our package are to download and format the transition matrices and to simulate the CDL data provided as the input. To accomplish the former, there are three main functions: download_cdl_mat_files, get_mat_data, and get_trans_mat. The first function downloads and unzips folders containing the confusion matrices for all the US states using their zip file’s URL. The range of years of data to download and the file path to save the corresponding Excel files can be specified in the arguments. The get_mat_data and get_trans_mat functions then take the previously downloaded data and extract the files for the states supplied. These functions also read in the tables (formatted as Excel spreadsheets) as matrices, allowing for easier manipulation, formatting, and improving compatibility with the simulation functions.

The second major purpose of cdlsim is the simulation function simulate_raster_rcpp. Here we simulate class values at the patch level maintaining the spatial integrity and and structure of the landscape. This function takes as input a SpatRaster, a transition matrix, and the desired number of iterations. Several additional arguments can be optionally set. For example, in many cases, it is reasonable to prevent specific pixel values, such as those representing lakes or streams, from transitioning. These values can be included in the non_trans argument, ensuring that the corresponding pixels retain their original class. By default, this argument prevents the background class (0) from transitioning, as this class represents areas with no data. Additionally to accommodate larger study areas the simulation function includes a memory-efficient option that is able to handle larger raster files. The user can activate the memory efficient simulation version using the Fast argument. By default, Fast = TRUE, which is recommended for smaller study areas, such as counties size regions.

What is a Patch?

A patch is defined as a group of connected neighboring cells from the same class. For simulation purposes, adjacent or diagonally connected pixels of the same class are treated as a single patch (i.e., queen adjacency). This definition follows similarly to that of patches in the landscapemetrics ensuring compatibility.

Installation

You can install the development version of cdlsim from GitHub with:

# install.packages("devtools")
devtools::install_github("burgerhaley97/cdlsim")

The Simple Example

This is a basic example which shows how a simple landscape would behave in our simulation. We start by building a SpatRaster with three different classes each with three 90 by 90 pixel squares.

# Define raster dimensions and classes
ncol <- 90
nrow <- 90
classes <- 3

# Initialize a SpatRaster with the desired dimensions
r_small <- terra::rast(ncol = ncol, nrow = nrow, vals = NA)

# Assign values in 30x30 blocks for each class
r_small[1:30, 1:30] <- 1  
r_small[1:30, 31:60] <- 2  
r_small[1:30, 61:90] <- 3 

r_small[31:60, 1:30] <- 2  
r_small[31:60, 31:60] <- 3  
r_small[31:60, 61:90] <- 1 

r_small[61:90, 1:30] <- 3  
r_small[61:90, 31:60] <- 1  
r_small[61:90, 61:90] <- 2  
# Show original landscape before simulation
terra::plot(r_small, axes = FALSE, main = "Simple Landscape")

Next we define the transition matrix which describes the probability that each class has of transitioning to the other classes during simulation. Note that the row names represent the original landscape’s classes and the column names represent the simulated class values. For example, the first entry below describes the chance that class one patches will transition to class one during simulation.

# Define confusion matrix, background not transitioning
conf_mat <- matrix(1/3, nrow = 3, ncol = 3)

# Assign row and column names
rownames(conf_mat) <- c("1", "2", "3")
colnames(conf_mat) <- c("1", "2", "3")

# Print the transition matrix 
print(conf_mat)
#>           1         2         3
#> 1 0.3333333 0.3333333 0.3333333
#> 2 0.3333333 0.3333333 0.3333333
#> 3 0.3333333 0.3333333 0.3333333

Finally we simulate the original landscape five times and plot the results.

# Set a random seed for reproducibility 
set.seed(1891)

# Simulate the example landscape 5 times
simple_sim <- simulate_raster_patch(original_raster = r_small,
                                       transition_matrix = conf_mat,
                                       iterations = 5)
# Plot the results
terra::plot(simple_sim, axes = FALSE)

The Bear Lake Watershed

Here we use the R package landscapemetrics to implement popular FRAGSTATS style metrics to demonstrate how cdlsim can be used to quantify metric sensitivity to random perturbations introduced through simulation. To illustrate, we use the Cropland Data Layer (CDL) for the Bear Lake watershed in 2008, along with the associated confusion matrix for that year.

# Load the raster
bl_raster <- terra::rast(system.file("extdata/cropped_raster.tif", package="cdlsim"))
terra::plot(bl_raster, axes = FALSE, main = "Bear Lake Watershed")

The CDL data for the Bear Lake watershed is reclassified before simulation.

# Define the values that represent our classes of interest
non_ag <- c(61:65, 81:83, 87:88, 92, 111:112, 121:124, 131, 141:143, 152, 176, 181, 190, 195)
alfalfa <- c(36:37)
major_ag <- c(1, 2, 5, 12, 13, 22:24, 26, 225:226, 228, 234, 236, 238:241, 254)
all_numbers <- 1:256
ag <- setdiff(all_numbers, c(non_ag, alfalfa, major_ag))

# reclassify into background = 0, major-ag = 1, alfalfa = 2, non-ag = 3, ag = 4
cdl_reclass <- terra::app(bl_raster, fun = function(x) {
  ifelse(is.na(x), 0,
         ifelse(x %in% major_ag, 1,
                ifelse(x %in% alfalfa, 2,
                       ifelse(x %in% non_ag, 3, 4)
                )
         )
  )
})
terra::plot(cdl_reclass, axes = FALSE, main = "Reclassified Landscape")

# Get the USDA NASS confusion matrix for Utah in 2008
trans_mat_08 <- readRDS(system.file("extdata/trans_mat_08.rds", 
                                    package = "cdlsim"))
print(trans_mat_08)
#>   0          1          2          3          4
#> 0 1 0.00000000 0.00000000 0.00000000 0.00000000
#> 1 0 0.93011768 0.01303862 0.04198174 0.01486196
#> 2 0 0.03093508 0.83156145 0.08164135 0.05586211
#> 3 0 0.03422092 0.03652534 0.91287971 0.01637403
#> 4 0 0.04406338 0.03388917 0.02435344 0.89769401
# Simulate the Bear Lake watershed 5 times
set.seed(1891)

bl_sim <- simulate_raster_patch(original_raster = cdl_reclass,
                      transition_matrix = trans_mat_08,
                      non_trans = c(0), iterations = 5)
# look at the simulations
terra::plot(bl_sim)

# Calculate Simpson's Diversity on 100 simulations
bl_sidi <- landscapemetrics::lsm_l_sidi(bl_sim) |>
  dplyr::rename(sim_value = value)


# Calculate Shannon's and Simpson on the 1000 iter sims and og models
og_sidi <- landscapemetrics::lsm_l_sidi(cdl_reclass) |>
  dplyr::rename(og_value = value)


# combine the og and sim results into one dataframe
div_combo_bl <- dplyr::full_join(bl_sidi, og_sidi, dplyr::join_by(metric)) |>
  dplyr::select(metric, sim_value, og_value)

Compare the simulated Simpson’s Diversity Index values to the original landscape’s values.

# Print the tibble of the simulated versus original landscape values
print(div_combo_bl)
#> # A tibble: 5 × 3
#>   metric sim_value og_value
#>   <chr>      <dbl>    <dbl>
#> 1 sidi       0.558    0.559
#> 2 sidi       0.557    0.559
#> 3 sidi       0.557    0.559
#> 4 sidi       0.558    0.559
#> 5 sidi       0.558    0.559

The cdlsim package provides a flexible and intuitive framework for simulating classification uncertainty in USDA Cropland Data Layer products at the patch level. Through both a simple example and a real-world application in the Bear Lake watershed, we demonstrate how cdlsim can be used to evaluate the sensitivity of landscape metrics to random perturbations in classification. The package supports transition matrices that are either user-defined or derived from USDA NASS published confusion matrices, offering valuable tools for researchers aiming to assess the robustness of land use analyses under data uncertainty.