Title: | Spatial Data Science Complementary Features |
Version: | 0.8.0 |
Description: | Wrapping and supplementing commonly used functions in the R ecosystem related to spatial data science, while serving as a basis for other packages maintained by Wenbo Lv. |
License: | GPL-3 |
Encoding: | UTF-8 |
URL: | https://stscl.github.io/sdsfun/, https://github.com/stscl/sdsfun |
BugReports: | https://github.com/stscl/sdsfun/issues |
RoxygenNote: | 7.3.2 |
Depends: | R (≥ 4.1.0) |
LinkingTo: | Rcpp, RcppArmadillo |
Imports: | dplyr, geosphere, magrittr, pander, purrr, sf, spdep, stats, tibble, utils |
Suggests: | ggplot2, Rcpp, RcppArmadillo, terra, testthat (≥ 3.0.0) |
Config/testthat/edition: | 3 |
NeedsCompilation: | yes |
Packaged: | 2025-05-12 14:21:14 UTC; 31809 |
Author: | Wenbo Lv |
Maintainer: | Wenbo Lv <lyu.geosocial@gmail.com> |
Repository: | CRAN |
Date/Publication: | 2025-05-12 14:50:02 UTC |
Pipe operator
Description
See magrittr::%>% for details.
Usage
lhs %>% rhs
Value
NULL
(this is the magrittr pipe operator)
check for NA values in a tibble
Description
check for NA values in a tibble
Usage
check_tbl_na(tbl)
Arguments
tbl |
A |
Value
A logical value.
Examples
demotbl = tibble::tibble(x = c(1,2,3,NA,1),
y = c(NA,NA,1:3),
z = 1:5)
demotbl
check_tbl_na(demotbl)
(partial) correlation test
Description
(partial) correlation test
Usage
cor_test(x, y, z = NULL, level = 0.05)
Arguments
x |
A numeric vector representing the first variable. |
y |
A numeric vector representing the second variable. |
z |
An optional numeric vector or matrix of control variables. If provided, partial correlation is computed. |
level |
(optional) Significance level. Default is 0.05. |
Value
A numeric vector
Examples
gzma = sf::read_sf(system.file('extdata/gzma.gpkg',package = 'sdsfun'))
cor_test(gzma$PS_Score,gzma$EL_Score)
discretization
Description
discretization
Usage
discretize_vector(
x,
n,
method = "natural",
breakpoint = NULL,
sampleprob = 0.15,
thr = 0.4,
seed = 123456789
)
Arguments
x |
A continuous numeric vector. |
n |
(optional) The number of discretized classes. |
method |
(optional) The method of discretization, default is "natural". |
breakpoint |
(optional) Break points for manually splitting data. When
|
sampleprob |
(optional) When the data size exceeds |
thr |
(optional) Threshold for controlling iteration, applicable only to headtails breaks. Default is 0.4. |
seed |
(optional) Random seed number, default is 123456789. |
Value
A discretized integer vector
Examples
xvar = c(22361, 9573, 4836, 5309, 10384, 4359, 11016, 4414, 3327, 3408,
17816, 6909, 6936, 7990, 3758, 3569, 21965, 3605, 2181, 1892,
2459, 2934, 6399, 8578, 8537, 4840, 12132, 3734, 4372, 9073,
7508, 5203)
discretize_vector(xvar, n = 5, method = 'natural')
transforming a category tibble into the corresponding dummy variable tibble
Description
transforming a category tibble into the corresponding dummy variable tibble
Usage
dummy_tbl(tbl)
Arguments
tbl |
A |
Value
A tibble
Examples
a = tibble::tibble(x = 1:3,y = 4:6)
dummy_tbl(a)
transforming a categorical variable into dummy variables
Description
transforming a categorical variable into dummy variables
Usage
dummy_vec(x)
Arguments
x |
An integer vector or can be converted into an integer vector. |
Value
A matrix.
Examples
dummy_vec(c(1,1,3,2,4,6))
get variable names in a formula and data
Description
get variable names in a formula and data
Usage
formula_varname(formula, data)
Arguments
formula |
A formula. |
data |
A |
Value
A list.
yname
Dependent variable name
xname
Independent variable names
Examples
gzma = sf::read_sf(system.file('extdata/gzma.gpkg',package = 'sdsfun'))
formula_varname(PS_Score ~ EL_Score + OH_Score, gzma)
formula_varname(PS_Score ~ ., gzma)
spatial fuzzy overlay
Description
spatial fuzzy overlay
Usage
fuzzyoverlay(formula, data, method = "and")
Arguments
formula |
A formula of spatial fuzzy overlay. |
data |
A data.frame or tibble of discretized data. |
method |
(optional) Overlay methods. When |
Value
A numeric vector.
Note
Independent variables in the data provided to fuzzyoverlay() must be discretized variables, and the dependent variable must be continuous.
Examples
set.seed(42)
sim = tibble::tibble(y = stats::runif(7,0,10),
x1 = c(1,rep(2,3),rep(3,3)),
x2 = c(rep(1,2),rep(2,2),rep(3,3)))
fo1 = fuzzyoverlay(y~x1+x2,data = sim, method = 'and')
fo1
fo2 = fuzzyoverlay(y~x1+x2,data = sim, method = 'or')
fo2
generate subsets of a set
Description
generate subsets of a set
Usage
generate_subsets(set, empty = TRUE, self = TRUE)
Arguments
set |
A vector. |
empty |
(optional) When |
self |
(optional) When |
Value
A list.
Examples
generate_subsets(letters[1:3])
generate_subsets(letters[1:3],empty = FALSE)
generate_subsets(letters[1:3],self = FALSE)
generate_subsets(letters[1:3],empty = FALSE,self = FALSE)
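The enumeration can be sketched in base R as follows. This is a minimal illustration, not the package's implementation: the helper name subsets_sketch is hypothetical, and it assumes that empty controls whether the empty set is included and that self controls whether the full set itself is included. The ordering of the returned list may differ from that of generate_subsets().

```r
# Hypothetical helper (not part of sdsfun): enumerate subsets with utils::combn.
# Assumption: `empty` toggles the empty set, `self` toggles the full set.
subsets_sketch = function(set, empty = TRUE, self = TRUE) {
  n = length(set)
  sizes = seq_len(n)                       # subset sizes 1..n
  if (!self) sizes = sizes[sizes < n]      # drop the full set
  out = unlist(
    lapply(sizes, function(k) utils::combn(set, k, simplify = FALSE)),
    recursive = FALSE
  )
  if (empty) out = c(list(set[0]), out)    # prepend the empty set
  out
}

s_all  = subsets_sketch(letters[1:3])
s_none = subsets_sketch(letters[1:3], empty = FALSE, self = FALSE)
length(s_all)
length(s_none)
```

A set of n elements has 2^n subsets, so the two flags remove at most two of them.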
only geodetector q-value
Description
only geodetector q-value
Usage
geodetector_q(y, hs)
Arguments
y |
Dependent variable |
hs |
Independent variable |
Value
A numeric value
Examples
geodetector_q(y = 1:7, hs = c('x',rep('y',3),rep('z',3)))
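For orientation, the geodetector q statistic is commonly defined as one minus the ratio of the within-strata sum of squares to the total sum of squares. The base R sketch below follows that textbook definition. The helper name q_sketch is hypothetical, and whether geodetector_q() computes the value in exactly this way is an assumption.

```r
# Hypothetical helper (not part of sdsfun): textbook geodetector q statistic,
# q = 1 - SSW / SST, with strata given by `hs`.
q_sketch = function(y, hs) {
  sst = sum((y - mean(y))^2)                                   # total sum of squares
  ssw = sum(tapply(y, hs, function(v) sum((v - mean(v))^2)))   # within-strata sum of squares
  1 - ssw / sst
}

q_val = q_sketch(1:7, c('x', rep('y', 3), rep('z', 3)))
q_val
```

A value close to 1 means the stratification explains most of the variance of y; a value close to 0 means it explains almost none.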
hierarchical clustering with spatial soft constraints
Description
hierarchical clustering with spatial soft constraints
Usage
hclustgeo_disc(
data,
n,
alpha = 0.5,
D1 = NULL,
hclustm = "ward.D2",
scale = TRUE,
wt = NULL,
...
)
Arguments
data |
An |
n |
The number of hierarchical clustering classes, which can be a numeric value or vector. |
alpha |
(optional) A positive value between |
D1 |
(optional) A |
hclustm |
(optional) The agglomeration method to be used, default is "ward.D2". |
scale |
(optional) Whether to scale the dissimilarity matrices, default is TRUE. |
wt |
(optional) Vector with the weights of the observations. By default, |
... |
(optional) Other arguments passed to |
Value
The grouped membership: a vector if n is a scalar, a matrix (columns correspond to elements of n) if not.
Note
This is a C++ enhanced implementation of the hclustgeo function in the ClustGeo package.
Examples
gzma = sf::read_sf(system.file('extdata/gzma.gpkg',package = 'sdsfun'))
gzma$group = hclustgeo_disc(gzma,5,alpha = 0.75)
plot(gzma["group"])
construct inverse distance weight
Description
Function for constructing inverse distance weight.
Usage
inverse_distance_swm(sfj, power = 1, bandwidth = NULL)
Arguments
sfj |
Vector object that can be converted to |
power |
(optional) Default is 1. Set to 2 for gravity weights. |
bandwidth |
(optional) When the distance is greater than bandwidth, the corresponding entry of the weight matrix is set to 0. Default is NULL. |
Details
The inverse distance weight formula is
w_{ij} = 1 / d_{ij}^\alpha
where d_{ij} is the distance between observations i and j, and \alpha is the power argument.
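The formula can be illustrated with a minimal base R sketch. It is not the package's implementation: the helper name idw_sketch is hypothetical, it works on a plain coordinate matrix with Euclidean distances rather than on an sf object, and it assumes that self-weights on the diagonal are set to 0.

```r
# Hypothetical helper (not part of sdsfun): inverse distance weights
# from a coordinate matrix, using Euclidean distance.
idw_sketch = function(coords, power = 1, bandwidth = NULL) {
  d = as.matrix(stats::dist(coords))     # pairwise distances d_ij
  w = 1 / d^power                        # w_ij = 1 / d_ij^power
  diag(w) = 0                            # assumption: no self-weights
  if (!is.null(bandwidth)) w[d > bandwidth] = 0  # truncate beyond the bandwidth
  w
}

coords = cbind(x = c(0, 3, 0), y = c(0, 4, 8))
w  = idw_sketch(coords)                  # distances are 5, 8 and 5
w2 = idw_sketch(coords, bandwidth = 6)   # the pair at distance 8 gets weight 0
w
```

Setting power = 2 makes the weights fall off with the square of the distance, which corresponds to the gravity weights mentioned for the power argument.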
Value
An inverse distance weight matrix of class matrix.
Examples
library(sf)
pts = read_sf(system.file('extdata/pts.gpkg',package = 'sdsfun'))
wt = inverse_distance_swm(pts)
wt[1:5,1:5]
determine optimal spatial data discretization for individual variables
Description
Function for determining optimal spatial data discretization for individual variables based on locally estimated scatterplot smoothing (LOESS) model.
Usage
loess_optnum(qvec, discnumvec, increase_rate = 0.05)
Arguments
qvec |
A numeric vector of q statistics. |
discnumvec |
A numeric vector of break numbers corresponding to |
increase_rate |
(optional) The critical increase rate of the number of discretization. Default is 0.05. |
Value
A two element numeric vector.
discnum
optimal number of spatial data discretization
increase_rate
the critical increase rate of the number of discretization
Note
When increase_rate is not satisfied by the calculation, the discrete number corresponding to the highest q statistic is returned.
Note that sdsfun sorts discnumvec from smallest to largest and keeps qvec in one-to-one correspondence with discnumvec.
Examples
qv = c(0.26045642,0.64120405,0.43938704,0.95165535,0.46347836,
0.25385338,0.78778726,0.95938330,0.83247885,0.09285196)
loess_optnum(qv,3:12)
global spatial autocorrelation test
Description
global spatial autocorrelation test
Usage
moran_test(sfj, wt = NULL, alternative = "greater", symmetrize = FALSE)
Arguments
sfj |
An |
wt |
(optional) Spatial weight matrix. Must be a |
alternative |
(optional) Specification of alternative hypothesis as |
symmetrize |
(optional) Whether or not to symmetrize the asymmetrical spatial weight matrix wt by: 1/2 * (wt + wt'). Default is FALSE. |
Value
A list utilizing a result tibble to store the following information for each variable:
MoranI
observed value of the Moran coefficient
EI
expected value of Moran's I
VarI
variance of Moran's I (under normality)
ZI
standardized Moran coefficient
PI
p-value of the test statistic
Examples
gzma = sf::read_sf(system.file('extdata/gzma.gpkg',package = 'sdsfun'))
moran_test(gzma)
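As a rough illustration of the quantity reported as MoranI, the sketch below computes the standard Moran coefficient I = (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2, where z is the centered variable and S0 is the sum of all weights. The helper name moran_i_sketch is hypothetical and this is not the package's implementation; the symmetrize step mirrors the 1/2 * (wt + wt') rule described for the symmetrize argument.

```r
# Hypothetical helper (not part of sdsfun): standard Moran's I for a numeric
# vector `x` and a spatial weight matrix `wt`.
moran_i_sketch = function(x, wt, symmetrize = FALSE) {
  if (symmetrize) wt = (wt + t(wt)) / 2   # 1/2 * (wt + wt')
  z = x - mean(x)                         # centered values
  n = length(x)
  (n / sum(wt)) * sum(wt * outer(z, z)) / sum(z^2)
}

# Four observations on a line, each linked to its immediate neighbours
x  = c(1, 2, 3, 4)
wt = matrix(0, 4, 4)
wt[cbind(1:3, 2:4)] = 1
wt[cbind(2:4, 1:3)] = 1
mi = moran_i_sketch(x, wt)
mi
```

Under the null hypothesis of no spatial autocorrelation, the expected value of Moran's I is -1 / (n - 1), which is the quantity listed as EI.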
normalization
Description
normalization
Usage
normalize_vector(x, to_left = 0, to_right = 1)
Arguments
x |
A continuous numeric vector. |
to_left |
(optional) Specified minimum. Default is 0. |
to_right |
(optional) Specified maximum. Default is 1. |
Value
A normalized continuous vector.
Examples
normalize_vector(c(-5,1,5,0.01,0.99))
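Rescaling to a target interval is usually done with min-max scaling, which the base R sketch below illustrates. The helper name minmax_sketch is hypothetical, and the assumption that normalize_vector() uses exactly this formula is not confirmed by the documentation above.

```r
# Hypothetical helper (not part of sdsfun): min-max scaling of `x`
# onto the interval [to_left, to_right].
minmax_sketch = function(x, to_left = 0, to_right = 1) {
  rng = range(x, na.rm = TRUE)                       # observed min and max
  to_left + (x - rng[1]) / (rng[2] - rng[1]) * (to_right - to_left)
}

nv = minmax_sketch(c(-5, 1, 5, 0.01, 0.99))
nv
```

The smallest input maps to to_left and the largest to to_right; a constant vector would divide by zero and needs separate handling.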
remove variable linear trend based on covariate
Description
remove variable linear trend based on covariate
Usage
rm_lineartrend(formula, data, method = c("cpp", "r"))
Arguments
formula |
A formula. |
data |
The observation data. |
method |
(optional) The method to use, either "cpp" or "r". |
Value
A numeric vector.
Examples
gzma = sf::read_sf(system.file('extdata/gzma.gpkg',package = 'sdsfun'))
rm_lineartrend(PS_Score ~ ., gzma)
rm_lineartrend(PS_Score ~ ., gzma, method = "r")
extract locations
Description
Extract locations of sf objects.
Usage
sf_coordinates(sfj)
Arguments
sfj |
An |
Value
A matrix.
Examples
pts = sf::read_sf(system.file('extdata/pts.gpkg',package = 'sdsfun'))
sf_coordinates(pts)
generates distance matrix
Description
Generates distance matrix for sf object
Usage
sf_distance_matrix(sfj)
Arguments
sfj |
An |
Value
A matrix.
Examples
pts = sf::read_sf(system.file('extdata/pts.gpkg',package = 'sdsfun'))
pts_distm = sf_distance_matrix(pts)
pts_distm[1:5,1:5]
sf object geometry column name
Description
Get the geometry column name of an sf object
Usage
sf_geometry_name(sfj)
Arguments
sfj |
An |
Value
A character.
Examples
gzma = sf::read_sf(system.file('extdata/gzma.gpkg',package = 'sdsfun'))
sf_geometry_name(gzma)
sf object geometry type
Description
Get the geometry type of an sf object
Usage
sf_geometry_type(sfj)
Arguments
sfj |
An |
Value
A lowercase character vector
Examples
gzma = sf::read_sf(system.file('extdata/gzma.gpkg',package = 'sdsfun'))
sf_geometry_type(gzma)
generates cgcs2000 Gauss-Kruger projection epsg coding character
Description
Generates the EPSG code, as a character string, of the Gauss-Kruger projection corresponding to an sfj object under the CGCS2000 spatial reference.
Usage
sf_gk_proj_cgcs2000(sfj, degree = 6L)
Arguments
sfj |
An |
degree |
(optional) |
Value
A character.
Examples
gzma = sf::read_sf(system.file('extdata/gzma.gpkg',package = 'sdsfun')) |>
sf::st_transform(4490)
sf_gk_proj_cgcs2000(gzma,3)
sf_gk_proj_cgcs2000(gzma,6)
generates wgs84 utm projection epsg coding character
Description
Generates the EPSG code, as a character string, of the UTM projection corresponding to an sfj object under the WGS84 spatial reference.
Usage
sf_utm_proj_wgs84(sfj)
Arguments
sfj |
An |
Value
A character.
Examples
gzma = sf::read_sf(system.file('extdata/gzma.gpkg',package = 'sdsfun'))
sf_utm_proj_wgs84(gzma)
generates voronoi diagram
Description
Generates Voronoi diagram (Thiessen polygons) for sf object
Usage
sf_voronoi_diagram(sfj)
Arguments
sfj |
An |
Value
An sf object of polygon geometry type or can be converted to this by sf::st_as_sf().
Note
Only sf objects of (multi-)point type are supported, and the returned result includes only the geometry column.
Examples
pts = sf::read_sf(system.file('extdata/pts.gpkg',package = 'sdsfun'))
pts_v = sf_voronoi_diagram(pts)
library(ggplot2)
ggplot() +
geom_sf(data = pts_v, color = 'red',
fill = 'transparent') +
geom_sf(data = pts, color = 'blue', size = 1.25) +
theme_void()
only spade power of spatial determinant
Description
only spade power of spatial determinant
Usage
spade_psd(y, hs, wt)
Arguments
y |
Dependent variable |
hs |
Independent variable |
wt |
Spatial weight matrix |
Value
A numeric value
Examples
gzma = sf::read_sf(system.file('extdata/gzma.gpkg',package = 'sdsfun'))
wt1 = inverse_distance_swm(gzma)
spade_psd(y = gzma$PS_Score,
hs = discretize_vector(gzma$PS_Score,5),
wt = wt1)
constructs spatial weight matrices based on contiguity
Description
Constructs spatial weight matrices based on contiguity via the spdep package.
Usage
spdep_contiguity_swm(
sfj,
queen = TRUE,
k = NULL,
order = 1L,
cumulate = TRUE,
style = "W",
zero.policy = TRUE
)
Arguments
sfj |
An |
queen |
(optional) if |
k |
(optional) The number of nearest neighbours. Ignore this parameter when not using distance based neighbours to construct spatial weight matrices. |
order |
(optional) The order of the adjacency object. Default is 1. |
cumulate |
(optional) Whether to accumulate adjacency objects. Default is TRUE. |
style |
(optional) |
zero.policy |
(optional) if |
Value
A matrix
Note
When k is set to a positive value, K-nearest-neighbour weights are used.
Examples
gzma = sf::read_sf(system.file('extdata/gzma.gpkg',package = 'sdsfun'))
wt1 = spdep_contiguity_swm(gzma, k = 6, style = 'B')
wt2 = spdep_contiguity_swm(gzma, queen = TRUE, style = 'B')
wt3 = spdep_contiguity_swm(gzma, queen = FALSE, order = 2, style = 'B')
constructs spatial weight matrices based on distance
Description
Constructs spatial weight matrices based on distance via the spdep package.
Usage
spdep_distance_swm(
sfj,
kernel = NULL,
k = NULL,
bandwidth = NULL,
power = 1,
style = "W",
zero.policy = TRUE
)
Arguments
sfj |
An |
kernel |
(optional) The kernel function, can be one of |
k |
(optional) The number of nearest neighbours. Default is NULL. |
bandwidth |
(optional) The bandwidth, default is NULL. |
power |
(optional) Default is 1. |
style |
(optional) |
zero.policy |
(optional) if |
Details
Five different kernel weight functions are available:
uniform
K_{(z)} = \frac{1}{2}, \quad \text{for } \lvert z \rvert < 1
triangular
K_{(z)} = 1 - \lvert z \rvert, \quad \text{for } \lvert z \rvert < 1
quadratic (epanechnikov)
K_{(z)} = \frac{3}{4} \left( 1 - z^2 \right), \quad \text{for } \lvert z \rvert < 1
quartic
K_{(z)} = \frac{15}{16} {\left( 1 - z^2 \right)}^2, \quad \text{for } \lvert z \rvert < 1
gaussian
K_{(z)} = \frac{1}{\sqrt{2 \pi}} e^{- \frac{z^2}{2}}
In the equations above, z = d_{ij} / h_i, where h_i is the bandwidth.
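The five kernels can be written out directly in base R, as in the sketch below. The helper name kernel_sketch and its kernel labels are hypothetical, and the sketch assumes that the compactly supported kernels return 0 outside |z| < 1. It only evaluates K(z) for a scaled distance z and does not build a weight matrix the way spdep_distance_swm() does.

```r
# Hypothetical helper (not part of sdsfun): evaluate a kernel at the scaled
# distance z = d_ij / h_i. Assumption: weight is 0 where |z| >= 1 for the
# compactly supported kernels.
kernel_sketch = function(z, kernel = "gaussian") {
  inside = abs(z) < 1
  switch(kernel,
    uniform    = ifelse(inside, 0.5, 0),
    triangular = ifelse(inside, 1 - abs(z), 0),
    quadratic  = ifelse(inside, 0.75 * (1 - z^2), 0),
    quartic    = ifelse(inside, 15 / 16 * (1 - z^2)^2, 0),
    gaussian   = exp(-z^2 / 2) / sqrt(2 * pi),
    stop("unknown kernel: ", kernel)
  )
}

z = c(0, 0.5, 2)
k_tri = kernel_sketch(z, "triangular")
k_gau = kernel_sketch(z, "gaussian")
k_tri
k_gau
```

Only the gaussian kernel assigns a positive weight at every distance; the other four cut off once the distance reaches the bandwidth.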
Value
A matrix
Note
When kernel is set, distance weights based on the kernel function are used; otherwise, inverse distance weights are used.
Examples
pts = sf::read_sf(system.file('extdata/pts.gpkg',package = 'sdsfun'))
wt1 = spdep_distance_swm(pts, style = 'B')
wt2 = spdep_distance_swm(pts, kernel = 'gaussian')
wt3 = spdep_distance_swm(pts, k = 3, kernel = 'gaussian')
wt4 = spdep_distance_swm(pts, k = 3, kernel = 'gaussian', bandwidth = 10000)
spatial linear models selection
Description
spatial linear models selection
Usage
spdep_lmtest(formula, data, listw = NULL)
Arguments
formula |
A formula for linear regression model. |
data |
An |
listw |
(optional) A listw. See |
Value
A list
Examples
gzma = sf::read_sf(system.file('extdata/gzma.gpkg',package = 'sdsfun'))
spdep_lmtest(PS_Score ~ ., gzma)
construct neighbours list
Description
construct neighbours list
Usage
spdep_nb(sfj, queen = TRUE, k = NULL, order = 1L, cumulate = TRUE)
Arguments
sfj |
An |
queen |
(optional) if |
k |
(optional) The number of nearest neighbours. Ignore this parameter when not using distance based neighbours. |
order |
(optional) The order of the adjacency object. Default is 1. |
cumulate |
(optional) Whether to accumulate adjacency objects. Default is TRUE. |
Value
A neighbours list with class nb
Note
When k is set to a positive value, K-nearest-neighbour neighbours are used.
Examples
pts = sf::read_sf(system.file('extdata/pts.gpkg',package = 'sdsfun'))
nb1 = spdep_nb(pts, k = 6)
nb2 = spdep_nb(pts, queen = TRUE)
nb3 = spdep_nb(pts, queen = FALSE, order = 2)
spatial c(k)luster analysis by tree edge removal
Description
SKATER forms clusters by spatially partitioning data that has similar values for features of interest.
Usage
spdep_skater(sfj, k = 6, nb = NULL, ini = 5, ...)
Arguments
sfj |
An |
k |
(optional) The number of clusters. Default is 6. |
nb |
(optional) A neighbours list with class nb. If the input |
ini |
(optional) The initial node in the minimal spanning tree. Default is 5. |
... |
(optional) Other parameters passed to spdep::skater(). |
Value
A numeric vector of clusters.
Examples
gzma = sf::read_sf(system.file('extdata/gzma.gpkg',package = 'sdsfun'))
gzma_c = spdep_skater(gzma,8)
gzma$group = gzma_c
plot(gzma["group"])
spatial variance
Description
spatial variance
Usage
spvar(x, wt, method = c("cpp", "r"))
Arguments
x |
A numeric vector. |
wt |
The spatial weight matrix. |
method |
(optional) The method for calculating spatial variance, either "cpp" or "r". |
Details
The spatial variance formula is
\Gamma = \frac{\sum_i \sum_{j \neq i} \omega_{ij}\frac{(y_i-y_j)^2}{2}}{\sum_i \sum_{j \neq i} \omega_{ij}}
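The formula translates almost literally into base R, as the sketch below shows. The helper name spvar_sketch is hypothetical and this is not the package's C++ or R implementation; the vector written y in the formula is the argument x here.

```r
# Hypothetical helper (not part of sdsfun): spatial variance as the weighted
# mean of half squared differences over all pairs with j != i.
spvar_sketch = function(x, wt) {
  diag(wt) = 0                                          # exclude j == i
  sqdiff = outer(x, x, function(a, b) (a - b)^2 / 2)    # (x_i - x_j)^2 / 2
  sum(wt * sqdiff) / sum(wt)
}

x  = c(1, 2, 4)
wt = matrix(1, 3, 3)      # equal weights between all pairs
sv = spvar_sketch(x, wt)
sv
```

With equal weights for every pair, the result coincides with the ordinary sample variance var(x), which is a convenient sanity check.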
Value
A numerical value.
Examples
gzma = sf::read_sf(system.file('extdata/gzma.gpkg',package = 'sdsfun'))
wt1 = inverse_distance_swm(gzma)
spvar(gzma$PS_Score,wt1)
spatial stratified heterogeneity test
Description
spatial stratified heterogeneity test
Usage
ssh_test(y, hs)
Arguments
y |
Variable Y, continuous numeric vector. |
hs |
Spatial stratification or classification of each explanatory variable. |
Value
A tibble
Examples
ssh_test(y = 1:7, hs = c('x',rep('y',3),rep('z',3)))
standardization
Description
Calculates the Z-score by centering on the mean and scaling by the standard deviation. The formula is as follows:
Z = \frac{(x - mean(x))}{sd(x)}
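In base R the formula can be evaluated in one line, as sketched below. This assumes that sd refers to R's sample standard deviation (denominator n - 1), as stats::sd() computes it; it is an illustration, not the package's code.

```r
# Z-score by hand, assuming sd() is the sample standard deviation
x = 1:10
z = (x - mean(x)) / stats::sd(x)
z
```

The result has mean 0 and standard deviation 1, and matches as.numeric(scale(x)).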
Usage
standardize_vector(x)
Arguments
x |
A numeric vector |
Value
A standardized numeric vector
Examples
standardize_vector(1:10)
convert discrete variables in a tibble to integers
Description
convert discrete variables in a tibble to integers
Usage
tbl_all2int(tbl)
Arguments
tbl |
A |
Value
A converted tibble, data.frame or sf object.
Examples
demotbl = tibble::tibble(x = c(1,2,3,3,1),
y = letters[1:5],
z = c(1L,1L,2L,2L,3L),
m = factor(letters[1:5],levels = letters[5:1]))
tbl_all2int(demotbl)
convert xyz tbl to matrix
Description
convert xyz tbl to matrix
Usage
tbl_xyz2mat(tbl, x = 1, y = 2, z = 3)
Arguments
tbl |
A |
x |
(optional) The x-axis coordinates column number, default is 1. |
y |
(optional) The y-axis coordinates column number, default is 2. |
z |
(optional) The z (attribute) coordinates column number, default is 3. |
Value
A list.
- z_attrs_matrix
A matrix with attribute information.
- x_coords_matrix
A matrix with the x-axis coordinates.
- y_coords_matrix
A matrix with the y-axis coordinates.
Examples
set.seed(42)
lon = rep(1:3,each = 3)
lat = rep(1:3,times = 3)
zattr = rnorm(9, mean = 10, sd = 1)
demodf = data.frame(x = lon, y = lat, z = zattr)
demodf
tbl_xyz2mat(demodf)