Type: Package
Title: Spatial and Spatiotemporal Relative Risk
Version: 2.3-16
Date: 2025-03-20
Description: Provides functions to estimate kernel-smoothed spatial and spatio-temporal densities and relative risk functions, and perform subsequent inference. Methodological details can be found in the accompanying tutorial: Davies et al. (2018) <doi:10.1002/sim.7577>.
Depends: R (≥ 2.10.1), spatstat (≥ 2.3-0)
Imports: spatstat.utils, spatstat.geom, spatstat.explore, spatstat.random, spatstat.univar, doParallel, parallel, foreach, misc3d
Suggests: fftwtools (≥ 0.9.8)
License: GPL-2 | GPL-3 [expanded from: GPL (≥ 2)]
LazyLoad: yes
NeedsCompilation: no
RoxygenNote: 7.3.2
URL: https://tilmandavies.github.io/sparr/, https://github.com/tilmandavies/sparr/
BugReports: https://github.com/tilmandavies/sparr/issues/
Encoding: UTF-8
Packaged: 2025-03-20 01:03:03 UTC; davti75p
Author: Tilman M. Davies [aut, cre], Jonathan C. Marshall [aut]
Maintainer: Tilman M. Davies <tilman.davies@otago.ac.nz>
Repository: CRAN
Date/Publication: 2025-03-20 03:40:01 UTC
The sparr Package: Spatial and Spatiotemporal Relative Risk
Description
Provides functions to estimate fixed and adaptive kernel-smoothed spatial relative risk surfaces via the density-ratio method and perform subsequent inference. Fixed-bandwidth spatiotemporal density and relative risk estimation is also supported.
Details
Package: sparr
Date: 2025-03-20
Version: 2.3-16
License: GPL (>= 2)
Kernel smoothing, and the flexibility afforded by this methodology, provides an attractive approach to estimating complex probability density functions.
The spatial relative risk function, constructed as a ratio of estimated case to control densities (Bithell, 1990; 1991; Kelsall and Diggle, 1995a,b), describes the variation in the ‘risk’ of the disease, given the underlying at-risk population. This is a technique that has been applied successfully for mainly exploratory purposes in a number of different analyses (see for example Sabel et al., 2000; Prince et al., 2001; Wheeler, 2007; Elson et al., 2021). It has also grown in popularity in very different fields that pose similarly styled research questions, such as ecology (e.g. Campos and Fedigan, 2014); physiology (Davies et al., 2013); and archaeology (e.g. Bevan, 2012; Smith et al., 2015).
This package provides functions for spatial (i.e. bivariate/planar/2D) kernel density estimation (KDE), implementing both fixed and ‘variable’ or ‘adaptive’ (Abramson, 1982) smoothing parameter options. A selection of bandwidth calculators for bivariate KDE and the relative risk function are provided, including one based on the maximal smoothing principle (Terrell, 1990), and others involving a leave-one-out cross-validation (see below). In addition, the ability to construct both Monte-Carlo and asymptotic p-value surfaces (‘tolerance’ contours of which signal statistically significant sub-regions of extremity in a risk surface - Hazelton and Davies, 2009; Davies and Hazelton, 2010) as well as some visualisation tools are provided.
Spatiotemporal estimation is also supported, largely following developments in Fernando and Hazelton (2014). This includes their fixed-bandwidth kernel estimator of spatiotemporal densities, relative risk, and asymptotic tolerance contours.
Key content of sparr can be broken up as follows:
DATASETS/DATA GENERATION
pbc
a case/control planar point pattern (a ppp.object) concerning liver disease in northern England.
fmd
an anonymised (jittered) case/control spatiotemporal point pattern of the 2001 outbreak of veterinary foot-and-mouth disease in Cumbria (courtesy of the Animal and Plant Health Agency, UK).
burk
a spatiotemporal point pattern of Burkitt's lymphoma in Uganda; artificially simulated control data are also provided for experimentation.
Also available are a number of relevant additional spatial datasets built-in to the spatstat package (Baddeley and Turner, 2005; Baddeley et al., 2015) through spatstat.data, such as chorley, which concerns the distribution of laryngeal cancer in an area of Lancashire, UK.
rimpoly
a wrapper function of rpoint to allow generated spatial point patterns based on a pixel image (im) to be returned with a polygonal owin.
SPATIAL
Bandwidth calculators
OS
estimation of an isotropic smoothing parameter for fixed-bandwidth bivariate KDE, based on the oversmoothing principle introduced by Terrell (1990).
NS
estimation of an isotropic smoothing parameter for fixed-bandwidth bivariate KDE, based on the asymptotically optimal value for a normal density (bivariate normal scale rule - see e.g. Wand and Jones, 1995).
LSCV.density
a least-squares cross-validated (LSCV) estimate of an isotropic fixed bandwidth for bivariate, edge-corrected KDE (see e.g. Bowman and Azzalini, 1997).
LIK.density
a likelihood cross-validated (LIK) estimate of an isotropic fixed bandwidth for bivariate, edge-corrected KDE (see e.g. Silverman, 1986).
SLIK.adapt
an experimental likelihood cross-validation function for simultaneous global/pilot bandwidth selection for adaptive density estimates.
BOOT.density
a bootstrap approach to optimisation of an isotropic fixed bandwidth for bivariate, edge-corrected KDE (see e.g. Taylor, 1989).
LSCV.risk
estimation of a jointly optimal, common isotropic case-control fixed bandwidth for the kernel-smoothed risk function based on the mean integrated squared error (MISE), a weighted MISE, or the asymptotic MISE (see respectively Kelsall and Diggle, 1995a; Hazelton, 2008; Davies, 2013).
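As a rough illustration of how these selectors are typically called (a sketch only, with default arguments assumed; each selector's help page gives full detail and timing guidance):
library(sparr)
data(pbc)
pbccas <- split(pbc)$case
c(OS = OS(pbccas), NS = NS(pbccas),
  LSCV = LSCV.density(pbccas, verbose = FALSE),
  LIK = LIK.density(pbccas, verbose = FALSE))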
Density and relative risk estimation
bivariate.density
kernel density estimate of bivariate data; fixed or adaptive smoothing.
multiscale.density
multi-scale adaptive kernel density estimates for multiple global bandwidths as per Davies and Baddeley (2018).
multiscale.slice
a single adaptive kernel estimate based on taking a slice from a multi-scale estimate.
risk
estimation of a (log) spatial relative risk function, either from data or pre-existing bivariate density estimates; fixed (Kelsall and Diggle, 1995a); fixed with shrinkage (Hazelton, 2023); or both asymmetric (Davies and Hazelton, 2010) and symmetric (Davies et al., 2016) adaptive estimates are possible.
tolerance
calculation of asymptotic or Monte-Carlo p-value surfaces.
tol.classify
classification of individual points based on a p-value surface from which tolerance contours are produced.
Visualisation
S3 methods of the plot function; see plot.bivden for visualising a single bivariate density estimate from bivariate.density, plot.rrs for visualisation of a spatial relative risk function from risk, or plot.msden for viewing animations of multi-scale density estimates from multiscale.density.
tol.contour
provides more flexibility for plotting and superimposing tolerance contours upon an existing plot of spatial relative risk (i.e. given output from tolerance).
tol.classplot
is a utility to view the classification scheme with respect to unique tolerance contours/significance regions as identified by tol.classify.
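To give a sense of how these spatial tools fit together, the following is a minimal sketch of a fixed-bandwidth analysis of the pbc data (the tolerance contour level and plotting arguments are illustrative assumptions; see risk, tolerance, and tol.contour for their full argument lists):
library(sparr)
data(pbc)
pbccas <- split(pbc)$case
pbccon <- split(pbc)$control
h0 <- OS(pbc, nstar = "geometric")          # common bandwidth via oversmoothing
rho <- risk(pbccas, pbccon, h0 = h0)        # log spatial relative risk surface
pv <- tolerance(rho)                        # asymptotic p-value surface
plot(rho)                                   # plot.rrs method
tol.contour(pv, levels = 0.05, add = TRUE)  # superimpose tolerance contours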
Printing and summarising
S3 methods (print.bivden, print.rrs, print.msden, summary.bivden, summary.rrs, and summary.msden) are available for the bivariate density, spatial relative risk, and multi-scale adaptive density objects.
SPATIOTEMPORAL
Bandwidth calculators
OS.spattemp
estimation of an isotropic smoothing parameter for the spatial margin and another for the temporal margin for spatiotemporal densities, based on the 2D and 1D versions, respectively, of the oversmoothing principle introduced by Terrell (1990).
NS.spattemp
as above, based on the 2D and 1D versions of the normal scale rule (Silverman, 1986).
LSCV.spattemp
least-squares cross-validated (LSCV) estimates of scalar spatial and temporal bandwidths for edge-corrected spatiotemporal KDE.
LIK.spattemp
as above, based on likelihood cross-validation.
BOOT.spattemp
bootstrap bandwidth selection for the spatial and temporal margins for spatiotemporal, edge-corrected KDE (Taylor, 1989).
Density and relative risk estimation
spattemp.density
fixed-bandwidth kernel density estimate of spatiotemporal data.
spattemp.risk
fixed-bandwidth kernel density estimate of spatiotemporal relative risk, either with a time-static or time-varying control density (Fernando and Hazelton, 2014).
spattemp.slice
extraction function of the spatial density/relative risk at prespecified time(s).
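As a rough sketch of how these spatiotemporal tools are combined (assuming, as in the package examples, that fmd provides $cases and $controls with observation times attached; the bandwidths, argument names, and slice time below are illustrative only):
library(sparr)
data(fmd)
fmdcas <- fmd$cases
fmdcon <- fmd$controls
hl <- OS.spattemp(fmdcas)                           # spatial (h) and temporal (lambda) bandwidths
fhat <- spattemp.density(fmdcas, h = hl[1], lambda = hl[2])
ghat <- bivariate.density(fmdcon, h0 = OS(fmdcon))  # time-static control density
rho <- spattemp.risk(fhat, ghat, tolerate = TRUE)   # spatiotemporal log-relative risk
rslice <- spattemp.slice(rho, tt = 50)              # spatial surface(s) at time 50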
Visualisation
S3 methods of the plot function; see plot.stden for various options (including animation) for visualisation of a spatiotemporal density, and plot.rrst for viewing spatiotemporal relative risk surfaces (including animation and tolerance contour superimposition).
Printing and summarising objects
S3 methods (print.stden, print.rrst, summary.stden, and summary.rrst) are available for the spatiotemporal density and spatiotemporal relative risk objects.
Dependencies
The sparr package depends upon spatstat. In particular, the user should familiarise themselves with ppp objects and im objects, which are used throughout. For spatiotemporal density estimation, sparr is assisted by importing from the misc3d package, and for the experimental capabilities involving parallel processing, sparr also currently imports doParallel, parallel, and foreach.
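As a brief orientation to these object classes (a minimal sketch using simulated coordinates; see the spatstat documentation for full detail):
library(spatstat.geom)
W <- owin(xrange = c(0, 1), yrange = c(0, 1))       # an observation window
X <- ppp(x = runif(20), y = runif(20), window = W)  # a planar point pattern (class ppp)
Z <- as.im(function(x, y) x + y, W = W)             # a pixel image (class im) on W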
Citation
To cite use of current versions of sparr in publications or research projects please use:
Davies, T.M., Marshall, J.C. and Hazelton, M.L. (2018) Tutorial on kernel estimation of continuous spatial and spatiotemporal relative risk, Statistics in Medicine, 37(7), 1191-1221. <DOI:10.1002/sim.7577>
Old versions of sparr (<= 2.1-09) can be referenced by Davies et al. (2011) (see reference list).
Author(s)
T.M. Davies
Dept. of Mathematics & Statistics, University of Otago, Dunedin, New Zealand.
J.C. Marshall
Institute of Fundamental Sciences, Massey University, Palmerston North, New Zealand.
Maintainer: T.M.D. tilman.davies@otago.ac.nz
References
Abramson, I. (1982), On bandwidth variation in kernel estimates — a square root law, Annals of Statistics, 10(4), 1217-1223.
Baddeley, A. and Turner, R. (2005), spatstat: an R package for analyzing spatial point patterns, Journal of Statistical Software, 12(6), 1-42.
Baddeley, A., Rubak, E. and Turner, R. (2015) Spatial Point Patterns: Methodology and Applications with R, Chapman and Hall/CRC Press, UK.
Bevan A. (2012), Spatial methods for analysing large-scale artefact inventories. Antiquity, 86, 492-506.
Bithell, J.F. (1990), An application of density estimation to geographical epidemiology, Statistics in Medicine, 9, 691-701.
Bithell, J.F. (1991), Estimation of relative risk function, Statistics in Medicine, 10, 1745-1751.
Bowman, A.W. and Azzalini, A. (1997), Applied Smoothing Techniques for Data Analysis: The Kernel Approach with S-Plus Illustrations. Oxford University Press Inc., New York. ISBN 0-19-852396-3.
Campos, F.A. and Fedigan, L.M. (2014) Spatial ecology of perceived predation risk and vigilance behavior in white-faced capuchins, Behavioral Ecology, 25, 477-486.
Davies, T.M. (2013), Jointly optimal bandwidth selection for the planar kernel-smoothed density-ratio, Spatial and Spatio-temporal Epidemiology, 5, 51-65.
Davies, T.M. and Baddeley A. (2018), Fast computation of spatially adaptive kernel estimates, Statistics and Computing, 28(4), 937-956.
Davies, T.M., Cornwall, J. and Sheard, P.W. (2013) Modelling dichotomously marked muscle fibre configurations, Statistics in Medicine, 32, 4240-4258.
Davies, T.M. and Hazelton, M.L. (2010), Adaptive kernel estimation of spatial relative risk, Statistics in Medicine, 29(23) 2423-2437.
Davies, T.M., Hazelton, M.L. and Marshall, J.C. (2011), sparr: Analyzing spatial relative risk using fixed and adaptive kernel density estimation in R, Journal of Statistical Software, 39(1), 1-14.
Davies, T.M., Jones, K. and Hazelton, M.L. (2016), Symmetric adaptive smoothing regimens for estimation of the spatial relative risk function, Computational Statistics & Data Analysis, 101, 12-28.
Elson, R., Davies, T. M., Lake, I. R., Vivancos, R., Blomquist, P. B., Charlett, A. and Dabrera, G. (2021), The spatio-temporal distribution of COVID-19 infection in England between January and June 2020, Epidemiology and Infection, 149, e73.
Fernando, W.T.P.S. and Hazelton, M.L. (2014), Generalizing the spatial relative risk function, Spatial and Spatio-temporal Epidemiology, 8, 1-10.
Hazelton, M.L. (2008), Letter to the editor: Kernel estimation of risk surfaces without the need for edge correction, Statistics in Medicine, 27, 2269-2272.
Hazelton, M.L. (2023), Shrinkage estimators of the spatial relative risk function, Submitted for publication.
Hazelton, M.L. and Davies, T.M. (2009), Inference based on kernel estimates of the relative risk function in geographical epidemiology, Biometrical Journal, 51(1), 98-109.
Kelsall, J.E. and Diggle, P.J. (1995a), Kernel estimation of relative risk, Bernoulli, 1, 3-16.
Kelsall, J.E. and Diggle, P.J. (1995b), Non-parametric estimation of spatial variation in relative risk, Statistics in Medicine, 14, 2335-2342.
Prince, M. I., Chetwynd, A., Diggle, P. J., Jarner, M., Metcalf, J. V. and James, O. F. W. (2001), The geographical distribution of primary biliary cirrhosis in a well-defined cohort, Hepatology 34, 1083-1088.
Sabel, C. E., Gatrell, A. C., Loytonen, M., Maasilta, P. and Jokelainen, M. (2000), Modelling exposure opportunities: estimating relative risk for motor neurone disease in Finland, Social Science & Medicine, 50, 1121-1137.
Smith, B.A., Davies, T.M. and Higham, C.F.W. (2015) Spatial and social variables in the Bronze Age phase 4 cemetery of Ban Non Wat, Northeast Thailand, Journal of Archaeological Science: Reports, 4, 362-370.
Taylor, C.C. (1989) Bootstrap choice of the smoothing parameter in kernel density estimation, Biometrika, 76, 705-712.
Terrell, G.R. (1990), The maximal smoothing principle in density estimation, Journal of the American Statistical Association, 85, 470-477.
Venables, W. N. and Ripley, B. D. (2002). Modern Applied Statistics with S, Fourth Edition, Springer, New York.
Wand, M.P. and Jones, C.M. (1995), Kernel Smoothing, Chapman & Hall, London.
Wheeler, D. C. (2007), A comparison of spatial clustering and cluster detection techniques for childhood leukemia incidence in Ohio, 1996-2003, International Journal of Health Geographics, 6(13).
See Also
Useful links:
https://tilmandavies.github.io/sparr/
https://github.com/tilmandavies/sparr/
Report bugs at https://github.com/tilmandavies/sparr/issues/
Bootstrap bandwidth for a spatial kernel density estimate
Description
Isotropic fixed or global (for adaptive) bandwidth selection for a standalone 2D density based on bootstrap estimation of the MISE.
Usage
BOOT.density(pp, hlim = NULL, eta = NULL, type = c("fixed", "adaptive"),
hp = NULL, edge = c("uniform", "none"), ref.density = NULL,
resolution = 64, rmdiag = TRUE, sim.adapt = list(N = 50, B = 100,
dimz = 64, objective = FALSE), parallelise = NA, verbose = TRUE, ...)
Arguments
pp |
An object of class |
hlim |
An optional vector of length 2 giving the limits of the optimisation routine with respect to the bandwidth. If |
eta |
Fixed scalar bandwidth to use for the reference density estimate; if |
type |
A character string indicating selection type. Either |
hp |
Pilot bandwidth used for adaptive estimates in the bootstrap; see the argument of the same tag in |
edge |
Character string dictating edge correction for the bootstrapped estimates. |
ref.density |
Optional. An object of class |
resolution |
Spatial grid size; the optimisation will be based on a [ |
rmdiag |
Logical control value for removal of mirrored evaluation points as suggested by Taylor (1989) in the theoretical expression of the fixed-bandwidth MISE estimate. See ‘Details’. Ignored when |
sim.adapt |
List of control values for bootstrap simulation in the adaptive case; see ‘Details’. Ignored when |
parallelise |
Optional numeric argument to reduce computation time by invoking parallel processing, by giving the number of CPU cores to use in either evaluation (fixed) or in the actual bootstrap replicate generation (adaptive). Experimental. Test your system first using |
verbose |
Logical value indicating whether to print function progress during execution. |
... |
Optional arguments controlling scaling to be passed to |
Details
For a 2D kernel density estimate \hat{f} defined on W \in R^2, the mean integrated squared error (MISE) is given by E[\int_W (\hat{f}(x) - f(x))^2 dx], where f is the corresponding true density. Given an observed data set X (argument pp) of n observations, this function finds the bandwidth h that minimises
E^*[\int_W (\hat{f}^*(x) - \hat{f}(x))^2 dx],
where \hat{f}(x) is a density estimate of X constructed with ‘reference’ bandwidth \eta (argument eta or ref.density), and \hat{f}^*(x) is a density estimate using bandwidth h of n observations X^* generated from \hat{f}(x). The notation E^* denotes expectation with respect to the distribution of the X^*.
- Fixed
When type = "fixed", the function assumes you want to select a constant bandwidth for use with the fixed-bandwidth density estimator. This implementation is based on extending the remarkable results of Taylor (1989) (see also Sain et al., 1994), who demonstrates that when the Gaussian kernel is being used, we can find the optimal h with respect to the aforementioned bootstrap-estimated MISE without any actual resampling. This implementation extends these results to the bivariate setting, and allows for edge-correction of both the reference and bootstrap densities.
Taylor (1989) does not distinguish between the reference bandwidth \eta and the target of optimisation, h, thus allowing the reference bandwidth to vary alongside the target in the optimisation. This is not optimal, and this function always assumes a static reference bandwidth. Hall et al. (1992) indicate that a generous amount of smoothing is to be preferred in the reference density (hence the default eta set using OS).
If ref.density is supplied, it must be a fixed-bandwidth density estimate as an object of class bivden for validity of the theory. Edge-correction must be present if edge = "uniform"; and it must be evaluated on the same spatial domain as dictated by Window(pp) and resolution. If unsupplied, the function internally computes an appropriate fixed-bandwidth density estimate using eta as the reference bandwidth.
Finally, Taylor (1989) argues it is preferable to avoid summation at identical evaluation grid points in the expression for the optimal bandwidth, which is performed when rmdiag = TRUE. Setting rmdiag = FALSE disables this correction.
- Adaptive
When type = "adaptive", the function assumes you want to select a global bandwidth (argument h0 in bivariate.density) for use in 2D adaptive kernel density estimation.
An expression similar to Taylor (1989) is not possible for the adaptive estimator. Thus, in the adaptive setting, the optimal bootstrap bandwidth is calculated by brute force as was performed in Davies and Baddeley (2018) by taking advantage of the multiscale estimation theory implemented in multiscale.density. The value that minimises an interpolating cubic spline of the estimated MISE on bandwidth is identified as the optimal global bandwidth.
The user can pass either a fixed or adaptive bivden object to ref.density. If this is the case, hp is ignored and the pilot bandwidth for each iteration of the bootstrap in estimation of the \hat{f}^*(x) uses ref.density$hp (if ref.density is adaptive) or ref.density$h0 (if ref.density is fixed). When ref.density is unsupplied, the function uses a fixed-bandwidth kernel estimate with bandwidth eta as the reference density, and if additionally hp is unsupplied, the same value eta is used for the constant pilot bandwidth.
Control over the bootstrap is achieved with four optional named arguments passed as a list to sim.adapt. N controls the number of bootstrap iterates per bandwidth; B controls the resolution of the sequence of bandwidths trialled (i.e. between hlim[1] and hlim[2]); dimz specifies the resolution of the bandwidth axis in the trivariate convolution evaluated by multiscale.density; and objective specifies whether to return the set of estimated MISEs for all bandwidths (nice to plot), or merely the optimal bandwidth (see ‘Value’).
The ... are intended for any relevant optional arguments to be passed to the internal call to multiscale.density, such as gamma.scale or trim.
Value
The optimal fixed or global (for adaptive) scalar bandwidth. If sim.adapt$objective = TRUE for the adaptive bootstrap, the return object is instead a [sim.adapt$B x 2] matrix, with the first column giving the trialled bandwidth and the second giving the corresponding value of the estimated bootstrap MISE.
Warning
Even with the implemented computational tricks, bootstrapping for bandwidth selection for spatial data is still computationally demanding, especially for adaptive kernel estimates. The user can reduce this time by keeping the evaluation grid at modest resolutions, and by experimenting with parallelising the internal loops via parallelise. The ‘Examples’ section offers some rough indications of evaluation times on this author's local machine.
Author(s)
T.M. Davies
References
Davies, T.M. and Baddeley A. (2018), Fast computation of spatially adaptive kernel estimates, Statistics and Computing, 28(4), 937-956.
Hall, P., Marron, J.S. and Park, B.U. (1992) Smoothed cross-validation, Probability Theory and Related Fields, 92, 1-20.
Sain, S.R., Baggerly, K.A. and Scott, D.W. (1994) Cross-validation of multivariate densities, Journal of the American Statistical Association, 89, 807-817.
Taylor, C.C. (1989) Bootstrap choice of the smoothing parameter in kernel density estimation, Biometrika, 76, 705-712.
See Also
bivariate.density, OS, multiscale.density
Examples
data(pbc)
## Fixed bandwidth selection ##
BOOT.density(pbc) # ~20 secs
BOOT.density(pbc,eta=OS(pbc)/2) # halve default reference bandwidth
BOOT.density(pbc,eta=OS(pbc)*2) # double default reference bandwidth
# supplying pre-defined reference density as fixed-bandwidth 'bivden' object
pbcfix <- bivariate.density(pbc,h0=2.5,resolution=64)
system.time(hfix <- BOOT.density(pbc,ref.density=pbcfix))
hfix
## Global (for adaptive) bandwidth selection ##
# ~200 secs next line; use 'parallelise' for speedup
system.time(hada <- BOOT.density(pbc,type="adaptive")) # minimal usage for adaptive bootstrap
hada
# ~80 secs next line. Set custom h limits; increase reference bandwidth;
# set custom pilot bandwidth; return objective function; use 'parallelise' for speedup
system.time(hada <- BOOT.density(pbc,hlim=c(0.9,8),eta=3.5,type="adaptive",
hp=OS(pbc)/2,sim.adapt=list(objective=TRUE)))
hada[which.min(hada[,2]),1]
plot(hada);abline(v=hada[which.min(hada[,2]),1],col=2)
Bootstrap bandwidths for a spatiotemporal kernel density estimate
Description
Bandwidth selection for standalone spatiotemporal density/intensity based on bootstrap estimation of the MISE, providing an isotropic scalar spatial bandwidth and a scalar temporal bandwidth.
Usage
BOOT.spattemp(pp, tt = NULL, tlim = NULL, eta = NULL, nu = NULL,
sedge = c("uniform", "none"), tedge = sedge, ref.density = NULL,
sres = 64, tres = sres, start = NULL, verbose = TRUE)
Arguments
pp |
An object of class |
tt |
A numeric vector of equal length to the number of points in |
tlim |
A numeric vector of length 2 giving the limits of the temporal domain over which to smooth. If supplied, all times in |
eta |
Fixed scalar bandwidth to use for the spatial margin of the reference density estimate; if |
nu |
Fixed scalar bandwidth to use for the temporal margin of the reference density estimate; if |
sedge |
Character string dictating spatial edge correction. |
tedge |
As |
ref.density |
Optional. An object of class |
sres |
Numeric value > 0. Resolution of the [ |
tres |
Numeric value > 0. Resolution of the evaluation points in the temporal margin as defined by the |
start |
Optional positive numeric vector of length 2 giving starting values for the internal call to |
verbose |
Logical value indicating whether to print a function progress bar to the console during evaluation. |
Details
For a spatiotemporal kernel density estimate \hat{f} defined on W x T \in R^3, the mean integrated squared error (MISE) is given by E[\int_W \int_T (\hat{f}(x,t) - f(x,t))^2 dt dx], where f is the corresponding true density. Given observed spatiotemporal locations X (arguments pp and tt) of n observations, this function finds the scalar spatial bandwidth h and scalar temporal bandwidth \lambda that jointly minimise
E^*[\int_W \int_T (\hat{f}^*(x,t) - \hat{f}(x,t))^2 dt dx],
where \hat{f}(x,t) is a density estimate of X constructed with ‘reference’ bandwidths \eta (spatial; argument eta) and \nu (temporal; argument nu); \hat{f}^*(x,t) is a density estimate using bandwidths h and \lambda of n observations X^* generated from \hat{f}(x,t). The notation E^* denotes expectation with respect to the distribution of the X^*. The user may optionally supply ref.density as an object of class stden, which must be evaluated on the same spatial and temporal domains W and T as the data (arguments pp, tt, and tlim). In this case, the reference bandwidths are extracted from this object, and eta and nu are ignored.
This function is based on an extension of the theory of Taylor (1989) to the spatiotemporal domain and to cope with the inclusion of edge-correction factors. No resampling is necessary due to the theoretical properties of the Gaussian kernel.
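For instance, a pre-computed reference density might be supplied along the following lines (a sketch only; the reference bandwidths and grid resolution shown are arbitrary, with argument names as per spattemp.density):
data(burk)
burkcas <- burk$cases
ref <- spattemp.density(burkcas, h = 10, lambda = 500, sres = 64)  # reference 'stden' object
BOOT.spattemp(burkcas, ref.density = ref)                          # eta and nu are then ignored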
Value
A numeric vector of length 2 giving the jointly optimised spatial and temporal bandwidths (named h and lambda respectively).
Warning
Bootstrapping for bandwidth selection for spatiotemporal data is very computationally demanding. Keeping verbose = TRUE offers an indication of the computational burden by printing each pair of bandwidths at each iteration of the optimisation routine. The ‘Examples’ section also offers some rough indications of evaluation times on this author's local machine.
Author(s)
T. M. Davies
References
Taylor, C.C. (1989) Bootstrap choice of the smoothing parameter in kernel density estimation, Biometrika, 76, 705-712.
See Also
LSCV.spattemp, spattemp.density
Examples
data(burk) # Burkitt's Uganda lymphoma data
burkcas <- burk$cases
#~85 secs
hlam1 <- BOOT.spattemp(burkcas)
#~75 secs. Widen time limits, reduce ref. bw.
hlam2 <- BOOT.spattemp(burkcas,tlim=c(400,5800),eta=8,nu=450)
#~150 secs. Increase ref. bw., custom starting vals
hlam3 <- BOOT.spattemp(burkcas,eta=20,nu=800,start=c(7,400))
rbind(hlam1,hlam2,hlam3)
Cross-validation bandwidths for spatial kernel density estimates
Description
Isotropic fixed or global (for adaptive) bandwidth selection for standalone 2D density/intensity based on either unbiased least squares cross-validation (LSCV) or likelihood (LIK) cross-validation.
Usage
LIK.density(
pp,
hlim = NULL,
hseq = NULL,
resolution = 64,
edge = TRUE,
auto.optim = TRUE,
type = c("fixed", "adaptive"),
seqres = 30,
parallelise = NULL,
zero.action = 0,
verbose = TRUE,
...
)
LSCV.density(
pp,
hlim = NULL,
hseq = NULL,
resolution = 64,
edge = TRUE,
auto.optim = TRUE,
type = c("fixed", "adaptive"),
seqres = 30,
parallelise = NULL,
zero.action = 0,
verbose = TRUE,
...
)
Arguments
pp |
An object of class |
hlim |
An optional vector of length 2 giving the limits of the optimisation routine with respect to the bandwidth. If unspecified, the function attempts to choose this automatically. |
hseq |
An optional increasing sequence of bandwidth values at which to
manually evaluate the optimisation criterion. Used only in the case
|
resolution |
Spatial grid size; the optimisation will be based on a
[ |
edge |
Logical value indicating whether to edge-correct the density estimates used. |
auto.optim |
Logical value indicating whether to automate the numerical
optimisation using |
type |
A character string; |
seqres |
Optional resolution of an increasing sequence of bandwidth
values. Only used if |
parallelise |
Numeric argument to invoke parallel processing, giving
the number of CPU cores to use when |
zero.action |
A numeric integer, either |
verbose |
Logical value indicating whether to provide function progress commentary. |
... |
Additional arguments controlling pilot density estimation and multi-scale bandwidth-axis
resolution when |
Details
This function implements the bivariate, edge-corrected versions of fixed-bandwidth least squares cross-validation and likelihood cross-validation as outlined in Sections 3.4.3 and 3.4.4 of Silverman (1986) in order to select an optimal fixed smoothing bandwidth. With type = "adaptive" it may also be used to select the global bandwidth for adaptive kernel density estimates, making use of multi-scale estimation (Davies and Baddeley, 2018) via multiscale.density.
Note that for computational reasons, the leave-one-out procedure is not performed on the pilot density in the adaptive setting; it is only performed on the final stage estimate. Current development efforts include extending this functionality; see SLIK.adapt. See also ‘Warning’ below.
Where LSCV.density is based on minimisation of an unbiased estimate of the mean integrated squared error (MISE) of the density, LIK.density is based on maximisation of the cross-validated leave-one-out average of the log-likelihood of the density estimate with respect to h.
In both functions, the argument zero.action can be used to control the level of severity in response to small bandwidths that result (due to numerical error) in at least one density value being zero or less.
When zero.action = -1, the function strictly forbids bandwidths that would result in one or more pixel values of a kernel estimate of the original data (i.e. anything over the whole region) being zero or less—this is the most restrictive truncation. With zero.action = 0 (default), the function automatically forbids bandwidths that yield erroneous values at the leave-one-out data point locations only. With zero.action = 1, the minimum machine value (see .Machine$double.xmin at the prompt) is used to replace these individual leave-one-out values. When zero.action = 2, the minimum value of the valid (greater than zero) leave-one-out values is used to replace any erroneous leave-one-out values.
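For example, the different zero.action settings might be invoked as follows (a sketch; see ‘Examples’ below for fuller usage):
data(pbc)
pbccas <- split(pbc)$case
LSCV.density(pbccas, zero.action = -1)  # forbid zero/negative values anywhere on the grid
LSCV.density(pbccas, zero.action = 1)   # replace offending leave-one-out values with .Machine$double.xmin
LSCV.density(pbccas, zero.action = 2)   # replace them with the smallest valid leave-one-out value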
Value
A single numeric value of the estimated bandwidth (if auto.optim = TRUE). Otherwise, a [seqres x 2] matrix giving the bandwidth sequence and corresponding CV function value.
Warning
Leave-one-out CV for bandwidth selection in kernel density estimation is notoriously unstable in practice and has a tendency to produce rather small bandwidths, particularly for spatial data. Satisfactory bandwidths are not guaranteed for every application; zero.action can curb adverse numeric effects for very small bandwidths during the optimisation procedures. This method can also be computationally expensive for large data sets and fine evaluation grid resolutions. The user may also need to experiment with adjusting hlim to find a suitable minimum.
Author(s)
T. M. Davies
References
Davies, T.M. and Baddeley A. (2018), Fast computation of spatially adaptive kernel estimates, Statistics and Computing, 28(4), 937-956.
Silverman, B.W. (1986), Density Estimation for Statistics and Data Analysis, Chapman & Hall, New York.
Wand, M.P. and Jones, C.M. (1995), Kernel Smoothing, Chapman & Hall, London.
See Also
SLIK.adapt and functions for bandwidth selection in package spatstat: bw.diggle; bw.ppl; bw.scott; bw.frac.
Examples
data(pbc)
pbccas <- split(pbc)$case
LIK.density(pbccas)
LSCV.density(pbccas)
#* FIXED
# custom limits
LIK.density(pbccas,hlim=c(0.01,4))
LSCV.density(pbccas,hlim=c(0.01,4))
# disable edge correction
LIK.density(pbccas,hlim=c(0.01,4),edge=FALSE)
LSCV.density(pbccas,hlim=c(0.01,4),edge=FALSE)
# obtain objective function
hcv <- LIK.density(pbccas,hlim=c(0.01,4),auto.optim=FALSE)
plot(hcv);abline(v=hcv[which.max(hcv[,2]),1],lty=2,col=2)
#* ADAPTIVE
LIK.density(pbccas,type="adaptive")
LSCV.density(pbccas,type="adaptive")
# change pilot bandwidth used
LIK.density(pbccas,type="adaptive",hp=2)
LSCV.density(pbccas,type="adaptive",hp=2)
Cross-validation bandwidths for spatiotemporal kernel density estimates
Description
Bandwidth selection for standalone spatiotemporal density/intensity based on either unbiased least squares cross-validation (LSCV) or likelihood (LIK) cross-validation, providing an isotropic scalar spatial bandwidth and a scalar temporal bandwidth.
Usage
LIK.spattemp(pp, tt = NULL, tlim = NULL, sedge = c("uniform", "none"),
tedge = sedge, parallelise = NA, start = NULL, verbose = TRUE)
LSCV.spattemp(pp, tt = NULL, tlim = NULL, sedge = c("uniform", "none"),
tedge = sedge, sres = 64, tres = sres, parallelise = NA,
start = NULL, verbose = TRUE)
Arguments
pp |
An object of class |
tt |
A numeric vector of equal length to the number of points in |
tlim |
A numeric vector of length 2 giving the limits of the temporal domain over which to smooth. If supplied, all times in |
sedge |
Character string dictating spatial edge correction. |
tedge |
As |
sres |
Numeric value > 0. Resolution of the [ |
tres |
Numeric value > 0. Resolution of the evaluation points in the temporal margin as defined by the |
parallelise |
Optional numeric argument to invoke parallel processing, by giving the number of CPU cores to use optimisation. This is only useful for larger data sets of many thousand observations. Experimental. Test your system first using |
start |
Optional positive numeric vector of length 2 giving starting values for the internal call to |
verbose |
Logical value indicating whether to print a function progress bar to the console during evaluation. |
Value
A numeric vector of length 2 giving the jointly optimised spatial and temporal bandwidths (named h and lambda respectively).
Warning
Leave-one-out CV for bandwidth selection in kernel density estimation is notoriously unstable in practice and has a tendency to produce rather small bandwidths in the fixed bandwidth case. Satisfactory bandwidths are not guaranteed for every application. This method can also be computationally expensive for large data sets and fine evaluation grid resolutions.
Author(s)
T. M. Davies
References
Silverman, B.W. (1986), Density Estimation for Statistics and Data Analysis, Chapman & Hall, New York.
See Also
BOOT.spattemp, spattemp.density
Examples
data(burk) # Burkitt's Uganda lymphoma data
burkcas <- burk$cases
hlam1 <- LSCV.spattemp(burkcas) #~9 secs
hlam2 <- LSCV.spattemp(burkcas,tlim=c(400,5800))
hlam3 <- LSCV.spattemp(burkcas,start=c(7,400))
rbind(hlam1,hlam2,hlam3)
hlam1 <- LIK.spattemp(burkcas) #~3 secs
hlam2 <- LIK.spattemp(burkcas,tlim=c(400,5800))
hlam3 <- LIK.spattemp(burkcas,start=c(7,400))
rbind(hlam1,hlam2,hlam3)
Jointly optimal bandwidth selection for the spatial relative risk function
Description
Methods to find a jointly optimal, common case-control isotropic bandwidth for use in estimation of the fixed or adaptive kernel-smoothed relative risk function.
Usage
LSCV.risk(
f,
g = NULL,
hlim = NULL,
hseq = NULL,
type = c("fixed", "adaptive"),
method = c("kelsall-diggle", "hazelton", "davies"),
resolution = 64,
edge = TRUE,
hp = NULL,
pilot.symmetry = c("none", "f", "g", "pooled"),
auto.optim = TRUE,
seqres = 30,
parallelise = NA,
verbose = TRUE,
...
)
Arguments
f |
Either a pre-calculated object of class |
g |
As for |
hlim |
An optional vector of length 2 giving the limits of the optimisation routine with respect to the bandwidth. If unspecified, the function attempts to choose this automatically. |
hseq |
An optional increasing sequence of bandwidth values at which to
manually evaluate the optimisation criterion. Used only in the case
|
type |
A character string; |
method |
A character string controlling the selector to use. There are
three types, based on either the mean integrated squared error (MISE)
(Kelsall and Diggle, 1995; default – |
resolution |
Spatial grid size; the optimisation will be based on a
[ |
edge |
Logical value indicating whether to edge-correct the density estimates used. |
hp |
A single numeric value or a vector of length 2 giving the pilot
bandwidth(s) to be used for estimation of the pilot
densities for adaptive risk surfaces. Ignored if |
pilot.symmetry |
A character string used to control the type of
symmetry, if any, to use for the bandwidth factors when computing an
adaptive relative risk surface. See ‘Details’. Ignored if |
auto.optim |
Logical value indicating whether to automate the numerical
optimisation using |
seqres |
Optional resolution of an increasing sequence of bandwidth
values. Only used if |
parallelise |
Numeric argument to invoke parallel processing, giving
the number of CPU cores to use when |
verbose |
Logical value indicating whether to provide function progress commentary. |
... |
Additional arguments such as |
Details
Given the established preference for using a common bandwidth for both case and control density estimates when constructing a relative risk surface, this function calculates a ‘jointly optimal’, common isotropic LSCV bandwidth for the (Gaussian) kernel-smoothed relative risk function (case-control density-ratio). It can be shown that choosing a bandwidth that is equal for both case and control density estimates is preferable to computing ‘separately optimal’ bandwidths (Kelsall and Diggle, 1995). The user can choose to either calculate a common smoothing parameter for a fixed-bandwidth relative risk surface (type = "fixed"; default), or a common global bandwidth for an adaptive risk surface (type = "adaptive"). See further comments below.
method = "kelsall-diggle": the function computes the common bandwidth which minimises the approximate mean integrated squared error (MISE) of the log-transformed risk surface (Kelsall and Diggle, 1995).
method = "hazelton": the function minimises a weighted-by-control MISE of the (raw) relative risk function (Hazelton, 2008).
method = "davies": the optimal bandwidth is one that minimises a crude plug-in approximation to the asymptotic MISE (Davies, 2013). Only possible for type = "fixed".
For jointly optimal, common global bandwidth selection when type = "adaptive", the optimisation routine utilises multiscale.density. Like LSCV.density, the leave-one-out procedure does not affect the pilot density, for which additional control is offered via the hp and pilot.symmetry arguments. The user has the option of obtaining a so-called symmetric estimate (Davies et al. 2016) via pilot.symmetry. This amounts to choosing the same pilot density for both case and control densities. By choosing "none" (default), the result uses the case and control data separately for the fixed-bandwidth pilots, providing the original asymmetric density-ratio of Davies and Hazelton (2010). By selecting either of "f", "g", or "pooled", the pilot density is calculated based on the case, control, or pooled case/control data respectively (using hp[1] as the fixed bandwidth). Davies et al. (2016) noted some beneficial practical behaviour of the symmetric adaptive surface over the asymmetric. (The pilot bandwidth(s), if not supplied in hp, are calculated internally via default use of LSCV.density, using the requested symmetric-based data set, or separately with respect to the case and control data sets f and g if pilot.symmetry = "none".)
Value
A single numeric value of the estimated bandwidth (if auto.optim = TRUE). Otherwise, a list of two numeric vectors of equal length giving the bandwidth sequence (as hs) and corresponding CV function value (as CV).
Warning
The jointly optimal bandwidth selector can be computationally expensive for large data sets and fine evaluation grid resolutions. The user may need to experiment with adjusting hlim to find a suitable minimum.
Author(s)
T. M. Davies
References
Davies, T. M. (2013), Jointly optimal bandwidth selection for the planar kernel-smoothed density-ratio, Spatial and Spatio-temporal Epidemiology, 5, 51-65.
Davies, T.M. and Baddeley A. (2018), Fast computation of spatially adaptive kernel estimates, Statistics and Computing, 28(4), 937-956.
Davies, T.M. and Hazelton, M.L. (2010), Adaptive kernel estimation of spatial relative risk, Statistics in Medicine, 29(23) 2423-2437.
Davies, T.M., Jones, K. and Hazelton, M.L. (2016), Symmetric adaptive smoothing regimens for estimation of the spatial relative risk function, Computational Statistics & Data Analysis, 101, 12-28.
Hazelton, M. L. (2008), Letter to the editor: Kernel estimation of risk surfaces without the need for edge correction, Statistics in Medicine, 27, 2269-2272.
Kelsall, J.E. and Diggle, P.J. (1995), Kernel estimation of relative risk, Bernoulli, 1, 3-16.
Silverman, B.W. (1986), Density Estimation for Statistics and Data Analysis, Chapman & Hall, New York.
Wand, M.P. and Jones, C.M. (1995), Kernel Smoothing, Chapman & Hall, London.
See Also
Examples
data(pbc)
pbccas <- split(pbc)$case
pbccon <- split(pbc)$control
# FIXED (for common h)
LSCV.risk(pbccas,pbccon)
LSCV.risk(pbccas,pbccon,method="hazelton")
hcv <- LSCV.risk(pbccas,pbccon,method="davies",auto.optim=FALSE)
plot(hcv[,1],log(hcv[,2]));abline(v=hcv[which.min(hcv[,2]),1],col=2,lty=2)
# ADAPTIVE (for common h0)
LSCV.risk(pbccas,pbccon,type="adaptive")
# change pilot bandwidths used
LSCV.risk(pbccas,pbccon,type="adaptive",hp=c(OS(pbccas)/2,OS(pbccon)/2))
# specify pooled-data symmetric relative risk estimator
LSCV.risk(pbccas,pbccon,type="adaptive",hp=OS(pbc),pilot.symmetry="pooled")
# as above, for Hazelton selector
LSCV.risk(pbccas,pbccon,type="adaptive",method="hazelton")
LSCV.risk(pbccas,pbccon,type="adaptive",method="hazelton",hp=c(OS(pbccas)/2,OS(pbccon)/2))
LSCV.risk(pbccas,pbccon,type="adaptive",method="hazelton",hp=OS(pbc),pilot.symmetry="pooled")
Normal scale (NS) bandwidth selector
Description
Provides the asymptotically optimal fixed bandwidths for spatial or spatiotemporal normal densities based on a simple expression.
Usage
NS(
pp,
nstar = c("npoints", "geometric"),
scaler = c("silverman", "IQR", "sd", "var")
)
NS.spattemp(
pp,
tt = NULL,
nstar = "npoints",
scaler = c("silverman", "IQR", "sd", "var")
)
Arguments
pp |
An object of class |
nstar |
Optional. Controls the value to use in place of the number of
observations n in the normal scale formula. Either a character
string, |
scaler |
Optional. Controls the value for a scalar representation of
the spatial (and temporal for |
tt |
A numeric vector of equal length to the number of points in |
Details
These functions calculate scalar smoothing bandwidths for kernel density estimates of spatial or spatiotemporal data: the optimal values would minimise the asymptotic mean integrated squared error assuming normally distributed data; see pp. 46-48 of Silverman (1986). The NS function returns a single bandwidth for isotropic smoothing of spatial (2D) data. The NS.spattemp function returns two values – one for the spatial margin and another for the temporal margin, based on independently applying the normal scale rule (in 2D and 1D) to the spatial and temporal margins of the supplied data.
- Effective sample size
The formula requires a sample size, and this can be minimally tailored via nstar. By default, the function simply uses the number of observations in pp: nstar = "npoints". Alternatively, the user can specify their own value by simply supplying a single positive numeric value to nstar. For NS (not applicable to NS.spattemp), if pp is a ppp.object with factor-valued marks, then the user has the option of using nstar = "geometric", which sets the sample size used in the formula to the geometric mean of the counts of observations of each mark. This can be useful for e.g. relative risk calculations, see Davies and Hazelton (2010).
- Spatial (and temporal) scale
The scaler argument is used to specify spatial (as well as temporal, in use of NS.spattemp) scale. For isotropic smoothing in the spatial margin, one may use the ‘robust’ estimate of standard deviation found by a weighted mean of the interquartile ranges of the x- and y-coordinates of the data respectively (scaler = "IQR"). Two other options are the raw mean of the coordinate-wise standard deviations (scaler = "sd"), or the square root of the mean of the two variances (scaler = "var"). A fourth option, scaler = "silverman" (default), sets the scaling constant to be the minimum of the "IQR" and "sd" options; see Silverman (1986), p. 47. In use of NS.spattemp the univariate version of the elected scale statistic is applied to the recorded times of the data for the temporal bandwidth. Alternatively, like nstar, the user can specify their own value by simply supplying a single positive numeric value to scaler for NS, or a numeric vector of length 2 (in the order of [<spatial scale>, <temporal scale>]) for NS.spattemp.
Value
A single numeric value of the estimated spatial bandwidth for NS, or a named numeric vector of length 2 giving the spatial bandwidth (as h) and the temporal bandwidth (as lambda) for NS.spattemp.
Warning
The NS bandwidth is an approximation, and assumes that the target density is normal. This is considered rare in most real-world applications. Nevertheless, it remains a quick and easy ‘rule-of-thumb’ method with which one may obtain a smoothing parameter. Note that a similar expression for the adaptive kernel estimator is not possible (Davies et al., 2018).
Author(s)
T.M. Davies
References
Davies, T.M. and Hazelton, M.L. (2010), Adaptive kernel estimation of spatial relative risk, Statistics in Medicine, 29(23) 2423-2437.
Davies, T.M., Flynn, C.R. and Hazelton, M.L. (2018), On the utility of asymptotic bandwidth selectors for spatially adaptive kernel density estimation, Statistics & Probability Letters [in press].
Silverman, B.W. (1986), Density Estimation for Statistics and Data Analysis, Chapman & Hall, New York.
Wand, M.P. and Jones, C.M. (1995), Kernel Smoothing, Chapman & Hall, London.
Examples
data(pbc)
NS(pbc)
NS(pbc,nstar="geometric") # uses case-control marks to replace sample size
NS(pbc,scaler="var") # set different scalar measure of spread
data(burk)
NS.spattemp(burk$cases)
NS.spattemp(burk$cases,scaler="sd")
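# Numeric values may also be supplied directly, as described in 'Details'
# (the particular values below are purely illustrative):
NS(pbc,nstar=500)                        # custom effective sample size
NS(pbc,scaler=4)                         # custom scalar spatial scale
NS.spattemp(burk$cases,scaler=c(10,600)) # custom [spatial, temporal] scales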
Oversmoothing (OS) bandwidth selector
Description
Provides fixed bandwidths for spatial or spatiotemporal data based on the maximal smoothing (oversmoothing) principle of Terrell (1990).
Usage
OS(
pp,
nstar = c("npoints", "geometric"),
scaler = c("silverman", "IQR", "sd", "var")
)
OS.spattemp(
pp,
tt = NULL,
nstar = "npoints",
scaler = c("silverman", "IQR", "sd", "var")
)
Arguments
pp |
An object of class |
nstar |
Optional. Controls the value to use in place of the number of
observations n in the oversmoothing formula. Either a character
string, |
scaler |
Optional. Controls the value for a scalar representation of
the spatial (and temporal for |
tt |
A numeric vector of equal length to the number of points in |
Details
These functions calculate scalar smoothing bandwidths for kernel density estimates of spatial or spatiotemporal data: the “maximal amount of smoothing compatible with the estimated scale of the observed data”. See Terrell (1990). The OS function returns a single bandwidth for isotropic smoothing of spatial (2D) data. The OS.spattemp function returns two values – one for the spatial margin and another for the temporal margin, based on independently applying Terrell's (1990) rule (in 2D and 1D) to the spatial and temporal margins of the supplied data.
- Effective sample size
The formula requires a sample size, and this can be minimally tailored via nstar. By default, the function simply uses the number of observations in pp: nstar = "npoints". Alternatively, the user can specify their own value by simply supplying a single positive numeric value to nstar. For OS (not applicable to OS.spattemp), if pp is a ppp.object with factor-valued marks, then the user has the option of using nstar = "geometric", which sets the sample size used in the formula to the geometric mean of the counts of observations of each mark. This can be useful for e.g. relative risk calculations, see Davies and Hazelton (2010).
- Spatial (and temporal) scale
The scaler argument is used to specify spatial (as well as temporal, in use of OS.spattemp) scale. For isotropic smoothing in the spatial margin, one may use the ‘robust’ estimate of standard deviation found by a weighted mean of the interquartile ranges of the x- and y-coordinates of the data respectively (scaler = "IQR"). Two other options are the raw mean of the coordinate-wise standard deviations (scaler = "sd"), or the square root of the mean of the two variances (scaler = "var"). A fourth option, scaler = "silverman" (default), sets the scaling constant to be the minimum of the "IQR" and "sd" options; see Silverman (1986), p. 47. In use of OS.spattemp the univariate version of the elected scale statistic is applied to the recorded times of the data for the temporal bandwidth. Alternatively, like nstar, the user can specify their own value by simply supplying a single positive numeric value to scaler for OS, or a numeric vector of length 2 (in the order of [<spatial scale>, <temporal scale>]) for OS.spattemp.
Value
A single numeric value of the estimated spatial bandwidth for OS, or a named numeric vector of length 2 giving the spatial bandwidth (as h) and the temporal bandwidth (as lambda) for OS.spattemp.
Author(s)
T.M. Davies
References
Davies, T.M. and Hazelton, M.L. (2010), Adaptive kernel estimation of spatial relative risk, Statistics in Medicine, 29(23) 2423-2437.
Terrell, G.R. (1990), The maximal smoothing principle in density estimation, Journal of the American Statistical Association, 85, 470-477.
Examples
data(pbc)
OS(pbc)
OS(pbc,nstar="geometric") # uses case-control marks to replace sample size
OS(pbc,scaler="var") # set different scalar measure of spread
data(burk)
OS.spattemp(burk$cases)
OS.spattemp(burk$cases,scaler="sd")
Simultaneous global/pilot likelihood bandwidth selection
Description
Isotropic global and pilot bandwidth selection for adaptive density/intensity based on likelihood cross-validation.
Usage
SLIK.adapt(
pp,
hold = TRUE,
start = rep(OS(pp), 2),
hlim = NULL,
edge = TRUE,
zero.action = c(-1, 0),
optim.control = list(),
parallelise = NULL,
verbose = TRUE,
...
)
Arguments
pp |
An object of class |
hold |
Logical value indicating whether to hold the global and pilot bandwidths equal throughout the
optimisation; defaults to |
start |
A positively-valued numeric vector of length 2 giving the starting values to be used for the global/pilot
optimisation routine when |
hlim |
An optional vector of length 2 giving the limits of the
optimisation routine with respect to the bandwidth when |
edge |
Logical value indicating whether to edge-correct the density estimates used. |
zero.action |
A numeric vector of length 2, each value being either |
optim.control |
An optional list to be passed to the |
parallelise |
Numeric argument to invoke parallel processing in the brute force leave-one-out calculations, giving
the number of CPU cores to use. Experimental. Test
your system first using |
verbose |
Logical value indicating whether to provide function progress commentary. |
... |
Additional arguments controlling density estimation for the internal calculations. Relevant arguments are |
Details
This function is a generalisation of LIK.density, and is used in attempts to simultaneously choose an optimal global and pilot bandwidth for adaptive kernel density estimates. Where LIK.density for adaptive estimates assumes the pilot density is held constant (and is not subject to the leave-one-out operations), this function allows the pilot bandwidth to vary alongside the global.
Thus, in contrast to LIK.density, the internal leave-one-out operations now also affect the pilot estimation stage. Hence, the set of variable bandwidths changes as each point is left out. In turn, this means the leave-one-out operations must be computed by brute force, and this is computationally expensive.
Identifiability problems can sometimes arise when the global and pilot bandwidths are allowed to ‘float freely’ in the bivariate optimisation routine, i.e. with hold = FALSE. This can be curbed by setting hold = TRUE (the default, as per the usage above), which forces the global and pilot bandwidths to be held at the same value during optimisation. Doing this also has the beneficial side effect of turning the problem into one of univariate optimisation, thereby reducing total computational cost. Current work (Davies & Lawson, 2018) provides some empirical evidence that this strategy performs quite well in practice.
Like LSCV.density and LIK.density, the argument zero.action can be used to control the level of severity in response to small bandwidths that result (due to numerical error) in at least one density value being zero or less. When this argument is passed a vector of length 2, the first entry corresponds to the global bandwidth (and hence refers to checks of the final adaptive density estimate and its leave-one-out values) and the second to the pilot bandwidth (and hence checks the fixed-bandwidth pilot density and its leave-one-out values). Alternatively, a single value may be supplied, which will be taken to be the same for both global and pilot. See the help page for LIK.density for an explanation of the four allowable values (-1, 0, 1, 2) for each component of this argument.
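For example (a sketch; data setup as in ‘Examples’ below, with the particular settings chosen purely for illustration):
data(pbc)
pbccas <- split(pbc)$case
SLIK.adapt(pbccas, zero.action = c(-1, 0))  # stricter check on the final estimate, default on the pilot
SLIK.adapt(pbccas, zero.action = 1)         # same action applied to both global and pilot stages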
Value
A numeric vector of length 2 giving the likelihood-maximised global and pilot bandwidths.
Note
While theoretically valid, this is a largely experimental function. There is presently little in the literature to suggest how well this type of simultaneous global/pilot bandwidth selection might perform in practice. Current research efforts (Davies & Lawson, 2018) seek in part to address these questions.
Author(s)
T. M. Davies
References
Davies, T.M. and Lawson, A.B. (2018), An evaluation of likelihood-based bandwidth selectors for spatial and spatiotemporal kernel estimates, Submitted for publication.
Silverman, B.W. (1986), Density Estimation for Statistics and Data Analysis, Chapman & Hall, New York.
Wand, M.P. and Jones, C.M. (1995), Kernel Smoothing, Chapman & Hall, London.
See Also
Functions for bandwidth selection in package spatstat: bw.diggle; bw.ppl; bw.scott; bw.frac.
Examples
data(pbc)
pbccas <- split(pbc)$case
SLIK.adapt(pbccas)
SLIK.adapt(pbccas,hold=TRUE)
Available global bandwidth range
Description
Gets universally available global bandwidths as represented by several multi-scale density estimate objects
Usage
available.h0(...)
Arguments
... |
Any number of objects of class |
Details
This simple function merely accesses and returns the maximum lower limit and minimum upper limit of all h0range components of the msden objects passed through .... Natural numeric error arising from any changes to the bandwidth-axis discretisation resolution in the creation of the msden objects (i.e. through the dimz argument) means individual global bandwidth ranges can differ slightly between affected multi-scale estimates, even if they are all applied to the same data set. The function can additionally be useful when, for example, creating asymmetric relative risk surfaces based on slices of multi-scale densities with respect to the case and control data sets, because the bandwidth factors differ.
The function throws an error if one or more of the h0range components is incompatible (i.e. all h0range components must overlap).
Value
A numeric vector of length 2 providing the range of available global bandwidths compatible with all supplied multi-scale density estimates.
Author(s)
T.M. Davies
See Also
multiscale.density, multiscale.slice
Examples
# See ?multiscale.slice
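# A rough sketch of direct usage (the global bandwidths below are illustrative;
# see multiscale.density for its full argument list):
# data(pbc)
# msc <- multiscale.density(split(pbc)$case, h0 = 2)
# msn <- multiscale.density(split(pbc)$control, h0 = 2)
# available.h0(msc, msn)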
Bivariate kernel density/intensity estimation
Description
Provides an isotropic adaptive or fixed bandwidth kernel density/intensity estimate of bivariate/planar/2D data.
Usage
bivariate.density(
pp,
h0,
hp = NULL,
adapt = FALSE,
resolution = 128,
gamma.scale = "geometric",
edge = c("uniform", "diggle", "none"),
weights = NULL,
intensity = FALSE,
trim = 5,
xy = NULL,
pilot.density = NULL,
leaveoneout = FALSE,
parallelise = NULL,
davies.baddeley = NULL,
verbose = TRUE
)
Arguments
pp |
An object of class |
h0 |
Global bandwidth for adaptive smoothing or fixed bandwidth for constant smoothing. A numeric value > 0. |
hp |
Pilot bandwidth (scalar, numeric > 0) to be used for fixed
bandwidth estimation of a pilot density in the case of adaptive smoothing.
If |
adapt |
Logical value indicating whether to perform adaptive kernel estimation. See ‘Details’. |
resolution |
Numeric value > 0. Resolution of evaluation grid; the
density/intensity will be returned on a [ |
gamma.scale |
Scalar, numeric value > 0; controls rescaling of the variable bandwidths. Defaults to the geometric mean of the bandwidth factors given the pilot density (as per Silverman, 1986). See ‘Details’. |
edge |
Character string giving the type of edge correction to employ.
|
weights |
Optional numeric vector of nonnegative weights corresponding to
each observation in |
intensity |
Logical value indicating whether to return an intensity estimate (integrates to the sample size over the study region), or a density estimate (default, integrates to 1). |
trim |
Numeric value > 0; controls bandwidth truncation for adaptive estimation. See ‘Details’. |
xy |
Optional alternative specification of the evaluation grid; matches
the argument of the same tag in |
pilot.density |
An optional pixel image (class
|
leaveoneout |
Logical value indicating whether to compute and return the value of the density/intensity at each data point for an adaptive estimate. See ‘Details’. |
parallelise |
Numeric argument to invoke parallel processing, giving
the number of CPU cores to use when |
davies.baddeley |
An optional numeric vector of length 3 to control bandwidth partitioning for approximate adaptive estimation, giving the quantile step values for the variable bandwidths for density/intensity and edge correction surfaces and the resolution of the edge correction surface. May also be provided as a single numeric value. See ‘Details’. |
verbose |
Logical value indicating whether to print a function progress
bar to the console when |
Details
Given a data set x_1,\dots,x_n in 2D, the isotropic kernel estimate of its probability density function, \hat{f}(x), is given by
\hat{f}(y)=n^{-1}\sum_{i=1}^{n}h(x_i)^{-2}K((y-x_i)/h(x_i)),
where h(x) is the bandwidth function, and K(.) is the bivariate standard normal smoothing kernel. Edge-correction factors (not shown above) are also implemented.
- Fixed: The classic fixed-bandwidth kernel estimator is used when adapt = FALSE. This amounts to setting h(u) = h0 for all u. Further details can be found in the documentation for density.ppp.
- Adaptive: Setting adapt = TRUE requests computation of Abramson's (1982) variable-bandwidth estimator. Under this framework, we have h(u) = h0*min[\tilde{f}(u)^{-1/2}, G*trim]/\gamma, where \tilde{f}(u) is a fixed-bandwidth kernel density estimate computed using the pilot bandwidth hp (a hand-computed sketch of this construction appears after this list).
  - Global smoothing of the variable bandwidths is controlled with the global bandwidth h0.
  - In the above statement, G is the geometric mean of the “bandwidth factors” \tilde{f}(x_i)^{-1/2}; i=1,\dots,n. By default, the variable bandwidths are rescaled by \gamma=G, which is set with gamma.scale = "geometric". This allows h0 to be considered on the same scale as the smoothing parameter in a fixed-bandwidth estimate, i.e. on the scale of the recorded data. You can use any other rescaling of h0 by setting gamma.scale to any positive scalar numeric value, though note this only affects \gamma (see the next bullet). When using a scale-invariant h0, set gamma.scale = 1.
  - The variable bandwidths must be trimmed to prevent excessive values (Hall and Marron, 1988). This is achieved through trim, as can be seen in the equation for h(u) above. The trimming of the variable bandwidths is universally enforced relative to the geometric mean G of the bandwidth factors, independent of the choice of \gamma. By default, the function truncates bandwidth factors at five times their geometric mean. For stricter trimming, reduce trim; for no trimming, set trim = Inf.
  - For even moderately sized data sets and evaluation grid resolution, adaptive kernel estimation can be rather computationally expensive. The argument davies.baddeley is used to approximate an adaptive kernel estimate by a sum of fixed-bandwidth estimates operating on appropriate subsets of pp. These subsets are defined by “bandwidth bins”, which themselves are delineated by a quantile step value 0<\delta<1. For example, setting \delta=0.05 will create 20 bandwidth bins based on quantile steps of 0.05 of the Abramson variable bandwidths. Adaptive edge-correction also utilises the partitioning, with pixel-wise bandwidth bins defined using the value 0<\beta<1, and the option to decrease the resolution of the edge-correction surface for computation to an [L \times L] grid, where 0 < L \le resolution. If davies.baddeley is supplied as a vector of length 3, then its entries [1], [2], and [3] correspond to the parameters \delta, \beta, and L_M=L_N in Davies and Baddeley (2018). If the argument is simply a single numeric value, it is used for both \delta and \beta, with L_M=L_N=resolution (i.e. no edge-correction surface coarsening).
  - Computation of leave-one-out values (when leaveoneout = TRUE) is done by brute force, and is therefore very computationally expensive for adaptive smoothing. This is because the leave-one-out mechanism is applied to both the pilot estimation and the final estimation stages. Experimental code to do this via parallel processing using the foreach routine is implemented. Fixed-bandwidth leave-one-out can be performed directly in density.ppp.
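The following is a minimal hand-computed sketch (not part of the formal interface) of the Abramson bandwidth construction described above; the bandwidth values are illustrative only, and exact agreement with the internally returned bandwidths depends on how the pilot density is evaluated at the data points.
data(pbc)
cas <- split(pbc)$case
pilot <- bivariate.density(cas,h0=2,verbose=FALSE)    # pilot estimate at hp = 2
ftilde <- spatstat.geom::safelookup(pilot$z,cas)      # pilot density at the data points
bf <- ftilde^(-0.5)                                   # Abramson bandwidth factors
G <- exp(mean(log(bf)))                               # geometric mean of the factors
h.manual <- 3*pmin(bf,G*5)/G                          # h(x_i) with h0 = 3, trim = 5, gamma = G
fit <- bivariate.density(cas,h0=3,hp=2,adapt=TRUE,verbose=FALSE)
plot(h.manual,fit$h); abline(0,1)                     # compare with the returned bandwidths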
Value
If leaveoneout = FALSE
, an object of class "bivden"
.
This is effectively a list with the following components:
z |
The
resulting density/intensity estimate, a pixel image object of class
|
h0 |
A copy of the value of |
hp |
A copy of the value of |
h |
A numeric
vector of length equal to the number of data points, giving the bandwidth
used for the corresponding observation in |
him |
A pixel
image (class |
q |
Edge-correction
weights; a pixel |
gamma |
The value of |
geometric |
The geometric mean |
pp |
A copy of the |
Else, if leaveoneout = TRUE
, simply a numeric vector of length equal to the
number of data points, giving the leave-one-out value of the function at the
corresponding coordinate.
Author(s)
T.M. Davies and J.C. Marshall
References
Abramson, I. (1982). On bandwidth variation in kernel estimates — a square root law, Annals of Statistics, 10(4), 1217-1223.
Davies, T.M. and Baddeley A. (2018), Fast computation of spatially adaptive kernel estimates, Statistics and Computing, 28(4), 937-956.
Davies, T.M. and Hazelton, M.L. (2010), Adaptive kernel estimation of spatial relative risk, Statistics in Medicine, 29(23) 2423-2437.
Davies, T.M., Jones, K. and Hazelton, M.L. (2016), Symmetric adaptive smoothing regimens for estimation of the spatial relative risk function, Computational Statistics & Data Analysis, 101, 12-28.
Diggle, P.J. (1985), A kernel method for smoothing point process data, Journal of the Royal Statistical Society, Series C, 34(2), 138-147.
Hall P. and Marron J.S. (1988) Variable window width kernel density estimates of probability densities. Probability Theory and Related Fields, 80, 37-49.
Marshall, J.C. and Hazelton, M.L. (2010) Boundary kernels for adaptive density estimators on regions with irregular boundaries, Journal of Multivariate Analysis, 101, 949-963.
Silverman, B.W. (1986), Density Estimation for Statistics and Data Analysis, Chapman & Hall, New York.
Wand, M.P. and Jones, M.C. (1995), Kernel Smoothing, Chapman & Hall, London.
Examples
data(chorley) # Chorley-Ribble data from package 'spatstat'
# Fixed bandwidth kernel density; uniform edge correction
chden1 <- bivariate.density(chorley,h0=1.5)
# Fixed bandwidth kernel density; diggle edge correction; coarser resolution
chden2 <- bivariate.density(chorley,h0=1.5,edge="diggle",resolution=64)
# Adaptive smoothing; uniform edge correction
chden3 <- bivariate.density(chorley,h0=1.5,hp=1,adapt=TRUE)
# Adaptive smoothing; uniform edge correction; partitioning approximation
chden4 <- bivariate.density(chorley,h0=1.5,hp=1,adapt=TRUE,davies.baddeley=0.025)
oldpar <- par(mfrow=c(2,2))
plot(chden1);plot(chden2);plot(chden3);plot(chden4)
par(oldpar)
Burkitt's lymphoma in Uganda
Description
Data of the spatiotemporal locations of Burkitt's lymphoma in the Western Nile district of Uganda from 1960 to 1975.
Format
burk
is a named list with three members:
$cases - An object of class ppp giving the spatial locations (eastings/northings) of the 188 cases of Burkitt's lymphoma recorded in individuals of various ages (mostly children); the spatial study region as a polygonal owin; as well as the time (in days since 1/1/1960) of each observation stored as the marks of the points.
$cases.age - A numeric vector of length 188 giving the age of each individual in $cases.
$controls - An object of class ppp giving 500 artificially simulated spatial-only observations to pose as a ‘control’ data set representing the at-risk population. The data were generated from a smooth kernel estimate of the spatial margin of the cases. The similarity between the case point distribution and the true at-risk population dispersion can be seen in e.g. Figure 2 of Middleton and Greenland (1954).
Source
The case data were extracted from the burkitt object of the splancs R package; see
Rowlingson, B. and Diggle, P.J. (2017), splancs: Spatial and Space-Time Point Pattern Analysis, R package version 2.01-40; https://CRAN.R-project.org/package=splancs.
References
Bailey, T.C. and Gatrell, A.C. (1995), Interactive spatial data analysis, Longman; Harlow.
Middleton, J.F.M. and Greenland, D.J. (1954), Land and population in West Nile District, Uganda, The Geographical Journal, 120, 446–455.
Examples
data(burk)
summary(burk$cases)
oldpar <- par(mfrow=c(1,3))
plot(burk$cases)
plot(burk$controls)
plot(density(marks(burk$cases)),xlim=range(marks(burk$cases)))
par(oldpar)
2D fast-Fourier wrapper around 'fftwtools' or 'stats' package
Description
Utilises the Fastest Fourier Transform in the West (FFTW) via the 'fftwtools' package if available, else reverts to built-in functionality
Usage
fft2d(x, inverse = FALSE, fftw = fftw_available())
Arguments
x |
A numeric matrix to be transformed. |
inverse |
Whether it should compute the inverse transform (defaults to |
fftw |
Whether the |
Details
This function is called wherever sparr seeks to perform a 2D fast-Fourier transform. Where available, computational expense is noticeably reduced by appealing to routines in the independent ‘FFTW’ toolbox. The user is encouraged to install the corresponding R package fftwtools from CRAN; this function will automatically detect and use the faster option, otherwise it will defer to the built-in fft.
Value
The fast-Fourier (inverse) transform. A complex-valued matrix of the same size as x
.
Author(s)
J.C. Marshall
Examples
# System check
sparr:::fftw_available()
system.time(fft(matrix(1:2000^2,2000)))
system.time(fft2d(matrix(1:2000^2,2000)))
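# A hedged sketch (not from the original documentation): a forward/inverse
# round trip, assuming the inverse transform is unnormalised (as with
# stats::fft), so we rescale by the number of elements.
m <- matrix(rnorm(64^2),64)
M <- fft2d(m)
m2 <- Re(fft2d(M,inverse=TRUE))/length(m)
max(abs(m-m2)) # should be numerically negligible under the above assumption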
Veterinary foot-and-mouth disease outbreak data
Description
Data of the spatial locations and times of farms infected by veterinary foot-and-mouth disease in the county of Cumbria, UK, over the course of nearly 250 days between February and August 2001. There are 410 infected farms (the cases) and 1866 uninfected farms (the controls). The data have been jittered and randomly thinned by an unspecified amount to preserve anonymity.
Format
fmd
is a named list with two members:
$cases - An object of class ppp giving the spatial locations of the 410 infected farms within a polygonal study region representing the county of Cumbria. The marks component of this object contains the integer day of infection (from the beginning of the study period).
$controls - An object of class ppp defined over the same spatial study region with the locations of the 1866 uninfected farms.
Acknowledgements
The Animal and Plant Health Agency (APHA), UK, provided permission to use this dataset.
References
Fernando, W.T.P.S. and Hazelton, M.L. (2014), Generalizing the spatial relative risk function, Spatial and Spatio-temporal Epidemiology, 8, 1-10.
Keeling M, Woolhouse M, Shaw D, Matthews L, Chase-Topping M, Haydon D, et al. (2001), Dynamics of the 2001 UK foot and mouth epidemic: stochastic dispersal in a heterogeneous landscape, Science, 294, 813-817.
Lawson, A. and Zhou, H. (2005), Spatial statistical modeling of disease outbreaks with particular reference to the UK foot and mouth disease (FMD) epidemic of 2001, Preventive Veterinary Medicine, 71, 141-156.
Examples
data(fmd)
summary(fmd$cases)
summary(fmd$controls)
oldpar <- par(mfrow=c(1,2))
plot(fmd$cases)
plot(fmd$controls)
par(oldpar)
Multi-scale adaptive kernel density/intensity estimation
Description
Computes adaptive kernel estimates of spatial density/intensity using a 3D FFT for multiple global bandwidth scales.
Usage
multiscale.density(
pp,
h0,
hp = NULL,
h0fac = c(0.25, 1.5),
edge = c("uniform", "none"),
resolution = 128,
dimz = 64,
gamma.scale = "geometric",
trim = 5,
intensity = FALSE,
pilot.density = NULL,
xy = NULL,
taper = TRUE,
verbose = TRUE
)
Arguments
pp |
An object of class |
h0 |
Reference global bandwidth for adaptive smoothing; numeric value >
0. Multiscale estimates will be computed by rescaling this value as per
|
hp |
Pilot bandwidth (scalar, numeric > 0) to be used for fixed
bandwidth estimation of the pilot density. If |
h0fac |
A numeric vector of length 2 stipulating the span of the global
bandwidths in the multiscale estimates. Interpreted as a multiplicative
factor on |
edge |
Character string dictating edge correction. |
resolution |
Numeric value > 0. Resolution of evaluation grid in the
spatial domain; the densities/intensities will be returned on a
[ |
dimz |
Resolution of z- (rescaled bandwidth)-axis in the trivariate convolution. Higher values increase precision of the multiscale estimates at a computational cost. See ‘Details’. |
gamma.scale |
Scalar, numeric value > 0; controls rescaling of the
variable bandwidths. Defaults to the geometric mean of the bandwidth factors
given the pilot density (as per Silverman, 1986). See the documentation for
|
trim |
Numeric value > 0; controls bandwidth truncation for adaptive
estimation. See the documentation for |
intensity |
Logical value indicating whether to return an intensity estimate (integrates to the sample size over the study region), or a density estimate (default, integrates to 1). |
pilot.density |
An optional pixel image (class
|
xy |
Optional alternative specification of the spatial evaluation grid;
matches the argument of the same tag in |
taper |
Logical value indicating whether to taper off the trivariate
kernel outside the range of |
verbose |
Logical value indicating whether to print function progress. |
Details
Davies & Baddeley (2018) investigated computational aspects of Abramson's (1982) adaptive kernel smoother for spatial (2D) data. This function is the implementation of the 3D convolution via a fast-Fourier transform (FFT) which allows simultaneous calculation of an adaptive kernel estimate at multiple global bandwidth scales.
These ‘multiple global bandwidth scales’ are computed with respect to
rescaling a reference value of the global bandwidth passed to the h0
argument. This rescaling is defined by the range provided to the argument
h0fac
. For example, by default, the function will compute the
adaptive kernel estimate for a range of global bandwidths between
0.25*h0
and 1.5*h0
. The exact numeric limits are subject to
discretisation, and so the returned valid range of global bandwidths will
differ slightly. The exact resulting range following function execution is
returned as the h0range
element of the result, see ‘Value’ below.
The distinct values of global bandwidth used (which define the
aforementioned h0range
) and hence the total number of pixel
images
returned depend on both the width of the span
h0fac
and the discretisation applied to the bandwidth axis through
dimz
. Increasing this z-resolution will provide more pixel images and
hence greater numeric precision, but increases computational cost. The
returned pixel images
that represent the multiscale
estimates are stored in a named list (see ‘Value’), whose names reflect the
corresponding distinct global bandwidth. See ‘Examples’ for the easy way to
extract these distinct global bandwidths.
The user can request an interpolated density/intensity estimate for any
global bandwidth value within h0range
by using the
multiscale.slice
function, which returns an object of class
bivden
.
Value
An object of class "msden"
. This is very similar to a
bivden
object, with lists of pixel
im
ages in the z
, him
, and q
components (instead of standalone images).
z |
A list of the resulting
density/intensity estimates; each member being a pixel image object of class
|
h0 |
A copy of the reference value of |
h0range |
A vector of length 2 giving the actual range of global bandwidth values available (inclusive). |
hp |
A copy of the value of |
h |
A numeric vector of length equal to the
number of data points, giving the bandwidth used for the corresponding
observation in |
him |
A list of pixel images (class |
q |
Edge-correction weights; list of pixel |
gamma |
The numeric value of |
geometric |
The geometric mean of the
untrimmed variable bandwidth factors. This will be identical to |
pp |
A copy of the |
Author(s)
T.M. Davies and A. Baddeley
References
Abramson, I. (1982). On bandwidth variation in kernel estimates — a square root law, Annals of Statistics, 10(4), 1217-1223.
Davies, T.M. and Baddeley A. (2018), Fast computation of spatially adaptive kernel estimates, Statistics and Computing, 28(4), 937-956.
Silverman, B.W. (1986), Density Estimation for Statistics and Data Analysis, Chapman & Hall, New York.
See Also
bivariate.density
, multiscale.slice
Examples
data(chorley) # Chorley-Ribble data (package 'spatstat')
ch.multi <- multiscale.density(chorley,h0=1)
plot(ch.multi)
ch.pilot <- bivariate.density(chorley,h0=0.75) # with pre-defined pilot density
ch.multi2 <- multiscale.density(chorley,h0=1,pilot.density=ch.pilot$z)
plot(ch.multi2)
data(pbc)
# widen h0 scale, increase z-axis resolution
pbc.multi <- multiscale.density(pbc,h0=2,hp=1,h0fac=c(0.25,2.5),dimz=128)
plot(pbc.multi)
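# A hedged sketch (not from the original documentation): the distinct global
# bandwidths are assumed to be stored as the names of the image lists, with
# the overall available range in the 'h0range' component.
as.numeric(names(pbc.multi$z)) # distinct global bandwidths
pbc.multi$h0range              # available range (see also ?available.h0)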
Slicing a multi-scale density/intensity object
Description
Takes slices of a multi-scale density/intensity estimate at desired global bandwidths
Usage
multiscale.slice(msob, h0, checkargs = TRUE)
Arguments
msob |
An object of class |
h0 |
Desired global bandwidth(s); the density/intensity estimate
corresponding to which will be returned. A numeric vector. All values must be in the
available range provided by |
checkargs |
Logical value indicating whether to check validity of
|
Details
Davies & Baddeley (2018) demonstrate that once a multi-scale
density/intensity estimate has been computed, we may take slices parallel to
the spatial domain of the trivariate convolution to return the estimate at
any desired global bandwidth. This function is the implementation thereof
based on a multi-scale estimate resulting from a call to
multiscale.density
.
The function returns an error if the
requested slices at h0
are not all within the available range of
pre-computed global bandwidth scalings as defined by the h0range
component of msob
.
Because the contents of the msob
argument, an object of class
msden
, are returned based on a discretised set of global
bandwidth scalings, the function internally computes the desired surface as
a pixel-by-pixel linear interpolation using the two discretised global
bandwidth rescalings that bound each requested h0
. (Thus, numeric
accuracy of the slices is improved with an increase to the dimz
argument of the preceding call to multiscale.density
at the cost of
additional computing time.)
Value
If h0
is scalar, an object of class bivden
with components
corresponding to the requested slice at h0
. If h0
is a vector, a list of objects
of class bivden
.
Author(s)
T.M. Davies
References
Davies, T.M. and Baddeley A. (2018), Fast computation of spatially adaptive kernel estimates, Statistics and Computing, 28(4), 937-956.
See Also
multiscale.density
, bivariate.density
Examples
data(chorley) # Chorley-Ribble data (package 'spatstat')
ch.multi <- multiscale.density(chorley,h0=1,h0fac=c(0.5,2))
available.h0(ch.multi)
ch.slices <- multiscale.slice(ch.multi,h0=c(0.7,1.1,1.6))
par(mfcol=c(2,3)) # plot each density and edge-correction surface
for(i in 1:3) { plot(ch.slices[[i]]$z); plot(ch.slices[[i]]$q) }
Primary biliary cirrhosis data
Description
Data of the locations of 761 cases of primary biliary cirrhosis in several
adjacent health regions of north-eastern England, along with 3020 controls
representing the at-risk population, collected between 1987 and 1994. These
data were first presented and analysed by Prince et al. (2001); subsequent
analysis of these data in the spirit of sparr
was performed in
Davies and Hazelton (2010). Also included is the polygonal study region.
Format
pbc
is a dichotomously marked
ppp.object
, with locations expressed in UK Ordnance
Survey Coordinates (km).
Acknowledgements
The authors thank Prof. Peter Diggle for providing access to these data.
Source
Prince et al. (2001), The geographical distribution of primary biliary cirrhosis in a well-defined cohort, Hepatology, 34, 1083-1088.
References
Davies, T.M. and Hazelton, M.L. (2010), Adaptive kernel estimation of spatial relative risk, Statistics in Medicine, 29(23) 2423-2437.
Examples
data(pbc)
summary(pbc)
plot(pbc)
Plotting sparr objects
Description
plot methods for classes bivden, stden, rrs, rrst and msden.
Usage
## S3 method for class 'bivden'
plot(
x,
what = c("z", "edge", "bw"),
add.pts = FALSE,
auto.axes = TRUE,
override.par = TRUE,
...
)
## S3 method for class 'msden'
plot(x, what = c("z", "edge", "bw"), sleep = 0.2, override.par = TRUE, ...)
## S3 method for class 'rrs'
plot(
x,
auto.axes = TRUE,
tol.show = TRUE,
tol.type = c("upper", "lower", "two.sided"),
tol.args = list(levels = 0.05, lty = 1, drawlabels = TRUE),
...
)
## S3 method for class 'rrst'
plot(
x,
tselect = NULL,
type = c("joint", "conditional"),
fix.range = FALSE,
tol.show = TRUE,
tol.type = c("upper", "lower", "two.sided"),
tol.args = list(levels = 0.05, lty = 1, drawlabels = TRUE),
sleep = 0.2,
override.par = TRUE,
expscale = FALSE,
...
)
## S3 method for class 'stden'
plot(
x,
tselect = NULL,
type = c("joint", "conditional"),
fix.range = FALSE,
sleep = 0.2,
override.par = TRUE,
...
)
Arguments
x |
|
what |
A character string to select plotting of result ( |
add.pts |
Logical value indicating whether to add the observations to
the image plot using default |
auto.axes |
Logical value indicating whether to display the plot with automatically added x-y axes and an ‘L’ box in default styles. |
override.par |
Logical value indicating whether to override the
existing graphics device parameters prior to plotting, resetting
|
... |
Additional graphical parameters to be passed to
|
sleep |
Single positive numeric value giving the amount of time (in
seconds) to |
tol.show |
Logical value indicating whether to show pre-computed tolerance contours on the plot(s). The object |
tol.type |
A character string used to control the type of tolerance contour displayed; a test for elevated risk ( |
tol.args |
A named list of valid arguments to be passed directly to |
tselect |
Either a single numeric value giving the time at which to return the plot, or a vector of length 2 giving an interval of times over which to plot. This argument must respect the stored temporal bound in |
type |
A character string to select plotting of joint/unconditional spatiotemporal estimate (default) or conditional spatial density given time. |
fix.range |
Logical value indicating whether to use the same colour scale limits for each plot in the sequence. Ignored if the user supplies a pre-defined |
expscale |
Logical value indicating whether to force a raw-risk scale. Useful for users wishing to plot a log-relative risk surface, but to have the raw-risk displayed on the colour ribbon. |
Details
In all instances, visualisation is deferred to plot.im, for which a variety of customisations are available that the user can access via .... The one exception is when plotting observation-specific "diggle" edge-correction factors; in this instance, a plot of the spatial observations is returned with size proportional to the influence of each correction weight.
When plotting a rrs
object, a pre-computed p-value
surface (see argument tolerate
in risk
) will
automatically be superimposed at a significance level of 0.05. Greater
flexibility in visualisation is gained by using tolerance
in
conjunction with contour
.
An msden
, stden
, or rrst
object is plotted as an animation, one pixel image
after another, separated by sleep
seconds. If instead you intend the
individual images to be plotted in an array of images, you should first set
up your plot device layout, and ensure override.par = FALSE
so that
the function does not reset these device parameters itself. In such an
instance, one might also want to set sleep = 0
.
Value
Plots to the relevant graphics device.
Author(s)
T.M. Davies
Examples
data(pbc)
data(fmd)
data(burk)
# 'bivden' object
pbcden <- bivariate.density(split(pbc)$case,h0=3,hp=2,adapt=TRUE,davies.baddeley=0.05,verbose=FALSE)
plot(pbcden)
plot(pbcden,what="bw",main="PBC cases\n variable bandwidth surface",xlab="Easting",ylab="Northing")
# 'stden' object
burkden <- spattemp.density(burk$cases,tres=128) # observation times are stored in marks(burk$cases)
plot(burkden,fix.range=TRUE,sleep=0.1) # animation
plot(burkden,tselect=c(1000,3000),type="conditional") # spatial densities conditional on each time
# 'rrs' object
pbcrr <- risk(pbc,h0=4,hp=3,adapt=TRUE,tolerate=TRUE,davies.baddeley=0.025,edge="diggle")
plot(pbcrr) # default
plot(pbcrr,tol.args=list(levels=c(0.05,0.01),lty=2:1,col="seagreen4"),auto.axes=FALSE)
# 'rrst' object
f <- spattemp.density(fmd$cases,h=6,lambda=8)
g <- bivariate.density(fmd$controls,h0=6)
fmdrr <- spattemp.risk(f,g,tolerate=TRUE)
plot(fmdrr,sleep=0.1,fix.range=TRUE)
plot(fmdrr,type="conditional",sleep=0.1,tol.type="two.sided",
tol.args=list(levels=0.05,drawlabels=FALSE))
# 'msden' object
pbcmult <- multiscale.density(split(pbc)$case,h0=4,h0fac=c(0.25,2.5))
plot(pbcmult) # densities
plot(pbcmult,what="edge") # edge correction surfaces
plot(pbcmult,what="bw") # bandwidth surfaces
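# A hedged sketch (not from the original documentation) of the 'greater
# flexibility' route mentioned in 'Details': draw the stored p-value surface
# of the adaptive 'pbcrr' object above at custom levels and line types.
plot(pbcrr$rr,main="Custom tolerance contours")
tol.contour(pbcrr$P,levels=c(0.1,0.05,0.01),lty=3:1,add=TRUE)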
Printing sparr objects
Description
print methods for classes bivden, stden, rrs, rrst and msden.
Usage
## S3 method for class 'bivden'
print(x, ...)
## S3 method for class 'msden'
print(x, ...)
## S3 method for class 'rrs'
print(x, ...)
## S3 method for class 'rrst'
print(x, ...)
## S3 method for class 'stden'
print(x, ...)
Arguments
x |
|
... |
Ignored. |
Author(s)
T.M. Davies
Random point generation inside polygon
Description
Generates a random point pattern of n
iid points with any specified distribution based on a pixel image and a corresponding polygonal window.
Usage
rimpoly(n, z, w = NULL, correction = 1.1, maxpass = 50)
Arguments
n |
Number of points to generate. |
z |
A pixel image of class |
w |
A polygonal window of class |
correction |
An adjustment to the number of points generated at the initial pass of the internal loop in an effort to minimise the total number of passes required to reach |
maxpass |
The maximum number of passes allowed before the function exits. If this is reached before |
Details
This function is a deliberate variant of rpoint
(Baddeley et al., 2015), to be accessed when the user desires
a randomly generated point pattern based on a pixel image, but wants the window of the point pattern to be a corresponding irregular polygon, as opposed to a binary
image mask (which, when converted to a polygon directly, gives jagged edges based on the union of the pixels). When the user specifies their own polygonal window, a while
loop is called and repeated as many
times as necessary (up to maxpass
times) to find n
points inside w
(when w = NULL
, then the aforementioned union of the pixels of z
is used, obtained via as.polygonal(Window(z))
). The loop is necessary because the standard behaviour of rpoint
can (and often does)
yield points that sit in corners of pixels which lie outside the corresponding w
.
The correction
argument is used to determine how many points are generated initially,
which will be ceiling(correction*n)
; to minimise the number of required passes over the loop this is by default set to give a number slightly higher than the requested n
.
An error is thrown if Window(z)
and w
do not overlap.
Value
An object of class ppp
containing the n
generated points, defined with the polygonal owin
, w
.
Author(s)
T.M. Davies
References
Baddeley, A., Rubak, E. and Turner, R. (2015) Spatial Point Patterns: Methodology and Applications with R, Chapman and Hall/CRC Press, UK.
Examples
data(pbc)
Y <- bivariate.density(pbc,h0=2.5,res=25)
# Direct use of 'rpoint':
A <- rpoint(500,Y$z)
npoints(A)
# Using 'rimpoly' without supplying polygon:
B <- rimpoly(500,Y$z)
npoints(B)
# Using 'rimpoly' with the original pbc polygonal window:
C <- rimpoly(500,Y$z,Window(Y$pp))
npoints(C)
oldpar <- par(mfrow=c(1,3))
plot(A,main="rpoint")
plot(B,main="rimpoly (no polygon supplied)")
plot(C,main="rimpoly (original polygon supplied)")
par(oldpar)
Spatial relative risk/density ratio
Description
Estimates a relative risk function based on the ratio of two 2D kernel density estimates.
Usage
risk(
f,
g = NULL,
log = TRUE,
h0 = NULL,
hp = h0,
adapt = FALSE,
shrink = FALSE,
shrink.args = list(rescale = TRUE, type = c("lasso", "Bithell"), lambda = NA),
tolerate = FALSE,
doplot = FALSE,
pilot.symmetry = c("none", "f", "g", "pooled"),
epsilon = 0,
verbose = TRUE,
...
)
Arguments
f |
Either a pre-calculated object of class |
g |
As for |
log |
Logical value indicating whether to return the (natural)
log-transformed relative risk function as recommended by Kelsall and Diggle
(1995a). Defaults to |
h0 |
A single positive numeric value or a vector of length 2 giving the
global bandwidth(s) to be used for case/control density estimates;
defaulting to a common oversmoothing bandwidth computed via |
hp |
A single numeric value or a vector of length 2 giving the pilot
bandwidth(s) to be used for fixed-bandwidth estimation of the pilot
densities for adaptive risk surfaces. Ignored if |
adapt |
A logical value indicating whether to employ adaptive smoothing
for internally estimating the densities. Ignored if |
shrink |
A logical value indicating whether to compute the shrinkage estimator
of Hazelton (2023). This is only possible for |
shrink.args |
A named list of optional arguments controlling the shrinkage estimator.
Possible entries are |
tolerate |
A logical value indicating whether to internally calculate a corresponding asymptotic p-value surface (for tolerance contours) for the estimated relative risk function. See ‘Details’. |
doplot |
Logical. If |
pilot.symmetry |
A character string used to control the type of
symmetry, if any, to use for the bandwidth factors when computing an
adaptive relative risk surface. See ‘Details’. Ignored if |
epsilon |
A single non-negative numeric value used for optional scaling to produce an additive constant for each density in the raw ratio (see ‘Details’). A zero value requests no additive constant (default). |
verbose |
Logical value indicating whether to print function progress during execution. |
... |
Additional arguments passed to any internal calls of
|
Details
The relative risk function is defined here as the ratio of the ‘case’ density to the ‘control’ density (Bithell, 1990; 1991). Using kernel density estimation to model these densities (Diggle, 1985), we obtain a workable estimate thereof. This function defines the risk function r in the following fashion:
r = (fd + epsilon*max(gd))/(gd + epsilon*max(gd)),
where fd and gd denote the case and control density estimates respectively. Note the (optional) additive constants defined by epsilon times the maximum of each of the densities in the numerator and denominator respectively (see Bowman and Azzalini, 1997). A more recent shrinkage estimator developed by Hazelton (2023) is also implemented.
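As a minimal illustrative sketch (not part of the formal interface), this raw ratio can be mirrored by hand from two pre-computed 'bivden' objects; the epsilon and bandwidth values below are arbitrary and purely for demonstration.
data(pbc)
fd <- bivariate.density(split(pbc)$case,h0=3.5,verbose=FALSE)
gd <- bivariate.density(split(pbc)$control,h0=3.5,verbose=FALSE)
eps <- 0.001
r.manual <- (fd$z+eps*max(gd$z))/(gd$z+eps*max(gd$z)) # raw ratio as defined above
plot(log(r.manual))                                   # compare with risk(...,h0=3.5,epsilon=0.001)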
The log-risk function rho, given by rho = log[r], is argued to be preferable in practice as it imparts a sense of symmetry in the way the case and control densities are treated (Kelsall and Diggle, 1995a;b). The option of log-transforming the returned risk function is therefore selected by default.
When computing adaptive relative risk functions, the user has the option of
obtaining a so-called symmetric estimate (Davies et al. 2016) via
pilot.symmetry
. This amounts to choosing the same pilot density for
both case and control densities. By choosing "none"
(default), the
result uses the case and control data separately for the fixed-bandwidth
pilots, providing the original asymmetric density-ratio of Davies and
Hazelton (2010). By selecting either of "f"
, "g"
, or
"pooled"
, the pilot density is calculated based on the case, control,
or pooled case/control data respectively (using hp[1]
as the fixed
bandwidth). Davies et al. (2016) noted some beneficial practical behaviour
of the symmetric adaptive surface over the asymmetric.
If the user selects tolerate = TRUE
, the function internally computes
asymptotic tolerance contours as per Hazelton and Davies (2009) and Davies
and Hazelton (2010). When adapt = FALSE
, the reference density
estimate (argument ref.density
in tolerance
) is taken
to be the estimated control density. The returned pixel
im
age of p-values (see ‘Value’) is
interpreted as an upper-tailed test i.e. smaller p-values represent
greater evidence in favour of significantly increased risk. For greater
control over calculation of tolerance contours, use tolerance
.
Value
An object of class "rrs"
. This is a named list with the
following components:
rr |
A pixel |
f |
An object of class |
g |
An object of
class |
P |
Only included if |
Author(s)
T.M. Davies
References
Bithell, J.F. (1990), An application of density estimation to geographical epidemiology, Statistics in Medicine, 9, 691-701.
Bithell, J.F. (1991), Estimation of relative risk functions, Statistics in Medicine, 10, 1745-1751.
Bowman, A.W. and Azzalini A. (1997), Applied Smoothing Techniques for Data Analysis: The Kernel Approach with S-Plus Illustrations, Oxford University Press Inc., New York.
Davies, T.M. and Hazelton, M.L. (2010), Adaptive kernel estimation of spatial relative risk, Statistics in Medicine, 29(23) 2423-2437.
Davies, T.M., Jones, K. and Hazelton, M.L. (2016), Symmetric adaptive smoothing regimens for estimation of the spatial relative risk function, Computational Statistics & Data Analysis, 101, 12-28.
Diggle, P.J. (1985), A kernel method for smoothing point process data, Journal of the Royal Statistical Society Series C, 34(2), 138-147.
Hazelton, M.L. (2023), Shrinkage estimators of the spatial relative risk function, Submitted for publication.
Hazelton, M.L. and Davies, T.M. (2009), Inference based on kernel estimates of the relative risk function in geographical epidemiology, Biometrical Journal, 51(1), 98-109.
Kelsall, J.E. and Diggle, P.J. (1995a), Kernel estimation of relative risk, Bernoulli, 1, 3-16.
Kelsall, J.E. and Diggle, P.J. (1995b), Non-parametric estimation of spatial variation in relative risk, Statistics in Medicine, 14, 2335-2342.
Examples
data(pbc)
pbccas <- split(pbc)$case
pbccon <- split(pbc)$control
h0 <- OS(pbc,nstar="geometric")
# Fixed (with tolerance contours)
pbcrr1 <- risk(pbccas,pbccon,h0=h0,tolerate=TRUE)
# Fixed shrinkage
pbcrr2 <- risk(pbccas,pbccon,h0=h0,shrink=TRUE,shrink.args=list(lambda=4))
# Asymmetric adaptive
pbcrr3 <- risk(pbccas,pbccon,h0=h0,adapt=TRUE,hp=c(OS(pbccas)/2,OS(pbccon)/2),
tolerate=TRUE,davies.baddeley=0.05)
# Symmetric (pooled) adaptive
pbcrr4 <- risk(pbccas,pbccon,h0=h0,adapt=TRUE,tolerate=TRUE,hp=OS(pbc)/2,
pilot.symmetry="pooled",davies.baddeley=0.05)
# Symmetric (case) adaptive; from two existing 'bivden' objects
f <- bivariate.density(pbccas,h0=h0,hp=2,adapt=TRUE,pilot.density=pbccas,
edge="diggle",davies.baddeley=0.05,verbose=FALSE)
g <- bivariate.density(pbccon,h0=h0,hp=2,adapt=TRUE,pilot.density=pbccas,
edge="diggle",davies.baddeley=0.05,verbose=FALSE)
pbcrr5 <- risk(f,g,tolerate=TRUE,verbose=FALSE)
oldpar <- par(mfrow=c(2,2))
plot(pbcrr1,override.par=FALSE,main="Fixed")
plot(pbcrr2,override.par=FALSE,main="Fixed shrinkage")
plot(pbcrr3,override.par=FALSE,main="Asymmetric adaptive")
plot(pbcrr4,override.par=FALSE,main="Symmetric (pooled) adaptive")
par(oldpar)
Internal sparr
functions
Description
Internal sparr
functions, only meant to be accessed by the user directly in a life-threatening situation.
Usage
adens(x,bwim,bwpts,resolution,edge,diggle,weights,intensity,hstep,qstep,qres,verbose)
BAM.single(h,edge,BP)
BAM.prep(cases,controls,lambda,erode,res)
bivden.LOO(pp,h0,hp,gamma.scale,trim,resolution,parallel)
checkdb(db)
checkedge(edge)
checkit(h,str)
checkran(ran,nm)
cv.RelRisk(X1,X2=NULL,h=NA,lambda.min=0.1,lambda.max=20,LHmin=TRUE,n.lambda=32)
cv.RelRisk.Bithell(X1,X2=NULL,h=NA,lambda.min=0.1,lambda.max=20,LHmin=TRUE,n.lambda=32)
edgeh(bwim,pres,tres,step,W,verbose=FALSE)
gethcats(h,breaks=NULL,step=0.05)
identical_windows(w1,w2)
KO(xy)
LSCV.density.spatial.single(h,pp,res,edge)
LSCV.risk.single(h,cases,controls,res,edge,hazey)
posifybivden(x,eps=.Machine$double.xmin)
processgamma(gamma.scale,pds)
processnstar(n,pp)
processscaler(s,pp)
point_image_by_bw(h,hc,points,weights,WM)
taperoff(x,zeropoint=0,onepoint=1,type=c("smooth", "cosine"))
tol.asy.ada(f,g,beta,verbose=FALSE)
tol.asy.fix(f,g,pooled,verbose=FALSE)
tol.mc.ada(rs,ITER,parallelise,verbose,...)
tol.mc.fix(rs,ITER,parallelise,verbose,...)
Author(s)
T.M. Davies
Spatiotemporal kernel density estimation
Description
Provides a fixed-bandwidth kernel estimate of continuous spatiotemporal data.
Usage
spattemp.density(pp, h = NULL, tt = NULL, lambda = NULL,
tlim = NULL, sedge = c("uniform", "none"), tedge = sedge,
sres = 128, tres = NULL, verbose = TRUE)
Arguments
pp |
An object of class |
h |
Fixed bandwidth to smooth the spatial margin. A numeric value > 0. If unsupplied, the oversmoothing bandwidth is used as per |
tt |
A numeric vector of equal length to the number of points in |
lambda |
Fixed bandwidth to smooth the temporal margin; a numeric value > 0. If unsupplied, the function internally computes the Sheather-Jones bandwidth using |
tlim |
A numeric vector of length 2 giving the limits of the temporal domain over which to smooth. If supplied, all times in |
sedge |
Character string dictating spatial edge correction. |
tedge |
As |
sres |
Numeric value > 0. Resolution of the [ |
tres |
Numeric value > 0. Resolution of the evaluation points in the temporal margin as defined by the |
verbose |
Logical value indicating whether to print a function progress bar to the console during evaluation. |
Details
This function produces a fixed-bandwidth kernel estimate of a single spatiotemporal density, with isotropic smoothing in the spatial margin, as per Fernando & Hazelton (2014). Estimates may be edge-corrected for an irregular spatial study window and for the bounds on the temporal margin as per tlim
; this edge-correction is performed in precisely the same way as the "uniform"
option in bivariate.density
.
Specifically, for n
trivariate points in space-time (pp
, tt
, tlim
), we have
\hat{f}(x,t)=n^{-1}\sum_{i=1}^{n}h^{-2}\lambda^{-1}K((x-x_i)/h)L((t-t_i)/\lambda)/(q(x)q(t)),
where x\in W\subset R^2
and t\in T\subset R
; K
and L
are the 2D and 1D Gaussian kernels controlled by fixed bandwidths h
(h
) and \lambda
(lambda
) respectively; and q(x)=\int_W h^{-2}K((u-x)/h)du
and q(t)=\int_T \lambda^{-1}L((w-t)/\lambda)dw
are optional edge-correction factors (sedge
and tedge
).
The above equation provides the joint or unconditional density at a given space-time location (x,t)
. In addition to this, the function also yields the conditional density at each grid time, defined as
\hat{f}(x|t)=\hat{f}(x,t)/\hat{f}(t),
where \hat{f}(t)=n^{-1}\sum_{i=1}^{n}\lambda^{-1}L((t-t_i)/\lambda)/q(t)
is the univariate kernel estimate of the temporal margin. Normalisation of the two versions \hat{f}(x,t)
and \hat{f}(x|t)
is the only way they differ. Where in the unconditional setting we have \int_W\int_T\hat{f}(x,t)dt dx=1
, in the conditional setting we have \int_W\hat{f}(x|t) dx=1
for all t
. See Fernando & Hazelton (2014) for further details and practical reasons as to why we might prefer one over the other in certain situations.
The objects returned by this function (see ‘Value’ below) are necessary for kernel estimation of spatiotemporal relative risk surfaces, which is performed by spattemp.risk
.
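A minimal sketch (not from the original documentation) of the normalisation statements above, using spattemp.slice to extract a single time point; the time value and resolution are arbitrary.
data(burk)
f <- spattemp.density(burk$cases,tres=64,verbose=FALSE)
f$tlim                   # the requested time must lie within this range
sl <- spattemp.slice(f,tt=2000)
integral(sl$z.cond[[1]]) # conditional density at t = 2000: integrates to (approximately) 1
integral(sl$z[[1]])      # joint density: integrates to 1 only over space *and* time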
Value
An object of class "stden"
. This is effectively a list with the following components:
z |
A named (by time-point) list of pixel |
z.cond |
A named (by time-point) list of pixel |
h |
The scalar bandwidth used for spatial smoothing. |
lambda |
The scalar bandwidth used for temporal smoothing. |
tlim |
A numeric vector of length two giving the temporal bound of the density estimate. |
spatial.z |
A pixel |
temporal.z |
An object of class |
qs |
A pixel |
qt |
A numeric vector giving the edge-correction weights for the temporal margin. |
pp |
A |
tgrid |
A numeric vector giving the discretised time grid at which the spatiotemporal density was evaluated (matches the names of |
Author(s)
T.M. Davies
References
Duong, T. (2007), ks: Kernel Density Estimation and Kernel Discriminant Analysis for Multivariate Data in R, Journal of Statistical Software, 21(7), 1-16.
Fernando, W.T.P.S. and Hazelton, M.L. (2014), Generalizing the spatial relative risk function, Spatial and Spatio-temporal Epidemiology, 8, 1-10.
Kelsall, J.E. and Diggle, P.J. (1995), Kernel estimation of relative risk, Bernoulli, 1, 3-16.
Sheather, S. J. and Jones, M. C. (1991), A reliable data-based bandwidth selection method for kernel density estimation. Journal of the Royal Statistical Society Series B, 53, 683-690.
Silverman, B.W. (1986), Density Estimation for Statistics and Data Analysis, Chapman & Hall, New York.
See Also
bivariate.density
, spattemp.risk
, spattemp.slice
Examples
data(burk)
burkcas <- burk$cases
burkden1 <- spattemp.density(burkcas,tres=128)
summary(burkden1)
hlam <- LIK.spattemp(burkcas,tlim=c(400,5900),verbose=FALSE)
burkden2 <- spattemp.density(burkcas,h=hlam[1],lambda=hlam[2],tlim=c(400,5900),tres=256)
tims <- c(1000,2000,3500)
par(mfcol=c(2,3))
for(i in tims){
plot(burkden2,i,override.par=FALSE,fix.range=TRUE,main=paste("joint",i))
plot(burkden2,i,"conditional",override.par=FALSE,main=paste("cond.",i))
}
Spatiotemporal relative risk/density ratio
Description
Produces a spatiotemporal relative risk surface based on the ratio of two kernel estimates of spatiotemporal densities.
Usage
spattemp.risk(f, g, log = TRUE, tolerate = FALSE, finiteness = TRUE, verbose = TRUE)
Arguments
f |
An object of class |
g |
Either an object of class |
log |
Logical value indicating whether to return the log relative risk (default) or the raw ratio. |
tolerate |
Logical value indicating whether to compute and return asymptotic |
finiteness |
Logical value indicating whether to internally correct infinite risk (on the log-scale) to the nearest finite value to avoid numerical problems. A small extra computational cost is required. |
verbose |
Logical value indicating whether to print function progress during execution. |
Details
Fernando & Hazelton (2014) generalise the spatial relative risk function (e.g. Kelsall & Diggle, 1995) to the spatiotemporal domain. This is the implementation of their work, yielding the generalised log-relative risk function for x\in W\subset R^2
and t\in T\subset R
. It produces
\hat{\rho}(x,t)=\log(\hat{f}(x,t))-\log(\hat{g}(x,t)),
where \hat{f}(x,t)
is a fixed-bandwidth kernel estimate of the spatiotemporal density of the cases (argument f
) and \hat{g}(x,t)
is the same for the controls (argument g
).
- When argument g is an object of class stden arising from a call to spattemp.density, the resolution, spatial domain, and temporal domain of this spatiotemporal estimate must match that of f exactly, else an error will be thrown.
- When argument g is an object of class bivden arising from a call to bivariate.density, it is assumed the ‘at-risk’ control density is static over time. In this instance, the above equation for the relative risk becomes \hat{\rho}=\log(\hat{f}(x,t))+\log|T|-\log(g(x)). The spatial density estimate in g must match the spatial domain of f exactly, else an error will be thrown.
- The estimate \hat{\rho}(x,t) represents the joint or unconditional spatiotemporal relative risk over W\times T. This means that the raw relative risk \hat{r}(x,t)=\exp{\hat{\rho}(x,t)} integrates to 1 with respect to the control density over space and time: \int_W \int_T r(x,t)g(x,t) dt dx = 1. This function also computes the conditional spatiotemporal relative risk at each time point, namely
  \hat{\rho}(x|t)=\log{\hat{f}(x|t)}-\log{\hat{g}(x|t)},
  where \hat{f}(x|t) and \hat{g}(x|t) are the conditional densities over space of the cases and controls given a specific time point t (see the documentation for spattemp.density). In terms of normalisation, we therefore have \int_W r(x|t)g(x|t) dx = 1. In the case where \hat{g} is static over time, one may simply replace \hat{g}(x|t) with \hat{g}(x) in the above.
- Based on the asymptotic properties of the estimator, Fernando & Hazelton (2014) also define the calculation of tolerance contours for detecting statistically significant fluctuations in such spatiotemporal log-relative risk surfaces. This function can produce the required p-value surfaces by setting tolerate = TRUE; and if so, results are returned for both the unconditional (x,t) and conditional (x|t) surfaces. See the examples in the documentation for plot.rrst for details on how one may superimpose contours at specific p-values for given evaluation times t on a plot of relative risk on the spatial margin.
Value
An object of class “rrst
”. This is effectively a list with the following members:
rr |
A named (by time-point) list of pixel |
rr.cond |
A named list of pixel |
P |
A named list of pixel |
P.cond |
As above, for the conditional relative risk surfaces. |
f |
A copy of the object |
g |
As above, for |
tlim |
A numeric vector of length two giving the temporal bound of the density estimate. |
Author(s)
T.M. Davies
References
Fernando, W.T.P.S. and Hazelton, M.L. (2014), Generalizing the spatial relative risk function, Spatial and Spatio-temporal Epidemiology, 8, 1-10.
See Also
spattemp.density
, spattemp.slice
, bivariate.density
Examples
data(fmd)
fmdcas <- fmd$cases
fmdcon <- fmd$controls
f <- spattemp.density(fmdcas,h=6,lambda=8) # stden object as time-varying case density
g <- bivariate.density(fmdcon,h0=6) # bivden object as time-static control density
rho <- spattemp.risk(f,g,tolerate=TRUE)
print(rho)
oldpar <- par(mfrow=c(2,3))
plot(rho$f$spatial.z,main="Spatial margin (cases)") # spatial margin of cases
plot(rho$f$temporal.z,main="Temporal margin (cases)") # temporal margin of cases
plot(rho$g$z,main="Spatial margin (controls)") # spatial margin of controls
plot(rho,tselect=50,type="conditional",tol.args=list(levels=c(0.05,0.0001),
lty=2:1,lwd=1:2),override.par=FALSE)
plot(rho,tselect=100,type="conditional",tol.args=list(levels=c(0.05,0.0001),
lty=2:1,lwd=1:2),override.par=FALSE)
plot(rho,tselect=200,type="conditional",tol.args=list(levels=c(0.05,0.0001),
lty=2:1,lwd=1:2),override.par=FALSE)
par(oldpar)
Slicing a spatiotemporal object
Description
Takes slices of the spatiotemporal kernel density or relative risk function estimate at desired times
Usage
spattemp.slice(stob, tt, checkargs = TRUE)
Arguments
stob |
An object of class |
tt |
Desired time(s); the density/risk surface estimate
corresponding to which will be returned. This value must be in the
available range provided by |
checkargs |
Logical value indicating whether to check validity of
|
Details
Contents of the stob
argument are returned based on a discretised set of times.
This function internally computes the desired surfaces as
pixel-by-pixel linear interpolations using the two discretised times
that bound each requested tt
.
The function returns an error if any of the
requested slices at tt
are not within the available range of
times as given by the tlim
component of stob
.
Value
A list of lists of pixel im
ages, each of which corresponds to
the requested times in tt
, and are named as such.
If stob
is an object of class stden
:
z |
Pixel images of the joint spatiotemporal density corresponding to |
z.cond |
Pixel images of the conditional spatiotemporal density given each time in |
If stob
is an object of class rrst
:
rr |
Pixel images of the joint spatiotemporal relative risk corresponding to |
rr.cond |
Pixel images of the conditional spatiotemporal relative risk given each time in |
P |
Only present if |
P.cond |
Only present if |
Author(s)
T.M. Davies
References
Fernando, W.T.P.S. and Hazelton, M.L. (2014), Generalizing the spatial relative risk function, Spatial and Spatio-temporal Epidemiology, 8, 1-10.
See Also
spattemp.density
, spattemp.risk
, bivariate.density
Examples
data(fmd)
fmdcas <- fmd$cases
fmdcon <- fmd$controls
f <- spattemp.density(fmdcas,h=6,lambda=8)
g <- bivariate.density(fmdcon,h0=6)
rho <- spattemp.risk(f,g,tolerate=TRUE)
f$tlim # requested slices must be in this range
# slicing 'stden' object
f.slice1 <- spattemp.slice(f,tt=50) # evaluation timestamp
f.slice2 <- spattemp.slice(f,tt=150.5) # interpolated timestamp
oldpar <- par(mfrow=c(2,2))
plot(f.slice1$z$'50')
plot(f.slice1$z.cond$'50')
plot(f.slice2$z$'150.5')
plot(f.slice2$z.cond$'150.5')
par(oldpar)
# slicing 'rrst' object
rho.slices <- spattemp.slice(rho,tt=c(50,150.5))
oldpar <- par(mfrow=c(2,2))
plot(rho.slices$rr$'50');tol.contour(rho.slices$P$'50',levels=0.05,add=TRUE)
plot(rho.slices$rr$'150.5');tol.contour(rho.slices$P$'150.5',levels=0.05,add=TRUE)
plot(rho.slices$rr.cond$'50');tol.contour(rho.slices$P.cond$'50',levels=0.05,add=TRUE)
plot(rho.slices$rr.cond$'150.5');tol.contour(rho.slices$P.cond$'150.5',levels=0.05,add=TRUE)
par(oldpar)
Summarising sparr objects
Description
summary methods for classes bivden, stden, rrs, rrst and msden.
Usage
## S3 method for class 'bivden'
summary(object, ...)
## S3 method for class 'msden'
summary(object, ...)
## S3 method for class 'rrs'
summary(object, ...)
## S3 method for class 'rrst'
summary(object, ...)
## S3 method for class 'stden'
summary(object, ...)
Arguments
object |
|
... |
Ignored. |
Author(s)
T.M. Davies
Classification by p-value surfaces
Description
Classifies observed case/control points according to an estimated p-value surface.
Usage
tol.classify(rs, cutoff = 0.05, pim = NULL, ...)
Arguments
rs |
An object of class |
cutoff |
A numeric value between 0 and 1, defining the cutoff p-value used to classify points; defaults to 0.05. |
pim |
A pixel |
... |
Arguments to be passed to |
Details
This function takes in a relative risk surface computed with
risk
and corresponding p-value surface (the latter used for
drawing tolerance contours), and attempts to classify both the case and
control points as either falling within or without contours drawn at a level
of cutoff
. Points that fall 'inside' the contours are deemed to be
associated with p-values less than or equal to cutoff
and hence are
usually interpreted as being in spatial areas of significant risk. This is
useful for identifying characteristics of points that fall inside
'pockets of significance' as delineated by tolerance contours.
Upon execution, the function first inspects the rs
object to
determine whether it possesses a P
component (i.e. an internally
computed p-value surface provided when risk
is called with
optional argument tolerate=TRUE
). If it exists, this is used. If not,
the function then looks to see if the pim
argument has been supplied.
If it has, it must be a pixel im
age compatible
with the risk surface in rs$rr
. If neither rs$P
nor pim
is present, the function internally calls tolerance
with
arguments supplied to ...
to produce the desired surface.
The return object is a list that splits each of the case and control ppp data objects (these are stored as rs$f$pp and rs$g$pp in the originally supplied risk surface object) into two constituent ppp objects: one comprising the points inside the cutoff contours (fin and gin), the other comprising the points outside the cutoff contours (fout and gout).
In addition, the index values of the original data objects rs$f$pp
and
rs$g$pp
that correspond to the points in fin
and gin
are
provided as numeric vectors (findex
and gindex
). These objects
are useful if you need to cross-reference data-specific characteristics from
some other (corresponding) data set.
Further supplied in the returned list are quantities describing the overall classification
structure (pcmask
), as well as contour-specific identification and classification
(finsplit
, ginsplit
, pcpolys
). The pcpolys
object can be plotted
to illustrate the unique contour IDs with tol.classplot
.
Value
A list of ten components:
fin |
Point pattern of 'case' observations classified as being inside
the |
fout |
Point pattern of 'case' observations falling outside the
|
gin |
As |
gout |
As |
findex |
Numeric vector giving the raw index values of the original
pattern of cases which provide |
gindex |
As |
finsplit |
A list of the indexes in |
ginsplit |
As |
pcmask |
The classification object of class |
pcpolys |
A list of the same length as |
Author(s)
T. M. Davies
References
Davies, T.M. and Hazelton, M.L. (2010), Adaptive kernel estimation of spatial relative risk, Statistics in Medicine, 29(23) 2423-2437.
Hazelton, M.L. and Davies, T.M. (2009), Inference based on kernel estimates of the relative risk function in geographical epidemiology, Biometrical Journal, 51(1), 98-109.
Kelsall, J.E. and Diggle, P.J. (1995), Kernel estimation of relative risk, Bernoulli, 1, 3-16.
Examples
data(pbc)
pbccas <- split(pbc)$case
pbccon <- split(pbc)$control
h0 <- OS(pbc,nstar="geometric")
pbcrr <- risk(pbccas,pbccon,h0=h0,tolerate=TRUE)
pbcclass <- tol.classify(pbcrr)
## Not run:
plot(pbcrr)
points(pbcclass$fin,col="red",pch=3,cex=0.5)
points(pbcclass$fout,col="seagreen4",cex=0.5)
chrr <- risk(chorley,h0=0.7,tolerate=TRUE)
chclass <- tol.classify(chrr,cutoff=0.4)
plot(chrr,tol.args=list(levels=0.4))
for(i in 1:length(chclass$finsplit)){
points(chrr$f$pp[chclass$finsplit[[i]]],col=i,pch=19)
}
## End(Not run)
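# A hedged sketch (not from the original documentation): using 'findex' to
# cross-reference a case-specific covariate; 'case.age' below is a
# hypothetical placeholder ordered to match the case pattern.
case.age <- runif(npoints(pbcrr$f$pp),20,90)
mean(case.age[pbcclass$findex])  # summary for cases inside the significant contours
mean(case.age[-pbcclass$findex]) # and for those outside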
Plot tolerance contour classification scheme
Description
Permits illustration of the uniquely identified tolerance contour regions
arising from a call to tol.classify
.
Usage
tol.classplot(pcpolys, add = FALSE, annotate = TRUE, ...)
Arguments
pcpolys |
A list of polygonal windows, each of class |
add |
A logical value indicating whether to add the unique regions to an existing plot (see 'Examples'). |
annotate |
A logical value indicating whether to annotate each unique
region with its identifying number (which will correspond to the uniquely
split/classified points in a corresponding call to |
... |
Additional arguments to be passed to |
Details
The tol.classify
function permits identification of
individual significance regions (that is, the tolerance contours). In
turn, tol.classplot
may be used to visualise these regions, optionally annotated by their unique identification numbers, to better understand the region-specific classifications of the case and control points.
Value
Plots to the relevant graphics device.
Author(s)
T. M. Davies
References
Davies, T.M. and Hazelton, M.L. (2010), Adaptive kernel estimation of spatial relative risk, Statistics in Medicine, 29(23) 2423-2437.
Hazelton, M.L. and Davies, T.M. (2009), Inference based on kernel estimates of the relative risk function in geographical epidemiology, Biometrical Journal, 51(1), 98-109.
Kelsall, J.E. and Diggle, P.J. (1995), Kernel estimation of relative risk, Bernoulli, 1, 3-16.
Examples
## Not run:
chrr <- risk(chorley,h0=0.7,tolerate=TRUE)
chclass <- tol.classify(chrr,cutoff=0.4)
oldpar <- par(mfrow=c(1,3))
#
plot(chrr,tol.args=list(levels=0.4))
tol.classplot(chclass$pcpolys)
plot(Window(chorley))
axis(1)
axis(2)
box(bty="l")
tol.classplot(chclass$pcpolys,add=TRUE,col=2,font=2,cex=1.5)
#
par(oldpar)
## End(Not run)
Plot tolerance contours
Description
Draw contours based on a p-value matrix.
Usage
tol.contour(pim, test = c("upper", "lower", "two-sided"), ...)
Arguments
pim |
A pixel |
test |
An optional character string giving the type of manipulation to
be applied to the p-values, corresponding to a test for significantly
elevated risk ( |
... |
Additional arguments to be passed to |
Details
Note that no checks on the numeric content of pim
are made. The
function assumes the pixel im
age of p-values
in pim
is supplied with respect to an upper-tailed test for elevated
risk (this is exactly the way the p-value surface is returned when
tolerance
is used). This is important if one makes subsequent
use of test
, which manipulates the p-values to draw at desired
significance levels
.
Value
Opens a new graphics device and displays a contour
plot if add = FALSE
, otherwise adds the contours to the plot in the
existing active graphics device.
Author(s)
T. M. Davies
Examples
# See ?tolerance
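# A hedged sketch (not from the original documentation): drawing contours for
# elevated and reduced risk from the same upper-tailed p-value surface; the
# bandwidth value is illustrative only.
data(pbc)
rs <- risk(split(pbc)$case,split(pbc)$control,h0=3.5,tolerate=TRUE,verbose=FALSE)
plot(rs$rr)
tol.contour(rs$P,test="upper",levels=0.05,add=TRUE)       # significantly elevated risk
tol.contour(rs$P,test="lower",levels=0.05,add=TRUE,lty=2) # significantly reduced risk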
Tolerance by p-value surfaces
Description
Calculates a p-value surface based on asymptotic theory or Monte-Carlo (MC) permutations describing the extremity of risk given a fixed or adaptive kernel-smoothed density-ratio, allowing the drawing of tolerance contours.
Usage
tolerance(
rs,
method = c("ASY", "MC"),
ref.density = NULL,
beta = 0.025,
ITER = 100,
parallelise = NULL,
verbose = TRUE,
...
)
Arguments
rs |
An object of class rrs, as produced by a call to risk. |
method |
A character string specifying the method of calculation: "ASY" (default) for the asymptotic p-value surface, or "MC" for Monte-Carlo permutations (see 'Details'). |
ref.density |
Required if method = "ASY" and rs is based on fixed-bandwidth estimation; a kernel estimate of the reference (pooled case/control) density (see 'Examples'). |
beta |
A numeric value, 0 < beta < 1, controlling the fineness of the partitioning used to speed up computation for adaptive risk surfaces (cf. the davies.baddeley argument of risk; see 'Details'). |
ITER |
Number of iterations for the Monte-Carlo permutations. Ignored if method = "ASY". |
parallelise |
Numeric argument to invoke parallel processing, giving the number of CPU cores to use when method = "MC". |
verbose |
Logical value indicating whether to print function progress during execution. |
... |
Additional arguments to be passed to the internal calls of risk when method = "MC" (see 'Details'). |
Details
This function implements developments in Hazelton and Davies (2009) (fixed) and Davies and Hazelton (2010) (adaptive) to compute pointwise p-value surfaces based on asymptotic theory of kernel-smoothed relative risk surfaces. Alternatively, the user may elect to calculate the p-value surfaces using Monte-Carlo methods (see Kelsall and Diggle, 1995). Superimposing contours of these p-values at chosen significance levels (i.e. “tolerance contours”) upon a plot of the risk surface can be an informative way of exploring the statistical significance of the extremity of risk across the defined study region.
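As a brief hedged sketch (reusing the pbc-based objects from the 'Examples' below; the contour levels are illustrative only), tolerance contours can either be requested at estimation time via risk(..., tolerate = TRUE) or computed afterwards with tolerance and superimposed with tol.contour:
# Route 1: compute p-values during estimation; plot() then draws the contours
rr1 <- risk(pbccas, pbccon, h0 = h0, tolerate = TRUE)
plot(rr1, tol.args = list(levels = c(0.05, 0.01)))
# Route 2: estimate first, then compute the p-value surface and superimpose
rr2 <- risk(pbccas, pbccon, h0 = h0)
pv2 <- tolerance(rr2, ref.density = density(pbc, OS(pbc)))
plot(rr2)
tol.contour(pv2, levels = c(0.05, 0.01), add = TRUE)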
Implementation of the Monte-Carlo method simply involves random allocation of case/control marks and re-estimation of the risk surface ITER times, against which the original estimate is compared. While not dependent on asymptotic theory, it is computationally expensive, and it has been suggested that it might have some undesirable practical consequences in certain settings (Hazelton and Davies, 2009). When performing the MC simulations, the same global (and pilot, if necessary) bandwidths and edge-correction regimens are employed as were used in the initial density estimates of the observed data.
With regard to arguments to be passed to internal calls of risk, the user should take care to use ... to set the epsilon value to match that which was used in creation of the object passed to rs (if this was set to a non-default value). Furthermore, if performing MC simulations for the adaptive relative risk function, the function borrows the value of the beta argument to speed things up via partitioning, and the user should additionally access ... to set the same pilot.symmetry value as was used for creation of the object passed to rs, in the same way as for any non-default use of epsilon. This will ensure the simulations are all performed under the same conditions as were used to estimate the original risk function.
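For example (a minimal hedged sketch; the non-default pilot.symmetry value mirrors the adaptive estimate in the 'Examples' below and is purely illustrative), matching the original settings in the MC call might look like:
# original adaptive estimate used a non-default pilot.symmetry value...
rho <- risk(pbc, h0 = h0, adapt = TRUE, hp = NS(pbc), pilot.symmetry = "pooled")
# ...so pass the same value through '...' so the permuted re-estimations
# are performed under identical conditions
rho.mc <- tolerance(rho, method = "MC", ITER = 200, pilot.symmetry = "pooled")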
Value
A pixel image (of class im) of the estimated p-value surface.
Note
The returned p-values are geared so that “smallness” corresponds to statistical significance of elevated risk, that is, an upper-tailed test. The complement of the p-values will yield significance of reduced risk; a lower-tailed test. When using tol.contour, the user can control what type of contours to display.
Author(s)
T. M. Davies
References
Davies, T.M. and Baddeley A. (2018), Fast computation of spatially adaptive kernel estimates, Statistics and Computing, 28(4), 937-956.
Davies, T.M. and Hazelton, M.L. (2010), Adaptive kernel estimation of spatial relative risk, Statistics in Medicine, 29(23), 2423-2437.
Davies, T.M., Jones, K. and Hazelton, M.L. (2016), Symmetric adaptive smoothing regimens for estimation of the spatial relative risk function, Computational Statistics & Data Analysis, 101, 12-28.
Hazelton, M.L. and Davies, T.M. (2009), Inference based on kernel estimates of the relative risk function in geographical epidemiology, Biometrical Journal, 51(1), 98-109.
Kelsall, J.E. and Diggle, P.J. (1995), Kernel estimation of relative risk, Bernoulli, 1, 3-16.
Examples
data(pbc)
h0 <- LSCV.risk(pbc,method="hazelton");h0   # cross-validated common bandwidth for the risk function
pbccas <- split(pbc)[[1]]                   # case point pattern
pbccon <- split(pbc)[[2]]                   # control point pattern
# ASY: asymptotic p-value surfaces
riskfix <- risk(pbc,h0=h0)                                       # fixed-bandwidth risk surface
fixtol1 <- tolerance(riskfix,ref.density=density(pbc,OS(pbc)))   # pooled reference density supplied
riskada <- risk(pbc,h0=h0,adapt=TRUE,hp=NS(pbc),pilot.symmetry="pooled",davies.baddeley=0.025)
adatol1 <- tolerance(riskada)                                    # adaptive counterpart
oldpar <- par(mfrow=c(1,2))
plot(riskfix)
tol.contour(fixtol1,levels=c(0.1,0.05,0.01),lty=3:1,add=TRUE)
plot(riskada)
tol.contour(adatol1,levels=c(0.1,0.05,0.01),lty=3:1,add=TRUE)
par(oldpar)
# MC: Monte-Carlo p-value surfaces
fixtol2 <- tolerance(riskfix,method="MC",ITER=200)
adatol2 <- tolerance(riskada,method="MC",ITER=200,parallelise=2) # ~90secs with parallelisation
oldpar <- par(mfrow=c(1,2))
plot(riskfix)
tol.contour(fixtol2,levels=c(0.1,0.05,0.01),lty=3:1,add=TRUE)
plot(riskada)
tol.contour(adatol2,levels=c(0.1,0.05,0.01),lty=3:1,add=TRUE)
par(oldpar)