Pseudo-absences

Definition

When using data.type = 'binary' in BIOMOD_FormatingData, biomod2 requires either presence-absence data, or presence-only data supplemented with pseudo-absences. These pseudo-absences can be generated with the same function.

The general idea is to select points in the study area that are used to compare the observed environment (represented by the presences) against what is available. These points are NOT to be considered absences; rather, they represent the available environment. Several terms are used in the literature for similar purposes: pseudo-absences, background data (mostly in connection with MaxEnt), and quadrature points (when applying a point-process model, PPM). The last two differ from pseudo-absences in that they allow presence points to be selected as well, whereas pseudo-absences cannot be placed at coordinates matching an observation.
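
The distinction between pseudo-absences and background points can be illustrated with a minimal base R sketch. It does not use biomod2; the grid, the presence records and all object names are hypothetical and only serve to show which cells are eligible in each case.

```r
# Toy study area: a 20 x 20 grid of cells identified by (x, y) coordinates.
set.seed(42)
grid <- expand.grid(x = 1:20, y = 1:20)

# Hypothetical presence records: 30 cells drawn from the grid.
pres_idx <- sample(nrow(grid), 30)

# Pseudo-absences: candidates are all cells EXCEPT those holding a presence.
pa_candidates <- setdiff(seq_len(nrow(grid)), pres_idx)
pa_idx <- sample(pa_candidates, 100)

# Background points: candidates are ALL cells, presences included.
bg_idx <- sample(nrow(grid), 100)

# Pseudo-absences never fall on a presence cell; background points may.
length(intersect(pa_idx, pres_idx))  # always 0
length(intersect(bg_idx, pres_idx))  # may be greater than 0
```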

Note that mixing absence and pseudo-absence data is NOT allowed.



How to select them? - Methods

Three methods are implemented in biomod2 to select pseudo-absences (PA), through either bm_PseudoAbsences or BIOMOD_FormatingData:

  1. random: PA are randomly selected over the study area (excluding presence points)
  2. disk: PA are randomly selected within circles around presence points, defined by a minimum and a maximum distance (expressed in the units of the presence points' projection system)
  3. SRE: a Surface Range Envelope model is used to randomly select PA outside this envelope, i.e. in conditions (combinations of explanatory variables) that differ in a defined proportion from those of the presence points
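
The three strategies can be sketched in base R as follows. This is a simplified illustration under stated assumptions, not the biomod2 implementation: the grid, the single variable temp, the helper draw() and the thresholds dist_min, dist_max and quant are all hypothetical, and the envelope is reduced to a quantile range on one variable.

```r
set.seed(1)
# Toy study area: grid cells with coordinates and one environmental variable.
grid <- expand.grid(x = 1:30, y = 1:30)
grid$temp <- grid$x + rnorm(nrow(grid))   # hypothetical temperature gradient

# Hypothetical presences, confined to the warmer half of the area.
pres_idx <- sample(which(grid$x > 15), 25)
not_pres <- setdiff(seq_len(nrow(grid)), pres_idx)

# Helper: draw n candidates (or all of them if fewer are available).
draw <- function(candidates, n) {
  candidates[sample.int(length(candidates), min(n, length(candidates)))]
}

# 1. random: any cell that does not hold a presence.
pa_random <- draw(not_pres, 50)

# 2. disk: cells whose distance to the nearest presence lies
#    between dist_min and dist_max (same units as the coordinates).
dist_min <- 2
dist_max <- 6
dx <- outer(grid$x, grid$x[pres_idx], "-")
dy <- outer(grid$y, grid$y[pres_idx], "-")
nearest <- apply(sqrt(dx^2 + dy^2), 1, min)
in_ring <- which(nearest >= dist_min & nearest <= dist_max)
pa_disk <- draw(in_ring, 50)

# 3. SRE: cells outside the envelope spanned by the presences, here the
#    [quant, 1 - quant] quantile range of temp at presence cells.
quant <- 0.025
env <- quantile(grid$temp[pres_idx], c(quant, 1 - quant))
outside <- which(grid$temp < env[[1]] | grid$temp > env[[2]])
outside <- setdiff(outside, pres_idx)     # presences in the tails are excluded
pa_sre <- draw(outside, 50)
```

In this toy setting the disk points stay close to the presences, while the SRE points fall mostly in the colder half of the grid, which reflects the geographical and environmental assumptions of the two methods.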

The choice of method depends on a more fundamental underlying question:
how were the presence points of the data set obtained?

[Figure: PA workflow]

The three methods proposed in biomod2 do not rest on the same assumptions:

                               random   disk   SRE
Geographical assumption        no       yes    no
Environmental assumption       no       no     yes
Realized niche fully sampled   no       yes    yes

The random method makes the fewest assumptions and should be the default choice when there is insufficient information about the species' ecology and/or the sampling design. The disk and SRE methods assume that the realized niche of the species has been fully sampled, geographically (disk) or environmentally (SRE).

Note that users can also supply their own pseudo-absence points.
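
As a rough sketch of what a user-defined selection might look like, the base R snippet below builds a logical table with one row per point and one column per pseudo-absence dataset. The layout (TRUE = point used in that dataset) is an assumption based on how the PA.user.table argument is documented; the response vector, the sizes and the purely random selection rule are hypothetical placeholders.

```r
set.seed(7)
# Hypothetical response vector: 1 = presence, NA = candidate point without observation.
resp <- c(rep(1, 10), rep(NA, 40))
n_rep <- 3    # number of pseudo-absence datasets
n_pa  <- 15   # pseudo-absences per dataset

# One logical column per dataset: TRUE = point used in that dataset.
pa_table <- matrix(FALSE, nrow = length(resp), ncol = n_rep,
                   dimnames = list(NULL, paste0("PA", seq_len(n_rep))))

# Presences are kept in every dataset.
pa_table[which(resp == 1), ] <- TRUE

# Pseudo-absences are picked among the candidate points only;
# replace the random draw with your own selection rule.
candidates <- which(is.na(resp))
for (j in seq_len(n_rep)) {
  pa_table[sample(candidates, n_pa), j] <- TRUE
}

colSums(pa_table)  # 25 points per dataset: 10 presences + 15 pseudo-absences
```

Such a table would typically be passed to BIOMOD_FormatingData through PA.user.table together with PA.strategy = 'user.defined'; the function's help page gives the exact requirements for your biomod2 version.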


See code examples:

MAIN functions, BINARY data, section Prepare data & parameters / Pseudo-absences extraction

AUXILIARY functions, BINARY data, section Auxiliary functions : biomod2 data / Generate pseudo-absence datasets



How to select them? - Barbet-Massin et al. 2012

Barbet-Massin M, Jiguet F, Albert CH, Thuiller W (2012). Selecting pseudo-absences for species distribution models: how, where and how many? Methods in Ecology and Evolution, 3: 327-338. doi:10.1111/j.2041-210X.2011.00172.x

This paper estimated the relative effect of the PA selection method and the number of PA on the predictive accuracy of common modelling techniques, using:

Results varied between modelling techniques:

Advice from the biomod2 team: