Type: Package
Title: Long Term Water Quality Trend Analysis
Description: Enable users to evaluate long-term trends using a Generalized Additive Modeling (GAM) approach. The model development includes selecting a GAM structure to describe nonlinear, seasonally-varying changes over time, incorporation of hydrologic variability via either river flow or salinity, the use of an intervention to deal with method or laboratory changes suspected to impact data values, and representation of left- and interval-censored data. The approach has been applied to water quality data in the Chesapeake Bay, a major estuary on the east coast of the United States, to provide insights to a range of management- and research-focused questions. Methodology described in Murphy (2019) <doi:10.1016/j.envsoft.2019.03.027>.
Author: Rebecca Murphy, Elgin Perry, Jennifer Keisman, Jon Harcum, Erik W Leppo
Version: 2.0.12
Maintainer: Erik W Leppo <Erik.Leppo@tetratech.com>
Depends: R (≥ 3.5.0)
License: GPL-3
LazyData: TRUE
RoxygenNote: 7.3.2
Encoding: UTF-8
Imports: dataRetrieval, digest, dplyr, fitdistrplus, grDevices, lubridate, knitr, memoise, mgcv, pander, plyr, readxl, sessioninfo, survival
Suggests: devtools, imputeTS, markdown, nlme, rmarkdown, testthat
URL: https://github.com/tetratech/baytrends, https://tetratech.github.io/baytrends/
BugReports: https://github.com/tetratech/baytrends/issues
VignetteBuilder: knitr
Collate: analysisOrganizeData.R appendDateFeatures.R baytrends.R checkRange.R chkParameter.R closeOut.R createResiduals.R data.R detrended.flow.R detrended.salinity.R expectMaxFunctions.r findFile.R flwAveragePred.R fmtPval.R gamDiff.R gamPlotCalc.r gamPlotDisp.R gamPlotDispSeason.R gamTables.R gamTest.r gamTestSeason.r getUSGSflow.R headers2.R imputeCensored.R initializeResults.r layerAggregation.R loadData.R loadExcel.R loadModels.R loadModelsResid.R makeSurvDF.R mergeFlow.R mergeSalinity.R reAttDF.R saveDF.R seasAdjflow2.R selectData.R smwrBase_baseDay.R smwrBase_baseDay2decimal.R smwrBase_dectime.R smwrBase_dectime2Date.R smwrBase_eventProcessing.R smwrBase_fillMissing.R smwrBase_na2miss.R supportFunctions.R unSurv.R gdata_nobs.R zzz.R
NeedsCompilation: no
Packaged: 2024-07-26 14:07:11 UTC; Erik.Leppo
Repository: CRAN
Date/Publication: 2024-07-26 14:40:02 UTC
baytrends: Long Term Water Quality Trend Analysis
Description
The baytrends package was developed to enable users to evaluate long-term trends in the Chesapeake Bay using a Generalized Additive Modeling (GAM) approach. The model development includes selecting a GAM structure to describe nonlinear, seasonally-varying changes over time, incorporation of hydrologic variability via either river flow or salinity, the use of an intervention to deal with method or laboratory changes suspected to impact data values, and representation of left- and interval-censored data. This approach, which is fully transferable to other systems, allows for Chesapeake Bay water quality data to be evaluated in a statistically rigorous, yet flexible way to provide insights to a range of management- and research-focused questions. Methodology described in Murphy, R.R., E. Perry, J. Harcum, and J. Keisman. 2019. A Generalized Additive Model approach to evaluating water quality: Chesapeake Bay case study. Environmental Modelling & Software, 118, 1-13. <https://doi.org/10.1016/j.envsoft.2019.03.027>.
Details
This software program is preliminary or provisional and is subject to revision. This software program is for testing only; no warranty, expressed or implied, is made as to the accuracy and functioning of the program and related program material, nor shall the fact of distribution constitute any such warranty, and no responsibility is assumed in connection therewith. This software is provided 'AS IS'.
See Also
Useful links:
https://github.com/tetratech/baytrends
https://tetratech.github.io/baytrends/
Report bugs at https://github.com/tetratech/baytrends/issues
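A minimal end-to-end sketch using the built-in dataCensored data set (the station, parameter, and layer values are taken from the examples later in this manual; see analysisOrganizeData and gamTest for details):
dfr <- analysisOrganizeData(dataCensored)
df <- dfr[["df"]]
analySpec <- dfr[["analySpec"]]
gamResult <- gamTest(df, 'tn', 'CB5.4', 'S', analySpec=analySpec)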
Expectation maximization function: Log-normal case, interval censored
Description
Expectation maximization function: Log-normal case, interval censored
Usage
.ExpLNiCens(l, u, mu, sigma, iCens = NA)
Arguments
l |
lower bound of the censored interval |
u |
upper bound of the censored interval |
mu |
predicted values from mgcv::gam |
sigma |
model standard deviation |
iCens |
interval-censored observations (default = NA) |
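For context, the E-step for an interval-censored log-normal observation replaces it with its conditional mean. A minimal sketch of that quantity (illustrative only, not the package internals), where l and u are the interval bounds:
condMeanLN <- function(l, u, mu, sigma) {
  # E[X | l <= X <= u] for X ~ LogNormal(mu, sigma)
  zl <- (log(l) - mu) / sigma
  zu <- (log(u) - mu) / sigma
  exp(mu + sigma^2 / 2) *
    (pnorm(zu - sigma) - pnorm(zl - sigma)) / (pnorm(zu) - pnorm(zl))
}
condMeanLN(l = 1, u = 3, mu = 0.5, sigma = 0.4)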
Expectation maximization function: Log-normal case, left censored
Description
Expectation maximization function: Log-normal case, left censored
Usage
.ExpLNlCens(l, u, mu, sigma, lCens = NA)
Arguments
l |
lower bound of the censored interval |
u |
upper bound of the censored interval |
mu |
predicted values from mgcv::gam |
sigma |
model standard deviation |
lCens |
left-censored observations (default = NA) |
Expectation maximization function: Log-normal case
Description
Expectation maximization function: Log-normal case
Usage
.ExpLNmCens(df, dep, mu, sigma)
Arguments
df |
data frame |
dep |
dependent variable |
mu |
predicted values from mgcv::gam |
sigma |
model standard deviation |
Expectation maximization function: Log-normal case, right censored
Description
Expectation maximization function: Log-normal case, right censored
Usage
.ExpLNrCens(l, u, mu, sigma, rCens = NA)
Arguments
l |
lower bound of the censored interval |
u |
upper bound of the censored interval |
mu |
predicted values from mgcv::gam |
sigma |
model standard deviation |
rCens |
right-censored observations (default = NA) |
Expectation maximization function: Normal case, interval censored
Description
Expectation maximization function: Normal case, interval censored
Usage
.ExpNiCens(l, u, mu, sigma, iCens = NA)
Arguments
l |
lower bound of the censored interval |
u |
upper bound of the censored interval |
mu |
predicted values from mgcv::gam |
sigma |
model standard deviation |
iCens |
interval-censored observations (default = NA) |
Expectation maximization function: Normal case, left censored
Description
Expectation maximization function: Normal case, left censored
Usage
.ExpNlCens(l, u, mu, sigma, lCens = NA)
Arguments
l |
lower bound of the censored interval |
u |
upper bound of the censored interval |
mu |
predicted values from mgcv::gam |
sigma |
model standard deviation |
lCens |
left-censored observations (default = NA) |
Expectation maximization function: Normal case
Description
Expectation maximization function: Normal case
Usage
.ExpNmCens(df, dep, mu, sigma)
Arguments
df |
data frame |
dep |
dependent variable |
mu |
predicted values from mgcv::gam |
sigma |
model standard deviation |
Expectation maximization function: Normal case, right censored
Description
Expectation maximization function: Normal case, right censored
Usage
.ExpNrCens(l, u, mu, sigma, rCens = NA)
Arguments
l |
lower bound of the censored interval |
u |
upper bound of the censored interval |
mu |
predicted values from mgcv::gam |
sigma |
model standard deviation |
rCens |
right-censored observations (default = NA) |
Print out figure title (customization of pandoc.emphasis and pandoc.strong)
Description
Print out figure title (customization of pandoc.emphasis and pandoc.strong)
Usage
.F(text, n = NULL, t = "e")
Arguments
text |
text of figure title |
n |
figure number |
t |
emphasis or strong |
Value
n/a
See Also
.F .H .H2 .H3 .H4 .P .T .V
Examples
text<-"Hello World!"
.F(text)
.F(text, 4)
.F(text, 4,'e')
.F(text, 4,'s')
Print out header (shortened pandoc.header)
Description
Print out header (shortened pandoc.header)
Usage
.H(text, n = 1)
Arguments
text |
text of header |
n |
header level number |
Value
n/a
See Also
.F .H .H2 .H3 .H4 .P .T .V
Examples
.H("1st level header",1)
Print out 1st level header (shortened pandoc.header)
Description
Print out 1st level header (shortened pandoc.header)
Usage
.H1(text)
Arguments
text |
text of header |
Value
n/a
See Also
.F .H .H2 .H3 .H4 .P .T .V
Examples
.H1("1st level header")
.H3("3rd level header")
Print out 2nd level header (shortened pandoc.header)
Description
Print out 2nd level header (shortened pandoc.header)
Usage
.H2(text)
Arguments
text |
text of header |
Value
n/a
See Also
.F .H .H2 .H3 .H4 .P .T .V
Examples
.H2("2nd level header")
.H3("3rd level header")
Print out 3rd level header (shortened pandoc.header)
Description
Print out 3rd level header (shortened pandoc.header)
Usage
.H3(text)
Arguments
text |
text of header |
Value
n/a
See Also
.F .H .H2 .H3 .H4 .P .T .V
Examples
.H2("2nd level header")
.H3("3rd level header")
Print out 4th level header (shortened pandoc.header)
Description
Print out 4th level header (shortened pandoc.header)
Usage
.H4(text)
Arguments
text |
text of header |
Value
n/a
See Also
.F .H .H2 .H3 .H4 .P .T .V
Examples
.H2("2nd level header")
.H4("4th level header")
Print out 5th level header (shortened pandoc.header)
Description
Print out 5th level header (shortened pandoc.header)
Usage
.H5(text)
Arguments
text |
text of header |
Value
n/a
See Also
.F .H .H2 .H3 .H4 .P .T .V
Examples
.H2("2nd level header")
.H5("5th level header")
Paragraph (customization of pandoc.p)
Description
Paragraph (customization of pandoc.p)
Usage
.P(text = NULL)
Arguments
text |
text of paragraph |
Value
n/a
See Also
.F .H .H2 .H3 .H4 .P .T .V
Examples
.P()
Print out table title (customization of pandoc.emphasis and pandoc.strong)
Description
Print out table title (customization of pandoc.emphasis and pandoc.strong)
Usage
.T(text, n = NULL, t = "e")
Arguments
text |
text of table title |
n |
table number |
t |
emphasis or strong |
Value
n/a
See Also
.F .H .H2 .H3 .H4 .P .T .V
Examples
text<-"Hello World!"
.T(text)
.T(text, 4)
.T(text, 4,'e')
.T(text, 4,'s')
Print out text (blended pandoc.emphasis, .verbatim, and .strong)
Description
Print out text (blended pandoc.emphasis, .verbatim, and .strong)
Usage
.V(text = " ", t = "v")
Arguments
text |
text |
t |
emphasis, strong, or verbatim |
Value
n/a
See Also
.F .H .H2 .H3 .H4 .P .T .V
Examples
.V("Hello World!",'v')
.V("Hello World!",'e')
.V("Hello World!",'s')
.V("Hello World!")
Appends date features to data frame
Description
Appends date features to data frame. Creates new column 'date' based on var if date is not already in the data frame. The newly created (or existing) date column is truncated to day. Columns for year, day of year (doy), decimal year (dyear), and month are added based on date. This function relies on smwrBase::baseDay and smwrBase::baseDay2decimal for doy and decimal year.
The baseDay and baseDay2decimal functions have been added to this package from the smwrBase package.
Usage
.appendDateFeatures(df, var = "date")
Arguments
df |
data frame |
var |
variable with stored date |
Value
Returns data frame with appended date features including year, doy, decimal year, and month
Check Data Range – function that checks for allowable values
Description
Check Data Range – function that checks for allowable values.
Usage
.checkRange(df, var, varScrn = NULL, numNA = FALSE, deleteOption = "pass")
Arguments
df |
Data frame with data to check |
var |
Variable to perform screen check on |
varScrn |
Range to check (see examples) |
numNA |
How to treat missing numeric values (TRUE: treat as pass, FALSE[default]: treat as fail) |
deleteOption |
Option for how to return df ("pass": return rows that pass check, "fail": return rows that fail check, "mark": return column with TRUE/FALSE for pass/fail) |
Value
Data frame modified based on user-selected options. See attributes for screening results.
Examples
# create an example data frame
df <- data.frame(
  x1 = c("X1", "Y2", "A1", "B2", "C1", "X1", "", "A1", "", "C1"),
  x2 = seq(5, 14) + runif(10),
  x3 = as.POSIXct(c("1/10/2008", "1/21/2008", "3/1/2008", "3/26/1993",
                    "11/1/2012", "6/10/2000", "8/2/1990", "7/8/2005",
                    "1/6/2008", "9/11/2008"),
                  format = "%m/%d/%Y"),
  stringsAsFactors = FALSE)
# add a few missing values
df[1, 1] <- NA
df[3, 2] <- NA
df[5, 3] <- NA
df
# establish allowable values for screening
x1Scrn <- as.character(c("A1", "B2", "C1", "Y2")) # character
x2Scrn <- c(7,13) # min/max value
x3Scrn <- as.POSIXct(c("1999-01-01", "2008-09-10")) # min/max date
# (POSIXct format)
# return df with new column indicating pass [TRUE] / fail [FALSE]
.checkRange(df, var="x1", varScrn=x1Scrn, numNA=FALSE, deleteOption='mark')
.checkRange(df, var="x2", varScrn=x2Scrn, numNA=FALSE, deleteOption='mark')
.checkRange(df, var="x3", varScrn=x3Scrn, numNA=FALSE, deleteOption='mark')
# return df with only rows that pass check
.checkRange(df, var="x1", varScrn=x1Scrn, numNA=FALSE, deleteOption='pass')
.checkRange(df, var="x2", varScrn=x2Scrn, numNA=FALSE, deleteOption='pass')
.checkRange(df, var="x3", varScrn=x3Scrn, numNA=FALSE, deleteOption='pass')
Reduce data frame and parameter list based on user-selected parameterFilt
Description
Reduce data frame and parameter list based on user-selected parameterFilt
Usage
.chkParameter(df, parameterFilt = parameterFilt, parameterList)
Arguments
df |
data frame |
parameterFilt |
parameter filter |
parameterList |
parameter list |
Value
n/a
Examples
#df <- .chkParameter(df, parameterFilt=c("tn", "tp"))
Find Recent File Information
Description
Find recent file information based on folder and file name (allows for wildcard structure)
Usage
.findFile(folder = ".", file = "*.*", n = 1, fileNameOnly = TRUE)
Arguments
folder |
folder (i.e., directory to look in, can use relative path ) |
file |
file (can use wildcards, e.g., "*.csv") |
n |
number of files to return (e.g., value of 1 returns most recent file, value of 'all' returns all files) |
fileNameOnly |
logical field, if TRUE only return file name |
Details
This function is used to search a selected directory for information about the most recently modified files in that directory. The default setting searches the current working directory. Relative directory addresses can be used. The default setting returns the name of the most recently modified file. Employing wildcards in the file argument can narrow the file search, e.g., "*.csv" will only return comma-delimited files.
The default setting for the argument, n (of 1), will only return a single file. This value can be increased to any number (2, 3, ...) to change the maximum number of files returned; or the argument can be set to 'all' to return all files. Setting the argument, fileNameOnly, to FALSE will result in returning additional file metadata related to file size, modified date/time, and create date/time.
The results are in descending order of modified date/time.
Value
returns file name as a character string
Examples
# name of most recently modified file
## Not run:
.findFile() # current directory
.findFile("..") # one directory up
#
# two most recent files and common attributes, one directory up
.findFile(folder="..", file="*.*", n=2, fileNameOnly=FALSE)
# all files
.findFile(folder="..", file="*.*", n="all", fileNameOnly=FALSE)
## End(Not run)
Format p-values
Description
Format p-values
Usage
.fmtPval(pval)
Arguments
pval |
p-value to format |
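A minimal sketch of one common p-value formatting convention (the actual rules applied by .fmtPval may differ):
fmtP <- function(p) {
  # report very small p-values as a bound rather than a tiny number
  ifelse(p < 0.0001, "<0.0001", sprintf("%.4f", p))
}
fmtP(c(0.00001, 0.0234, 0.5))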
Prepare ANOVA table for GAM analysis
Description
Prepare ANOVA table for GAM analysis
Usage
.gamANOVA(gamo)
Arguments
gamo |
output from gam model |
Prepare table of coefficients for GAM analysis
Description
Prepare table of coefficients for GAM analysis
Usage
.gamCoeff(lmo, iSpec)
Arguments
lmo |
output from gam model |
iSpec |
data frame with intervenList |
Compute and present report on percent difference for log-transformed data
Description
Compute and present report on percent difference for log-transformed data
Usage
.gamDiffPORtbl(por.diff, iSpec)
Arguments
por.diff |
Output from gam.por.diff |
iSpec |
data set specifications |
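For ln-transformed data, a difference of delta on the log scale corresponds to a percent change of 100 * (exp(delta) - 1) in observed units; a minimal illustration (not the package internals):
delta <- 0.25            # hypothetical difference on the ln scale
100 * (exp(delta) - 1)   # percent change in observed units (about 28.4)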
Plots data and gam fit vs. time
Description
Plots data and gam fit vs. time
Usage
.gamPlotCalc(
dep,
tsdat,
pgam,
iSpec,
analySpec,
t.deriv = FALSE,
alpha = 0.05,
dayStep = 10,
step.pt = "none",
q.doy,
flow.detrended = flow.detrended,
salinity.detrended = salinity.detrended
)
Arguments
dep |
dependent variable |
tsdat |
time series data |
pgam |
gam model |
iSpec |
data set specifications |
analySpec |
analytical specifications |
t.deriv |
variable t.deriv |
alpha |
alpha level for confidence intervals |
dayStep |
interval (days) between predictions |
step.pt |
variable step.pt |
q.doy |
vector of days of year |
flow.detrended |
data generated by detrended.flow. Default = flow.detrended. |
salinity.detrended |
data generated by detrended.salinity. Default = salinity.detrended. |
Initialize stat.gam.result and chng.gam.result
Description
Initialize stat.gam.result and chng.gam.result
Usage
.initializeResults()
Merge flow variable into analysis data frame and update iSpec with variable name
Description
Merge flow variable into analysis data frame and update iSpec with variable name
Usage
.mergeFlow(
ct1 = ct1,
iSpec = iSpec,
gageID = gageID,
hydro.var = hydro.var,
flow.detrended = flow.detrended
)
Arguments
ct1 |
analysis data frame |
iSpec |
iSpec |
gageID |
"q" + USGS gage ID |
hydro.var |
averaging windows |
flow.detrended |
data generated by detrended.flow. Default = flow.detrended. |
Merge salinity into analysis data frame and update iSpec with variable name
Description
Merge salinity into analysis data frame and update iSpec with variable name
Usage
.mergeSalinity(
ct1 = ct1,
iSpec = iSpec,
salinity.detrended = salinity.detrended
)
Arguments
ct1 |
analysis data frame |
iSpec |
iSpec |
salinity.detrended |
data generated by detrended.salinity. Default = salinity.detrended. |
Re-attribute df based on previous df
Description
Re-attribute df based on previous df. This is useful if you run a drop-column command or an aggregate function, since these types of operations drop the attributes. This function examines the original and new data frames and adds the attributes from the original df to the new df whenever the new df doesn't have a particular attribute. (This navigates around the issue of changed structure.)
Usage
.reAttDF(df1, df0)
Arguments
df1 |
new data frame |
df0 |
old data frame |
Value
n/a
Examples
# create data frame
df0 <- data.frame(sta=c("A","A"), lay=c("B","C"), x1=c(NA,2), x2=c(4,14))
# add simple attribute
attr(df0, "Attribute1") <- "Test attribute1"
# run aggregate -- lose attributes
df1 <- aggregate(x2 ~ sta, data=df0, mean, na.action=na.pass, na.rm=TRUE)
df2 <- .reAttDF(df1, df0)
Print out character vector table in wrapped mode
Description
Print out character vector table in wrapped mode
Usage
.vTable(x = c(" "), width = 65)
Arguments
x |
character vector |
width |
wrap width [default=65] |
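For illustration, a call of the following form would print R's built-in month names wrapped at a 30-character width (hypothetical usage of this internal helper):
.vTable(month.name, width = 30)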
Analysis Organization & Data Preparation
Description
This function assesses the user supplied specifications in the argument, analySpec, and prepares the data (argument df) for analysis. In those cases where the user doesn't supply a needed specification, a basic option is supplied by this function.
Usage
analysisOrganizeData(
df,
analySpec = list(),
reports = c(0, 1, 2, 3, 4),
parameterList = NA,
stationMasterList = NA,
layerLukup = NA
)
Arguments
df |
Data frame of water quality data |
analySpec |
Specifications for analysis |
reports |
Optional reports about parameters, layers, and stations [default = c(0, 1, 2, 3, 4)] |
parameterList |
User-supplied dataframe with list of parameters [default = NA] |
stationMasterList |
User-supplied dataframe with list of stations [default = NA] |
layerLukup |
User-supplied dataframe with list of layers [default = NA] |
Details
The supplied data frame, df, must include the variables station, date, and layer, along with multiple additional columns for a variety of water quality variables structured as numeric fields or survival::Surv objects. An example data frame, dataCensored, is included with baytrends.
The argument, analySpec, is a list that includes basic specifications for performing GAM analyses. The components in analySpec are identified below. The user may create analySpec (which can include all or some of the below components) and pass the user-supplied analySpec to this function. Or, the user may accept the default argument and allow analysisOrganizeData to create and return analySpec. If the user passes a user-supplied analySpec, then analysisOrganizeData will "fill in" required arguments not provided by the user. The user can also adjust analySpec after it is returned from analysisOrganizeData, although requirements for down selecting the data frame, df, or aggregating data by layer would need to be passed to analysisOrganizeData.
The default setting for the argument, reports, is to provide tabular summary reports about the parameters, stations, and layers to be analyzed. Setting reports=NA will suppress the tabular summary reports.
The user can supply their own parameterList, stationMasterList, or layerLukup data sets; or the user can use the example data frames included with baytrends.
The following steps are performed by analysisOrganizeData:
1) Review user supplied specifications through the analySpec argument. Fill in with default values. For example, if the user doesn't specify a dataframe with a list of stations, parameters, or layers, then the built-in data frames, baytrends::stationMasterList, baytrends::parameterList, and baytrends::layerLukup are used. Some other default values include the following: date range (1/1/1984-present), parameter list (all parameters in data set parameterList), layers (all layers in data set layerLukup), layer aggregation option (0, no aggregation), minimum number of observations to perform a GAM analysis (60), GAM formulas (Linear Trend with Seasonality, Non-linear Trend with Seasonality, and Non-linear trend with Seasonality (plus Interactions)), GAM alpha level for plots and output (0.05), periods for multi-time period analyses using GAM (Full Record, 1999/00-Present, and 2005/06-Present), and seasons for sub-annual analyses using GAM (All, Spring1, Spring2, Summer1, Summer2, SAV1, SAV2, Winter, and Fall.)
2) Based on the settings in analySpec, the data frame df, is down selected based on parameters, stations, dates, and layers. A dependent variable list (depVarList) is created that includes variable descriptions, units, and log transformation selections. A station list (stationList) is created that includes the station ID, a selected USGS gage for correlating flow, and latitude/longitude.
3) Aggregate data layers. If analySpec$layerAggOption is equal to 0, then there is no aggregation. The analySpec$layerAggOption of 1 would result in averaging (mean) surface and above pycnocline data. In this example, records with layer = "S" and layer = "AP" are relabeled as "SAP". Other layerAggOption values are 2) "B"&"BP"; 3) "S"&"AP" and "B"&"BP"; 4) all layers; and 5) "S"&"B", respectively. A layer list (layerList) is created and returned.
4) Data are then averaged (mean) by date.
5) Date features are added. Columns for year, day of year (doy), decimal year (dyear), and month are added based on date. Note that the doy is based on a 366 day calendar regardless of leap year.
6) Reports on the number of records (0), parameters (1), layers (2) and stations (3) can be controlled with the reports option.
Value
Returns a list. Use dfr[["df"]] and dfr[["analySpec"]] to extract updated data frame and updated (or created) analySpec. analySpec is a list that includes the following components:
analyTitle - Analysis trend title
parameterFilt - Parameter filter used for data down selection
stationFilt - Station filter used for data down selection
dateFilt - Date filter for data down selection (default = c(as.POSIXct('1984-01-01'), as.POSIXct(Sys.time())))
setTZ - time zone (default = "America/New_York")
layerFilt - Layer filter
layerAggOption - Layer averaging option (default = 0). Other options are: 1: combine "S" & "AP" ("SAP"); 2: combine "B" & "BP" ("BBP"); 3: options 1 & 2 ("SAP", "BBP"); 4: combine all ("ALL"); 5: combine "S" and "B" ("SB")
obsMin - Minimum number of observations required to allow GAM analysis to proceed for a specific station, parameter, and layer combination (default = 60).
obsMinInter - Minimum number of observations required to allow GAM analysis to proceed for a specific intervention (default = 10).
gamAlpha - Alpha level used in GAM analyses (default = 0.05).
censorTrim - Values to apply for trimming data due to too much censoring (default = c(0.5, 0.4)). The first argument indicates the fraction of observations in a year that are allowed to be censored. The second argument is the fraction of years, starting from the beginning of the record, that are allowed to be "flagged" for too much censoring. A minimum of two years must have too much censoring before data are removed. The default settings can be read as: no more than 40 percent of the beginning years of data are allowed to have more than 50 percent censoring before the beginning portion of the record is trimmed. For example, if years 1 and 3 of data have more than 50 percent censoring, then the first three years of data are trimmed. Similarly, for years 1 and 4, the first four years are removed. If years 1 and 5 have more than 50 percent censoring, the data are kept since 2/5 is not greater than 0.4. Changing this setting to, say, c(0.2, 0.4) would tighten the censoring threshold so that a year is flagged when more than 20 percent of its observations are censored (i.e., at least 80 percent of a year's observations would need to be uncensored to avoid flagging).
gamModels - model formulations. See baytrends::loadModels() for simplified approach for selecting which built-in models to include
showGamNumOnPlot - Show gam option (i.e., 0-6) on gam plots (TRUE/FALSE)
gamDiffPeriods - list of time periods (years) used for computing changes (differences). The default options are: full record, 1999/00-present, and 2005/06-present.
gamDiffSeasons - list of seasons used for sub-annual analyses of computing differences. The default options include the following: All (months 1:12), Spring1 (months 3:5), Spring2 (months 4:6), Summer1 (months 6:9), Summer2 (months 7:9), SAV1 (months 4:10), SAV2 (months 3:5, 9:11), Winter (months 1:2), and Fall (months 10:12).
gamDiffNumChgYrs - number of years to use in computing differences.
gamPenalty - allow the user to set the mgcv::gam select argument to TRUE, FALSE, or the baytrends algorithm default (default = NA). When the default option is specified, the mgcv::gam select argument is set to FALSE when none of the data are censored; otherwise (when some censoring exists in the data set) the select argument is set to TRUE.
gamPenaltyCrit - edf and F-stat values used to flag ANOVA table results (default = c(1, 9e9))
gamCoeffDeltaMaxCrit - convergence criteria for expectation maximization (default = 1e-6)
gamFlw_Sal.Wgt.Perc - percentiles of flow [or salinity] to use for computing flow [salinity] averaged result (default = c(0.05, 0.25, 0.50, 0.75, 0.95))
gamLegend - default settings for gam figure legend
idVar - primary key for data frame returned as df
depVarList - data frame of dependent variables (useful for setting up gam analyses in for loops)
stationList - data frame of stations (useful for setting up gam analyses in for loops)
layerList - data frame of layers (useful for setting up gam analyses in for loops)
Examples
# run analysis relying on default specifications, examine analySpec for
# default options
dfr <- analysisOrganizeData(dataCensored)
df <- dfr[["df"]]
analySpec <- dfr[["analySpec"]]
# analyze bottom dissolved oxygen at 2 stations using only data from 1/1/1995-12/31/2015
analySpec <-list()
analySpec$parameterFilt <- c('do')
analySpec$layerFilt <- c('B')
analySpec$stationFilt <- c('CB3.3C', 'CB5.4')
analySpec$dateFilt <- as.POSIXct(c("1995-01-01", "2015-12-31"))
dfr <- analysisOrganizeData(dataCensored, analySpec)
df <- dfr[["df"]]
analySpec <- dfr[["analySpec"]]
Append Date Features
Description
Appends date features to data frame
Usage
appendDateFeatures(df, var = "date")
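A hedged usage sketch, assuming a data frame with a POSIXct (or Date) column named date; per the description above, columns for year, doy, dyear, and month are appended:
df <- data.frame(date = as.POSIXct(c("1995-01-15", "1995-07-04")))
df2 <- appendDateFeatures(df)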
Base Day
Description
Computes the base day of the year, a reference value that can be used to group days for the computation of summary statistics. From the smwrBase package.
Usage
baseDay(x, numeric = TRUE, year = c("calendar", "water", "climate"))
Arguments
x |
a vector of class POSIXt, Dates, or character that represents a date. Missing values are permitted. |
numeric |
a logical value; if TRUE, return the base day number as an integer; if FALSE, return a factor with levels for every day of the year |
year |
a character string indicating the basis of the factor levels. See Details. |
Details
The base day is computed such that all dates have the same reference value regardless of whether the year is a leap year or not. If year is "calendar," then the factor levels or day number begin on January 1; if year is "water," then the factor levels or day number begin on October 1; and if year is "climate," then the factor levels or day number begin on April 1.
Value
An integer value representing the base day number if numeric is TRUE. Otherwise a factor with levels for every day of the year.
Examples
# The default numeric result
baseDay(c("2000-02-29", "2000-03-01", "2001-03-01"))
# The result as a factor
baseDay(c("2000-02-29", "2000-03-01", "2001-03-01"), numeric=FALSE)
Base Day
Description
Computes the decimal time representation of the base day of the year. From the smwrBase package.
Usage
baseDay2decimal(x)
Arguments
x |
a vector of baseDay values, character, or factors of the form month abbreviation and day number, generally created from baseDay |
Value
A numeric value representing the base day.
Examples
# The baseDay ordered by calendar year
bd.tmp <- baseDay(c("2000-02-29", "2000-03-01", "2001-03-01"),
numeric=FALSE)
baseDay2decimal(bd.tmp)
# ordered by water year, result should agree
bd.tmp <- baseDay(c("2000-02-29", "2000-03-01", "2001-03-01"),
numeric=FALSE, year="water")
baseDay2decimal(bd.tmp)
Document Processing Time and Other Session Time
Description
Document Processing Time and Other Session Time
Usage
closeOut(timeProcess = TRUE, contInfo = TRUE, sessInfo = TRUE)
Arguments
timeProcess |
Processing time |
contInfo |
Contact information |
sessInfo |
Session information |
Value
Reports out processing time, contact information, and session information
Examples
closeOut()
Calculate GAM residuals
Description
Use GAM analysis to compute residuals. Relies on mgcv::gam to fit the generalized additive model.
Usage
createResiduals(
df,
dep,
residualModel = "doy_flw_sal",
analySpec = analySpec,
gamTable = FALSE,
gamPlot = FALSE,
flow.detrended = NA,
salinity.detrended = NA,
width = 10,
height = 3.5,
folder_r = "pltResiduals",
ProjRoot
)
Arguments
df |
data frame |
dep |
dependent variable |
residualModel |
which GAM formula is used to compute residuals. Default = 'doy_flw_sal' |
analySpec |
analytical specifications |
gamTable |
gam table setting (set to FALSE to turn off table output) Default = FALSE |
gamPlot |
gam plot setting (set to FALSE to turn off plotting) Default = FALSE |
flow.detrended |
data generated by detrended.flow. Default = NA |
salinity.detrended |
data generated by detrended.salinity. Default = NA |
width |
width of png figure (inches). Default = 10 |
height |
height of png figure (inches). Default = 3.5 |
folder_r |
folder to store residual plots |
ProjRoot |
Root folder for project. |
Value
Returns df with appended column of data
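A hedged usage sketch (a typical call pattern; flow.detrended and salinity.detrended are assumed to have been created beforehand with detrended.flow and detrended.salinity):
## Not run:
dfr <- analysisOrganizeData(dataCensored)
df <- dfr[["df"]]
analySpec <- dfr[["analySpec"]]
df2 <- createResiduals(df, dep='do', analySpec=analySpec,
flow.detrended=flow.detrended,
salinity.detrended=salinity.detrended)
## End(Not run)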
Chesapeake Bay Program Monitoring Data, 1985-2016
Description
Selected 1985-2016 data for eight (8) stations from the Chesapeake Bay Monitoring Program. Water quality variables are stored as either class 'num' for variables with no censoring or class 'Surv' (see survival::Surv) that allows for left- and interval-censored data.
Usage
dataCensored
Format
A data frame with 13,062 rows and 17 variables:
- station
station identifier
- date
sample date
- layer
sample layer
- secchi
Secchi depth [m]
- salinity
Salinity [ppt]
- do
Dissolved Oxygen [mg/L]
- wtemp
Water Temperature [deg C]
- chla
Corrected Chlorophyll-a [ug/L]
- tn
Total Nitrogen [mg/L]
- tp
Total Phosphorus [mg/L]
- tss
Total Suspended Solids [mg/L]
- din
Dissolved Inorganic Nitrogen [mg/L]
- po4
Orthophosphorus [mg/L]
- tdn
Total Dissolved Nitrogen [mg/L]
- tdp
Total Dissolved Phosphorus [mg/L]
- nh4
Ammonium [mg/L]
- no23
Nitrite + Nitrate [mg/L]
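For orientation, the storage classes described above can be inspected directly (censored variables such as tn are stored as survival::Surv objects):
str(dataCensored[, c("station", "date", "tn")])
head(dataCensored$tn)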
Decimal Time
Description
Convert date/time data to be expressed as year and fractional part of year. This can be useful for plotting or representing time in a regression model.
Usage
dectime(
dates,
times,
time.format,
date.format,
Date.noon = TRUE,
year.type = c("calendar", "water", "climate")
)
Arguments
dates |
a vector of a valid date object, or character representation of dates. Missing values are permitted and produce corresponding missing values in the output. |
times |
a character representation of times. Missing values are permitted and produce corresponding missing values in the output. |
time.format |
format for converting times |
date.format |
format for converting dates |
Date.noon |
logical; if TRUE and dates is class "Date," set the time to 12:00 so the decimal time represents the center of the day (see Details) |
year.type |
a character string indicating the type of year to determine the offset, must be one of "calendar," "water," or "climate." |
Details
The format for times must be one of "hm," "hms," or "ms." Note that this is actually a conversion function, see See Also. If times is missing, dates is class "Date," and Date.noon is TRUE, then set the time to 12:00, so that the decimal time represents the center of the day.
Added from smwrBase.
Value
A vector representation of the data in decimal format (year and decimal fraction).
Examples
dectime("11/11/1918", date.format="%m/%d/%Y")
dectime(1988:1990)
Date Conversion
Description
Convert time data expressed as year and fractional part of year to class "Date."
Usage
dectime2Date(x, Date.noon = TRUE)
Arguments
x |
the decimal date to convert. |
Date.noon |
correct for the noon adjustment applied by dectime |
Details
Added from smwrBase.
Value
A vector of class "Date" corresponding to each value in x.
Note
A small value, representing about 1 minute, is added to each value in x to prevent truncation errors in the conversion. This can cause some errors if the data were converted from date and time data.
Examples
dectime("02/07/2013", date.format="%m/%d/%Y")
# Convert back the printed result:
dectime2Date(2013.103)
Create Seasonally Detrended Flow Data Set
Description
This function creates a seasonally detrended flow data set for selected USGS gages. The created data set is used to support application of GAMs that include a hydrologic term as one of the independent variables. The output from this function should be stored as an .rda file for repeated use with baytrends.
Usage
detrended.flow(
usgsGageID,
siteName,
yearStart,
yearEnd,
dvAvgWinSel = c(1, 5, 10, 15, 20, 30, 40, 50, 60, 90, 120, 150, 180, 210),
dvAvgWgtSel = "uniform",
dvAvgSidesSel = 1,
lowess.f = 0.2,
span = 10,
max.fill = 10
)
Arguments
usgsGageID |
USGS GageIDs (e.g., "01491000") |
siteName |
USGS SiteName (only used for plots) |
yearStart |
start year (recommended as at least one year before corresponding water quality data set) |
yearEnd |
end year |
dvAvgWinSel |
Averaging window (days) for smoothing the residuals of the seasonally adjusted daily flow values [default = c(1, 5, 10, 15, 20, 30, 40, 50, 60, 90, 120, 150, 180, 210)] |
dvAvgWgtSel |
Averaging method ("uniform", "weighted", or "centered") for creating weights. If using "weighted" then use dvAvgSidesSel=1. If using "centered" then use dvAvgSidesSel=2. [default = "uniform"] |
dvAvgSidesSel |
If dvAvgSidesSel=1 only past values are used, if dvAvgSidesSel=2 then values are centered around lag 0. [default = 1] |
lowess.f |
lowess smoother span applied to computed standard deviation (see Details). This gives the proportion of points which influence the smooth at each value. Larger values give more smoothness. [default = 0.2] |
span |
maximum number of observations on each side of range of missing values to use in filling in data [default = 10] |
max.fill |
maximum gap to fill in [default = 10] |
Details
This function returns a list of seasonally detrended flow and companion statistics, and relies on the USGS dataRetrieval package to retrieve daily flow data.
It is the user's responsibility to save the resulting list as flow.detrended for integration with baytrends.
For the purposes of baytrends, it is expected that the user would identify all USGS gages that are expected to be evaluated so that a single data file is created. To best match up with water quality data, we recommend retrieving flow data for one year prior to the first year of water quality data. This allows for creating a time-averaged flow data set without losing the first few months of water quality data due to lack of matching flow data. Data retrievals should also be made in light of the time needed by the USGS to review and approve their flow records.
After retrieval, the following computation steps are performed to create a data frame for each USGS gage (the data frame naming convention is qNNNNNNNN where NNNNNNNN is the USGS gage ID):
1) The daily flow data are converted to cubic meters per second [cms] and stored as the variable q.
2) The day of year (doy) is added to the data set. We use a 366 day calendar regardless of leap year.
3) The Log (ln) flow is computed and stored as LogQ.
4) A seasonal GAM, i.e., gamoutput <- gam(LogQ ~ s(doy, bs='cc')) is evaluated and the predicted values stored as qNNNNNNNN.gam.
5) The GAM residuals, i.e., "residuals(gamoutput)" are extracted and stored as the variable, d1.
6) Based on the specifications for dvAvgWinSel, dvAvgWgtSel, and dvAvgSidesSel, the values of d1 are time averaged and additional variables dxxx are added to the data frame where xxx corresponds to list of averaging windows specified in dvAvgWinSel. These values of dxxx are used in GAMs that include a hydrologic independent variable.
After the above data frame is created, the following four (4) additional data frames are created for each USGS gage and combined into a list named qNNNNNNNN.sum:
mean – For each doy (i.e., 366 days of year), the mean across all years for each value of d in the above data frame, qNNNNNNNN.
sd – For each doy (i.e., 366 days of year), the standard deviation across all years for each value of d in the above data frame, qNNNNNNNN.
nobs – For each doy (i.e., 366 days of year), the number of observations across all years for each value of d in the above data frame, qNNNNNNNN.
lowess.sd – Lowess smoothed standard deviations. (These values are used for computing confidence intervals in the flow averaged GAM.)
The process of creating the above data frame, qNNNNNNNN, and list, qNNNNNNNN.sum, is repeated for each USGS gage and combined together in a single list. The beginning of the list includes meta data documenting the retrieval parameters.
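The core of steps 3-5 can be sketched as follows (illustrative only, with simulated flow; detrended.flow performs these steps internally):
library(mgcv)
doy <- rep(1:366, times = 3)    # three years on the 366-day calendar
q <- 50 + 20 * sin(2 * pi * doy / 366) + rnorm(length(doy), 0, 5)  # simulated flow [cms]
LogQ <- log(pmax(q, 1))         # guard against non-positive simulated values
fit <- gam(LogQ ~ s(doy, bs = 'cc'))   # seasonal GAM (step 4)
d1 <- residuals(fit)            # seasonally detrended log-flow (step 5)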
This function can be used in conjunction with an RMD file to knit (create) a report (DOCX or HTML).
Value
Returns a list of seasonally detrended flow data. You should save the resulting list as flow.detrended for use with baytrends. This function also creates diagnostic plots that can be saved to a report when this function is called from an .Rmd script.
Examples
## Not run:
# Define Function Inputs
usgsGageID <- c("01491000", "01578310")
siteName <- c("Choptank River near Greensboro, MD",
"Susquehanna River at Conowingo, MD")
yearStart <- 1983
yearEnd <- 2016
dvAvgWinSel <- c(1, 5, 10, 15, 20, 30, 40, 50, 60, 90, 120, 150, 180, 210)
dvAvgWgtSel <- "uniform"
dvAvgSidesSel <- 1
lowess.f <- 0.2
# Run Function
flow.detrended <- detrended.flow(usgsGageID, siteName, yearStart, yearEnd
, dvAvgWinSel, dvAvgWgtSel, dvAvgSidesSel
, lowess.f)
## End(Not run)
Create Seasonally Detrended Salinity Data Set
Description
This function creates a seasonally detrended salinity data set for selected stations. The created data set is used to support application of GAMs that include a hydrologic term as one of the independent variables. The output from this function should be stored as an .rda file for repeated use with baytrends.
Usage
detrended.salinity(
df.sal,
dvAvgWinSel = 30,
lowess.f = 0.2,
minObs = 40,
minObs.sd = 10
)
Arguments
df.sal |
data frame with salinity data (required variables in data frame are: station, date, layer, and salinity) |
dvAvgWinSel |
Averaging window (days) selection for pooling data to compute summary statistics |
lowess.f |
lowess smoother span applied to computed standard deviation (see Details). This gives the proportion of points which influence the smooth at each value. Larger values give more smoothness. |
minObs |
Minimum number of observations for performing analysis (default is 40) |
minObs.sd |
Minimum number of observations in averaging window for calculation of the standard deviation (default is 10) |
Details
This function returns a list of seasonally detrended salinity and companion statistics, and relies on a user-supplied data frame that contains the following variables: station, date, layer, and salinity. See the structure of the sal data in the example below.
It is the user's responsibility to save the resulting list as salinity.detrended for integration with baytrends.
For the purposes of baytrends, it is expected that the user would identify a data set with all salinity data that are expected to be evaluated so that a single data file is created. The following computation steps are performed:
1) Extract the list of stations, minimum year, and maximum year in data set. Initialize the salinity.detrended list with this information along with meta data documenting the retrieval parameters.
2) Downselect the input data frame to only include data where the layer is equal to 'S', 'AP', 'BP' or 'B'.
3) Average the 'S' and 'AP' salinity data, and the 'B' and 'BP' salinity data, to create average salinity values for SAP (surface and above pycnocline) and BBP (bottom and below pycnocline), respectively. These values are stored as the variables salinity.SAP and salinity.BBP together with the date and day of year (doy) in a data frame corresponding to the station ID.
4) For each station/layer combination with at least minObs observations, a seasonal GAM, i.e., gamoutput <- gam(salinity ~ s(doy, bs='cc')), is evaluated and the predicted values stored in the above data frame as salinity.SAP.gam and salinity.BBP.gam.
5) The GAM residuals, i.e., "residuals(gamoutput)" are extracted and stored as the variable, SAP or BBP in the above data frame. (These are the values that are used for GAMs that include salinity.)
6) After the above data frame is created and appended to the list salinity.detrended, the following four (4) additional data frames are created for each station.
mean – For each doy (i.e., 366 days of year), the mean across all years for each value of d. Since samples are not collected on a daily basis, it is necessary to aggregate data from within a +/- one-half of dvAvgWinSel-day window around d. (This includes wrapping around the calendar year. That is, the values near the beginning of the year, say January 2, would include values from the last part of December and the first part of January.) The variables in the mean data frame are doy, SAP, and BBP.
sd – For each doy (i.e., 366 days of year), the standard deviation across all years for each value of d. (See mean calculations for additional details.)
nobs – For each doy (i.e., 366 days of year), the number of observations across all years for each value of d. (See mean calculations for additional details.)
lowess.sd – Lowess smoothed standard deviations. It is noted that some stations do not include regular sampling in all months of the year or for other reasons have few observations from which to compute standard deviations. Through visual inspection of plots, we found that the standard deviation could become unstable when the number of observations is small. For this reason, when the number of observations is less than minObs.sd, the corresponding value of lowess.sd is removed and interpolated from the remaining observations.
After the above four data frames (mean, sd, nobs, and lowess.sd) are created, they are added to a list using a station.sum naming convention and appended to the list salinity.detrended.
Value
Returns a list of seasonally detrended salinity data. You should save the resulting list as salinity.detrended for use with baytrends. This function also creates diagnostic plots that can be saved to a report when this function is called from an .Rmd script.
Examples
## Not run:
# Show Example Dataset (sal)
str(sal)
# Define Function Inputs
df.sal <- sal
dvAvgWinSel <- 30
lowess.f <- 0.2
minObs <- 40
minObs.sd <- 10
# Run Function
salinity.detrended <- detrended.salinity(df.sal, dvAvgWinSel,
lowess.f, minObs, minObs.sd)
## End(Not run)
Event Processing
Description
Computes the event number (eventNum), the length of events (eventLen), or the sequence number for individual observations within an event (eventSeq). Added from smwrBase.
Usage
eventNum(event, reset = FALSE, na.fix = FALSE)
eventSeq(eventno)
eventLen(eventno, summary = FALSE)
Arguments
event |
a logical vector where TRUE indicates that an event occurred |
reset |
a logical value indicating whether the event is assumed to continue until the next event, or only while event is TRUE |
na.fix |
the value to use where event has missing values (NAs) |
eventno |
an integer vector indicating the event number. Generally the output from the eventNum function |
summary |
a logical value, controlling output. See Value for details. |
Value
The function eventNum returns an integer vector the same length as event indicating the event sequence number.
The function eventLen returns an integer vector the same length as eventno indicating the sequence length of the event if summary is FALSE, or a named integer vector indicating the sequence length of each event if summary is TRUE.
The function eventSeq returns an integer vector the same length as eventno indicating the sequence number of each element in the event.
Examples
## Notice the difference caused by setting reset to TRUE
eventNum(c(TRUE,TRUE,FALSE,FALSE,TRUE,FALSE))
eventNum(c(TRUE,TRUE,FALSE,FALSE,TRUE,FALSE), reset=TRUE)
## Notice the difference caused by setting reset to TRUE
eventSeq(eventNum(c(TRUE,TRUE,FALSE,FALSE,TRUE,FALSE)))
eventSeq(eventNum(c(TRUE,TRUE,FALSE,FALSE,TRUE,FALSE), reset=TRUE))
## Notice the difference caused by setting reset to TRUE
eventLen(eventNum(c(TRUE,TRUE,FALSE,FALSE,TRUE,FALSE), reset=TRUE))
## This is an example of the summary option
eventLen(eventNum(c(TRUE,TRUE,FALSE,FALSE,TRUE,FALSE), reset=TRUE), summary=TRUE)
Fill Missing Values
Description
Replace missing values in time-series data by interpolation.
Usage
fillMissing(x, span = 10, Dates = NULL, max.fill = 10)
Arguments
x |
the sequence of observations. Missing values are permitted and will be replaced. |
span |
the maximum number of observations on each side of each range of missing values to use in constructing the time-series model. See Details. |
Dates |
an optional vector of dates/times associated with each value
in |
max.fill |
the maximum gap to fill. |
Details
Missing values at the beginning and end of x will not be replaced.
The argument span is used to help set the range of values used to construct the StructTS model. If span is set small, then the variance of epsilon dominates and the estimates are not smooth. If span is large, then the variance of level dominates and the estimates are linear interpolations. The variances of level and epsilon are components of the state-space model used to interpolate values, see StructTS for details. See Note for more information about the method.
If span is set larger than 99, then the entire time series is used to estimate all missing values. This approach may be useful if there are many periods of missing values. If span is set to any number less than 4, then simple linear interpolation will be used to replace missing values.
Added from smwrBase.
Value
The observations in x with missing values replaced by interpolation.
Note
The method used to interpolate missing values is based on tsSmooth constructed using StructTS on x with type set to "trend." The smoothing method basically uses the information (slope) from the two values previous to missing values and the two values following missing values to smoothly interpolate values, accounting for any change in slope. Beauchamp (1989) used time-series methods for synthesizing missing streamflow records. The group that is used to define the statistics that control the interpolation is very simply defined by span rather than the more in-depth measures described in Elshorbagy and others (2000).
If the data have gaps rather than missing values, then fillMissing will return a vector longer than x if Dates is given, and the return data cannot be inserted into the original data frame. If Dates is not given, then the gap will be recognized and not be filled. The function insertMissing can be used to create a data frame with the complete sequence of dates.
References
Beauchamp, J.J., 1989, Comparison of regression and time-series methods for synthesizing missing streamflow records: Water Resources Bulletin, v. 25, no. 5, p. 961–975.
Elshorbagy, A.A., Panu, U.S., and Simonovic, S.P., 2000, Group-based estimation of missing hydrological data, I. Approach and general methodology: Hydrological Sciences Journal, v. 45, no. 6, p. 849–866.
Examples
## Not run:
#library(smwrData)
data(Q05078470)
# Create missing values in flow, the first sequence is a peak and the second is a recession
Q05078470$FlowMiss <- Q05078470$FLOW
Q05078470$FlowMiss[c(109:111, 198:201)] <- NA
# Interpolate the missing values
Q05078470$FlowFill <- fillMissing(Q05078470$FlowMiss)
# How did we do (line is actual, points are filled values)?
par(mfrow=c(2,1), mar=c(5.1, 4.1, 1.1, 1.1))
with(Q05078470[100:120, ], plot(DATES, FLOW, type="l"))
with(Q05078470[109:111, ], points(DATES, FlowFill))
with(Q05078470[190:210, ], plot(DATES, FLOW, type="l"))
with(Q05078470[198:201, ], points(DATES, FlowFill))
## End(Not run)
Create filter weights
Description
Create filter weights. Typically used to compute even, weighted, or center-weighted averages for smoothing. Can be used as a vector of weights in stats::filter
Usage
filterWgts(n = NULL, type = "weighted")
Arguments
n |
number of values |
type |
Averaging method ("uniform" [even], "weighted" [default], or "centered") for creating weights. |
Value
Returns vector of weights
Examples
wgts <- filterWgts(0, "uniform")
wgts <- filterWgts(7, "uniform")
wgts <- filterWgts(7, "centered")
wgts <- filterWgts(7, "weighted")
x <- 1:100
filter(x, filterWgts(7,"weighted"), sides=1)
Flow Averaged Predictions
Description
Compute flow[salinity] weighted predictions
Usage
flwAveragePred(pdat, pgam, normPct)
Arguments
pdat |
prediction data set |
pgam |
gam model |
normPct |
vector of percentiles to use in weighting |
Value
data frame of flow[salinity] averaged predictions and standard errors
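A minimal illustration of the concept (hypothetical numbers; the package's actual weighting scheme may differ): predictions evaluated at several flow [salinity] percentiles are combined into a single weighted average.
pred <- c(1.8, 2.0, 2.1, 2.3, 2.6)      # hypothetical predictions at five percentiles
wgt <- c(0.10, 0.25, 0.30, 0.25, 0.10)  # hypothetical percentile weights
sum(wgt * pred) / sum(wgt)              # flow-averaged prediction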
Compute an estimate of difference based on GAM results
Description
Compute an estimate of difference based on GAM results
Usage
gamDiff(
gamRslt,
iSpec,
analySpec,
base.yr.set = NA,
test.yr.set = NA,
doy.set = NA,
alpha = 0.05,
flow.detrended = NA,
salinity.detrended = NA
)
Arguments
gamRslt |
output from gam model |
iSpec |
data set specifications (see details for required content) |
analySpec |
analytical specifications |
base.yr.set |
vector of years used for baseline period |
test.yr.set |
vector of years used for test period |
doy.set |
vector of days used to establish sub-annual analyses (see details) |
alpha |
alpha level for computing confidence intervals |
flow.detrended |
data generated by detrended.flow. Default = flow.detrended. |
salinity.detrended |
data generated by detrended.salinity. Default = salinity.detrended. |
Details
iSpec is a list containing information about the date range and transformations. Specifically, iSpec must include iSpec$yearBegin, iSpec$yearEnd, iSpec$centerYear corresponding to beginning year of record, ending year of record and centering year. Also, iSpec must include iSpec$transform and iSpec$logConst. (See online help for selectData for more information on these values.)
base.yr.set and test.yr.set represent two time periods used to compare differences. For example, base.yr.set=c(1999,2000) and test.yr.set=c(2013, 2014) would compare GAM predictions from 1999-2000 versus 2013-2014. There is no particular limit to the number of years included in the specification for base.yr.set and test.yr.set. For example, a user could specify c(2001:2002,2004) to use the years 2001, 2002, and 2004, skipping 2003 because 2003 was an abnormal year (particularly wet, particularly dry, hurricanes, etc.).
base.yr.set and test.yr.set must be within the years specified by the range from iSpec$yearBegin to iSpec$yearEnd (inclusive). If not, this function defaults to using the first two years (or last two years) of record. If base.yr.set and test.yr.set are left to their default values of NA, then the first two and last two years will be used.
doy.set represents the days of year for which GAM predictions are made and used to compute base.yr and test.yr means. For example doy.set= c(15, 46, 75, 106, 136, 167, 197, 228, 259, 289, 320, 350) would result in the 15th of each month being used in the analysis; whereas doy.set= c(15, 46, 75) would just use Jan-15, Feb-15, and Mar-15. (Keep in mind that this package uses a 366 day calendar every year such that Mar-1 is always day 61, regardless of leap year.) If doy.set is left to the default value of NA, then c(15, 46, 75, 106, 136, 167, 197, 228, 259, 289, 320, 350) is used.
The baseDay function has been added to this package from the smwrBase package.
Value
Returns a nest list that includes the base and test years, doys, period means in analyzed units, period means in observed units, percent change, difference estimate, difference estimate in observed units, standard error, confidence intervals, t statistic, p value, and alpha level. The alpha level corresponds to the confidence intervals. The first list (gamDiff.regular) uses the computed model to estimate differences and is applicable for GAM formulas that do not involve an intervention term. The second list (gamDiff.adjusted) performs computations by projecting the most recent intervention (e.g., the current lab method) to all time periods.
Examples
# run analysisOrganizeData function to create the list analySpec
dfr <- analysisOrganizeData(dataCensored, reports=NA)
df <- dfr[["df"]]
analySpec <- dfr[["analySpec"]]
# set GAM models to just one model
analySpec$gamModels <- list(
list(option=2
, name= "Non-linear trend with Seasonality (plus Interactions)"
, model= "~ cyear + s(cyear) + s(doy,bs='cc') +
ti(cyear,doy,bs=c('tp','cc'))"
, deriv=FALSE))
# run GAM for a single water quality variable, station and layer
gamResult <- gamTest(df, 'tn', 'CB5.4', 'S', analySpec=analySpec)
# use gamDiff to replicate estimates of change calculated in the above
gamDiff(gamRslt=gamResult[["gamOutput2"]]$gamRslt,
iSpec=gamResult$iSpec, analySpec=analySpec,
base.yr.set = NA, test.yr.set = NA,
doy.set = NA, alpha = 0.05)
# use gamDiff to calculate changes from 2004/05 to 2013/14
gamDiff(gamRslt=gamResult[["gamOutput2"]]$gamRslt,
iSpec=gamResult$iSpec, analySpec=analySpec,
base.yr.set = c(2004:2005), test.yr.set = c(2013:2014),
doy.set = NA, alpha = 0.05)
Plot censored gam fits vs. time
Description
Plot censored gam fits vs. time
Usage
gamPlotDisp(
gamResult = gamResult,
analySpec = analySpec,
fullModel = 2,
seasAvgModel = 2,
seasonalModel = 2,
diffType = "regular",
obserPlot = TRUE,
interventionPlot = TRUE,
seasAvgPlot = TRUE,
seasAvgConfIntPlot = TRUE,
seasAvgSigPlot = TRUE,
fullModelPlot = TRUE,
seasModelPlot = TRUE,
BaseCurrentMeanPlot = TRUE,
adjustedPlot = FALSE
)
Arguments
gamResult |
output from procedure gamTest |
analySpec |
analytical specifications |
fullModel |
GAM # for displaying full GAM (e.g., 0, 1, 2) |
seasAvgModel |
GAM # for displaying seasonally average GAM |
seasonalModel |
GAM # for displaying seasonal GAM |
diffType |
plot predicted baseline mean ('regular') or adjusted baseline mean ('adjusted') |
obserPlot |
logical field indicating whether to plot observations |
interventionPlot |
logical field indicating whether to plot interventions (e.g., method changes) |
seasAvgPlot |
logical field indicating whether to plot seasonal average GAM |
seasAvgConfIntPlot |
logical field indicating whether to plot confidence interval for seasonal average GAM |
seasAvgSigPlot |
logical field indicating whether to plot significant increasing and decreasing trends for seasonal average GAM |
fullModelPlot |
logical field indicating whether to plot full GAM |
seasModelPlot |
logical field indicating whether to plot seasonal GAM |
BaseCurrentMeanPlot |
logical field indicating whether to plot baseline and current mean |
adjustedPlot |
logical field indicating whether to plot adjusted model |
Plot censored gam fits vs. time
Description
Plot censored gam fits vs. time
Usage
gamPlotDispSeason(
gamResult = gamResult,
analySpec = analySpec,
fullModel = 2,
seasAvgModel = 2,
seasonalModel = 2,
diffType = "regular",
obserPlot = TRUE,
interventionPlot = TRUE,
seasAvgPlot = TRUE,
seasAvgConfIntPlot = TRUE,
seasAvgSigPlot = TRUE,
fullModelPlot = TRUE,
seasModelPlot = TRUE,
BaseCurrentMeanPlot = TRUE,
adjustedPlot = FALSE,
gamSeasonFocus = TRUE
)
Arguments
gamResult |
output from procedure gamTest |
analySpec |
analytical specifications |
fullModel |
GAM # for displaying full GAM (e.g., 0, 1, 2) |
seasAvgModel |
GAM # for displaying seasonally average GAM |
seasonalModel |
GAM # for displaying seasonal GAM |
diffType |
plot predicted baseline mean ('regular') or adjusted baseline mean ('adjusted') |
obserPlot |
logical field indicating whether to plot observations |
interventionPlot |
logical field indicating whether to plot interventions (e.g., method changes) |
seasAvgPlot |
logical field indicating whether to plot seasonal average GAM |
seasAvgConfIntPlot |
logical field indicating whether to plot confidence interval for seasonal average GAM |
seasAvgSigPlot |
logical field indicating whether to plot significant increasing and decreasing trends for seasonal average GAM |
fullModelPlot |
logical field indicating whether to plot full GAM |
seasModelPlot |
logical field indicating whether to plot seasonal GAM |
BaseCurrentMeanPlot |
logical field indicating whether to plot baseline and current mean |
adjustedPlot |
logical field indicating whether to plot adjusted model |
gamSeasonFocus |
logical field indicating whether to plot focus on season mean |
See Also
Examples
## Not run:
# Specify parameter and station to analyze
dep <- 'do'
stat <- 'CB5.4'
layer <- 'B'
# Prepare data and set up specifications for analysis
dfr <- analysisOrganizeData (dataCensored)
df <- dfr[[1]]
analySpec <- dfr[[2]]
# Apply gamTest
gamResult <- gamTest(df, dep, stat, layer, analySpec=analySpec)
gamPlotDisp(gamResult = gamResult, analySpec = analySpec,
fullModel = 2, seasAvgModel = 2, seasonalModel = 2,
diffType = "regular", obserPlot = TRUE, interventionPlot = TRUE,
seasAvgPlot = TRUE, seasAvgConfIntPlot = FALSE,
seasAvgSigPlot = FALSE, fullModelPlot = TRUE, seasModelPlot = TRUE,
BaseCurrentMeanPlot = FALSE, adjustedPlot = FALSE)
# Apply gamTestSeason
gamResult2 <- gamTestSeason(df, dep, stat, layer, analySpec=analySpec,
gamSeasonPlot = c("7/15-8/15", "purple", "range"))
gamPlotDispSeason(gamResult = gamResult2, analySpec = analySpec,
fullModel = 2, seasAvgModel = 2, seasonalModel = 2,
diffType = "regular", obserPlot = TRUE, interventionPlot = TRUE,
seasAvgPlot = TRUE, seasAvgConfIntPlot = FALSE,
seasAvgSigPlot = FALSE, fullModelPlot = FALSE, seasModelPlot = FALSE,
BaseCurrentMeanPlot = TRUE, adjustedPlot = FALSE, gamSeasonFocus = TRUE)
## End(Not run)
Perform GAM analysis
Description
Perform GAM analysis. Relies on mgcv::gam to fit a generalized additive model.
The baseDay function has been added to this package from the smwrBase package.
Usage
gamTest(
df,
dep,
stat,
layer = NA,
analySpec,
gamTable = TRUE,
gamPlot = 10,
gamDiffModel = NA,
flow.detrended = NA,
salinity.detrended = NA
)
Arguments
df |
data frame |
dep |
dependent variable |
stat |
station |
layer |
layer |
analySpec |
analytical specifications |
gamTable |
gam table setting (set to FALSE to turn off table output) |
gamPlot |
gam plot setting (set to FALSE to turn off plotting) |
gamDiffModel |
GAM model(s) used for computing differences on sub-annual/multi-period basis |
flow.detrended |
data generated by detrended.flow. Default = NA. |
salinity.detrended |
data generated by detrended.salinity. Default = NA. |
Details
Set gamPlot=FALSE to turn off plotting. Computing the information ("predictions") used to create plots is one of the more time-consuming aspects of the gamTest function. Setting gamPlot=FALSE turns off these computations and speeds up gamTest. The disadvantage is that no predictions are returned; however, the tabularized results stored in stat.gam.result and, if requested, chng.gam.result are still returned.
Setting gamPlot to a value between 1 and 30 changes the resolution of the resulting figure by setting the interval on which the prediction data set is made. By default gamPlot is set to 10; that is, a prediction is made every 10th day, or about 36 predictions per year. Values closer to 1 result in larger returned prediction data sets and take more computation time. Values closer to 30 result in smaller returned data sets and take less computation time. Although there is no change in the fitted model, values closer to 30 may yield slightly degraded figure quality if there is substantial seasonality in the fitted model, since the seasonal minimum and maximum might not be included in the prediction data set and therefore not plotted. Values greater than 30 are treated as 30. Setting gamPlot=30 might be advantageous when the analysis only requires cursory figure examination.
Setting gamTable=FALSE will turn off table output to the console. This may be advantageous to reduce the amount of output. Since these computations do not significantly affect gamTest run time, the standard Analysis of Variance, GAM Parameter Coefficients, Diagnostics, and Estimates of Change tables are returned from gamTest regardless of the gamTable setting. Many of the values from these tables are also returned as part of the tabularized stat.gam.result.
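As a rough sketch of these speed/output trade-offs (reusing the df and analySpec objects prepared as in the Examples below):
# fastest run: skip predictions and console tables; stat.gam.result
# and chng.gam.result are still returned
gamFast <- gamTest(df, 'tn', 'CB5.4', 'S', analySpec=analySpec,
                   gamPlot=FALSE, gamTable=FALSE)
# coarse figure resolution, adequate for cursory examination
gamCoarse <- gamTest(df, 'tn', 'CB5.4', 'S', analySpec=analySpec,
                     gamPlot=30)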
The default setting for gamDiffModel (i.e., gamDiffModel=NA) will not result in sub-annual (i.e., seasonal) differences being computed. In this default setting, the chng.gam.result returned from gamTest will be empty. If gamDiffModel is a value (i.e., not NA), then chng.gam.result will include one row for each combination of years specified in analySpec$gamDiffPeriods, seasons specified in analySpec$gamDiffSeason, and the models listed in gamDiffModel. For example, gamDiffModel=c(0,1,2) would result in sub-annual differences being computed for gam0, gam1, and gam2.
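A minimal sketch of requesting those sub-annual differences (df and analySpec as in the Examples below):
# one row per year-range/season/model combination is added to chng.gam.result
gamResult <- gamTest(df, 'tn', 'CB5.4', 'S', analySpec=analySpec,
                     gamDiffModel=c(0,1,2))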
Flow and Salinity Adjustments (gam4). It is necessary to create and pass properly formatted data via the flow.detrended and salinity.detrended arguments to evaluate gam4 models. See detrended.flow and detrended.salinity for more information on how to create properly formatted data.
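A rough sketch of a gam4 run with detrended flow, assuming detrended.flow accepts the gage ID and year range as its leading arguments (see detrended.flow for the full argument list; the gage ID and years below are illustrative):
# build detrended flow for the matched USGS gage, then pass it to gamTest
flow.detrended <- detrended.flow(usgsGageID="01578310",
                                 yearStart=1984, yearEnd=2016)
gamResult4 <- gamTest(df, 'tn', 'CB5.4', 'S', analySpec=analySpec,
                      flow.detrended=flow.detrended)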
Value
Returns a list with results:
- stat.gam.result – Data frame with summary results in a tabular format suitable for output to a csv file
- chng.gam.result – Data frame with estimates of change for sub-annual differences based on the settings specified in gamDiffModel. Includes all data from stat.gam.result and sub-annual estimates of change in a tabular format suitable for output to a csv file
- data – Data frame of data used in analysis (for example, years with a large proportion of censored data might be removed from the analysis and not included in this data frame)
- data.all – Data frame of all potential data available for analysis
- iSpec – List of baytrends analysis settings used for a specific analysis of station and parameter
- gamOutput0 – Results including model predictions for gam formula 0 (i.e., gam0)
- gamOutput1 – Results including model predictions for gam formula 1 (i.e., gam1)
- gamOutput2 – Results including model predictions for gam formula 2 (i.e., gam2)
- gamOutput3 – Results including model predictions for gam formula 3 (i.e., gam3)
- gamOutput4 – Results including model predictions for gam formula 4 (i.e., gam4)
- gamOutput5 – Results including model predictions for gam formula 5 (i.e., gam5)
- gamOutput6 – Results including model predictions for gam formula 6 (i.e., gam6)
For each evaluated model, gamOutput* (see the element list above) is a list with the following elements:
- gamOption – gam formula ID, i.e., 0, 1, 2, 3, 4, 5, 6 corresponding to gam0, gam1, gam2, etc.
- gamRslt – mgcv::gam output
- gamRsltSum – summary of the mgcv::gam output, i.e., summary(gamRslt)
- gamANOVAtbl – GAM Analysis of Variance table
- gamCoefftbl – GAM Parameter Coefficients table
- gamDiagnostics – GAM Diagnostics table (AIC, RMSE, and Adj. R-squared)
- perChange – Estimates of Change table
- porDiff.regular – Estimate of change in list format
- porDiff.adjusted – Estimate of change adjusted for interventions in list format
- predictions – Data frame of gam predictions (all information used to create baytrends graphics excluding measurements)
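For instance, a short sketch of pulling pieces from the returned list:
# tabular summary suitable for writing to a csv file
write.csv(gamResult$stat.gam.result, 'gamResult_stat.csv')
# mgcv::gam fit and prediction data frame for gam2
gamFit2 <- gamResult$gamOutput2$gamRslt
pred2   <- gamResult$gamOutput2$predictions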
Examples
## Not run:
# Specify parameter and station to analyze
dep <- 'do'
stat <- 'CB5.4'
layer <- 'B'
# Prepare data and set up specifications for analysis
dfr <- analysisOrganizeData (dataCensored)
df <- dfr[[1]]
analySpec <- dfr[[2]]
# Apply gamTest
gamResult <- gamTest(df, dep, stat, layer, analySpec=analySpec)
gamPlotDisp(gamResult = gamResult, analySpec = analySpec,
fullModel = 2, seasAvgModel = 2, seasonalModel = 2,
diffType = "regular", obserPlot = TRUE, interventionPlot = TRUE,
seasAvgPlot = TRUE, seasAvgConfIntPlot = FALSE,
seasAvgSigPlot = FALSE, fullModelPlot = TRUE, seasModelPlot = TRUE,
BaseCurrentMeanPlot = FALSE, adjustedPlot = FALSE)
# Apply gamTestSeason
gamResult2 <- gamTestSeason(df, dep, stat, layer, analySpec=analySpec,
gamSeasonPlot = c("7/15-8/15", "purple", "range"))
gamPlotDispSeason(gamResult = gamResult2, analySpec = analySpec,
fullModel = 2, seasAvgModel = 2, seasonalModel = 2,
diffType = "regular", obserPlot = TRUE, interventionPlot = TRUE,
seasAvgPlot = TRUE, seasAvgConfIntPlot = FALSE,
seasAvgSigPlot = FALSE, fullModelPlot = FALSE, seasModelPlot = FALSE,
BaseCurrentMeanPlot = TRUE, adjustedPlot = FALSE, gamSeasonFocus = TRUE)
## End(Not run)
Perform GAM analysis for Specified Season
Description
Perform GAM analysis for a specified season. Relies on mgcv::gam to fit a generalized additive model.
Usage
gamTestSeason(
df,
dep,
stat,
layer = NA,
analySpec,
gamTable = TRUE,
gamPlot = 10,
gamDiffModel = c(2),
flow.detrended = NA,
salinity.detrended = NA,
gamSeasonPlot = c("7/1-9/30", "purple", "range")
)
Arguments
df |
data frame |
dep |
dependent variable |
stat |
station |
layer |
layer |
analySpec |
analytical specifications |
gamTable |
gam table setting (set to FALSE to turn off table output) |
gamPlot |
gam plot setting (set to FALSE to turn off plotting) |
gamDiffModel |
GAM model(s) used for computing differences on sub-annual/multi-period basis |
flow.detrended |
data generated by detrended.flow. Default = NA. |
salinity.detrended |
data generated by detrended.salinity. Default = NA. |
gamSeasonPlot |
Character vector for evaluating and displaying seasonal model (see details for further information). |
Details
gamSeasonPlot is an additional argument (relative to the arguments used in the gamTest function) that targets a specific season as the focus of output; it can be a 2- or 3-element vector. While the GAM is fit to all data as in the function gamTest, the output figure will only show the model corresponding to the gamSeasonPlot specifications. The first element of gamSeasonPlot can be a single date (e.g., '8/28') or a date range (e.g., '7/1-9/30'). The second element specifies the color of the line to be plotted. If a date range is specified as the first element and an optional third element is provided as 'range', then the plot will show the season minimum and maximum as well as the mean; otherwise, only the mean is plotted. If the first element of gamSeasonPlot is a single date, then observations within a +/- 15-day window of that date are plotted; otherwise, only observations within the date range are plotted. Estimates of difference are computed with baytrends::gamDiff by setting the argument doy.set to either the single date provided in gamSeasonPlot or the same days of year used to compute the season mean. The option to specify seasons that "wrap" around the end of the year, e.g., 12/1-1/30, has not been implemented.
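For example, a sketch of the single-date form (df, dep, stat, layer, and analySpec as prepared in the Examples below):
# focus output on a single date; observations within +/- 15 days are plotted
gamResult3 <- gamTestSeason(df, dep, stat, layer, analySpec=analySpec,
                            gamSeasonPlot = c("8/28", "purple"))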
Value
Returns a list with results
See Also
Examples
## Not run:
# Specify parameter and station to analyze
dep <- 'do'
stat <- 'CB5.4'
layer <- 'B'
# Prepare data and set up specifications for analysis
dfr <- analysisOrganizeData (dataCensored)
df <- dfr[[1]]
analySpec <- dfr[[2]]
# Apply gamTest
gamResult <- gamTest(df, dep, stat, layer, analySpec=analySpec)
gamPlotDisp(gamResult = gamResult, analySpec = analySpec,
fullModel = 2, seasAvgModel = 2, seasonalModel = 2,
diffType = "regular", obserPlot = TRUE, interventionPlot = TRUE,
seasAvgPlot = TRUE, seasAvgConfIntPlot = FALSE,
seasAvgSigPlot = FALSE, fullModelPlot = TRUE, seasModelPlot = TRUE,
BaseCurrentMeanPlot = FALSE, adjustedPlot = FALSE)
# Apply gamTestSeason
gamResult2 <- gamTestSeason(df, dep, stat, layer, analySpec=analySpec,
gamSeasonPlot = c("7/15-8/15", "purple", "range"))
gamPlotDispSeason(gamResult = gamResult2, analySpec = analySpec,
fullModel = 2, seasAvgModel = 2, seasonalModel = 2,
diffType = "regular", obserPlot = TRUE, interventionPlot = TRUE,
seasAvgPlot = TRUE, seasAvgConfIntPlot = FALSE,
seasAvgSigPlot = FALSE, fullModelPlot = FALSE, seasModelPlot = FALSE,
BaseCurrentMeanPlot = TRUE, adjustedPlot = FALSE, gamSeasonFocus = TRUE)
## End(Not run)
Retrieve USGS daily flow data in a wide format
Description
Retrieve USGS daily flow data from NWIS based on a list of site IDs (relying on dataRetrieval::readNWISdv and dataRetrieval::renameNWISColumns). The flow and data qualifier code for each site are organized into two columns (e.g., "q10174500", "q10174500cd" for USGS gage 10174500). Flow is stored as cubic meters per second [cms].
Usage
getUSGSflow(
siteNumber,
yearStart,
yearEnd,
fill = TRUE,
span = 10,
max.fill = 10
)
Arguments
siteNumber |
List of site numbers |
yearStart |
Beginning year of data retrieval |
yearEnd |
Ending year of data retrieval |
fill |
TRUE [default]/FALSE field indicating whether to fill in missing values using the USGS fillMissing function |
span |
the maximum number of observations on each side of each range of missing values to use in constructing the time series model [default=10] |
max.fill |
the maximum gap to fill [default=10] |
Details
This function automatically 'fills in' missing values (unless the user turns this feature off) using code developed by USGS (see smwrBase::fillMissing). The user can also control the maximum gap size to fill in. If a daily flow value is missing, then the corresponding data qualifier field is set to 'NaN'. The user can use this setting to identify which flows are filled in.
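As a sketch, filled-in values can be located through the qualifier code (assuming the qualifier is stored as the string 'NaN' and columns follow the "q" + gage ID naming described above):
# retrieve with default gap filling, then flag the filled-in days
df1 <- getUSGSflow('01578310', 2014, 2014)
filled <- df1$q01578310cd == 'NaN'
# or skip filling entirely
df0 <- getUSGSflow('01578310', 2014, 2014, fill = FALSE)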
Value
USGS daily flow data for sites in a wide format
Examples
# set retrieval parameters
yearStart <- 2014
yearEnd <- 2014
siteNumber <- c('01578310')
# regular retrieval (default usage)
df <- getUSGSflow(siteNumber, yearStart, yearEnd)
Impute Censored Values
Description
Impute value for multiply censored data.
Usage
impute(x, imputeOption = "mid")
Arguments
x |
vector of type survival::Surv |
imputeOption |
imputation method [default= "mid"], valid impute options are "lower", "upper", "mid", "norm", "lnorm" |
Details
The imputeOption values of "lower", "upper" and "mid" impute the lower limit, upper limit, and midpoint between the lower and upper limit. In the context of typical water quality data, these options would be equivalent to zero, detection limit, and 1/2 detection limit substitution. Options for substituting the normal ["norm"] or lognormal ["lnorm"] expectation can also be used.
Value
vector where x is transformed into a simple numeric variable
See Also
makeSurvDF
, unSurvDF
, unSurv
, imputeDF
, imputeDF
,
Examples
## Not run:
x <- dataCensored[1:20,"tdp"]
x.lower <- impute(x,'lower')
x.mid <- impute(x,'mid')
x.upper <- impute(x,'upper')
x.norm <- impute(x,'norm')
x.lnorm <- impute(x,'lnorm')
## End(Not run)
Impute Censored Values in dataframes
Description
Impute value for multiply censored data in data frames
Usage
imputeDF(df, imputeOption = "mid")
Arguments
df |
dataframe |
imputeOption |
imputation method [default= "mid"], valid impute options are "lower", "upper", "mid", "norm", "lnorm" |
Details
The imputeOption values of "lower", "upper" and "mid" impute the lower limit, upper limit, and midpoint between the lower and upper limit. In the context of typical water quality data, these options would be equivalent to zero, detection limit, and 1/2 detection limit substitution. Options for substituting the normal ["norm"] or lognormal ["lnorm"] expectation can also be used.
Value
dataframe where fields with censored data (i.e., Surv objects) are transformed into a simple numeric fields
See Also
makeSurvDF, unSurvDF, unSurv, impute
Examples
## Not run:
df <- dataCensored[1:20, ]
df.lower <- imputeDF(df,'lower')
df.mid <- imputeDF(df,'mid')
df.upper <- imputeDF(df,'upper')
df.norm <- imputeDF(df,'norm')
df.lnorm <- imputeDF(df,'lnorm')
## End(Not run)
Aggregate data layers
Description
This function aggregates data layers. Steps: 1) Perform first-level error checking to make sure that the data set contains 'layer' and that a valid aggregation option was selected. 2) Perform second-level error checking to make sure the aggregation option selection makes sense (e.g., cannot aggregate "S" and "AP" if no "AP" data are in the data set). 3) Average the data by taking the median or mean based on user input.
Usage
layerAggregation(df, avgTechnique = "mean", layerAggOption = 3)
Arguments
df |
data frame |
avgTechnique |
method for aggregating data ("mean" [default], "median") |
layerAggOption |
0: no aggregation; 1: combine "S" & "AP" ("SAP"); 2: combine "B" & "BP" ("BBP"); 3 [default]: options 1 & 2 ("SAP", "BBP"); 4: combine all ("ALL"); 5: combine "S" and "B" ("SB") |
Value
data frame with aggregated data
Examples
## Not run:
dfr <- analysisOrganizeData(dataCensored)
# retrieve all corrected chlorophyll-a concentrations for Station CB5.4,
# missing values are removed and transformation applied. Note, a
# warning is displayed indicating that data set has layers but user did
# not specify layer in retrieval. layerAggregation then aggregates per
# specifications
dfr2 <- selectData(dfr[["df"]], 'chla', 'CB5.4', analySpec=dfr[["analySpec"]])
df2 <- dfr2[[1]] # data frame of selected data
iSpec2 <- dfr2[[2]] # meta data about selected data
df2a <- layerAggregation(df2, avgTechnique="mean", layerAggOption=4)
## End(Not run)
Layer List
Description
A lookup table of layer abbreviations and the corresponding names.
Usage
layerLukup
Format
A data frame with 10 rows and 3 variables:
- layers
Layer codes
- order
Analysis order
- name
Layer name
Load/Clean CSV and TXT Data File
Description
Load and clean comma delimited (*.csv) or tab delimited (*.txt) file and perform some rudimentary data cleaning.
Usage
loadData(
file = NA,
folder = ".",
pk = NA,
remDup = TRUE,
remNAcol = TRUE,
remNArow = TRUE,
convDates = TRUE,
tzSel = "America/New_York",
commChar = "#",
naChar = NA
)
Arguments
file |
file (can use wildcards, e.g., "*.csv") |
folder |
folder (i.e., directory to look in, can use relative path ) |
pk |
vector of columns that form the primary key for data set |
remDup |
logical field indicating whether duplicate rows are deleted |
remNAcol |
logical field indicating whether columns with all NA are deleted |
remNArow |
logical field indicating whether rows with all NA are deleted |
convDates |
vector or logical field indicating whether date-like columns should be converted to POSIXct format (see details) |
tzSel |
time zone to use for date conversions (default: "America/New_York") |
commChar |
character for comment line to be skipped |
naChar |
characters to treat as NA |
Details
This function reads in a single comma delimited (*.csv) or tab
delimited (*.txt) file using either utils::read.table
or utils::read.csv
based on the file extension. The user can use the wildcard feature for the
file argument (e.g., file='*.csv') and the function will identify the most
recently modified csv or txt file in the folder for importing.
Some specific features of this function include the following:
1. Leading '0's in character strings that would otherwise be trimmed and treated as numeric variables (e.g., USGS flow gages, state and county FIPS codes) are preserved. To effectively use this functionality, data maintained in a spreadsheet would be enclosed in quotes (e.g., "01578310"). When exported to csv or txt files the field would be in triple quotes (e.g., """01578310"""). Any column read in as integer is converted to numeric.
2. Rows and columns with no data (i.e., all NA) are deleted unless default settings for remNAcol and remNArow are changed to FALSE.
3. Completely duplicate rows are deleted unless default setting for remDup is changed to FALSE.
4. Rows beginning with '#' are skipped unless commChar is set to "".
5. If a primary key (either single or multiple columns) is selected, the function enforces the primary key by deleting duplicate entries based on the primary key. Columns corresponding to the primary key (when specified) are moved to the first columns.
6. If convDates is a vector (i.e., c('beginDate', 'endDate')), then a date conversion is attempted for the corresponding columns found in the input file. If TRUE, then a date conversion is attempted for all columns found in the input file with 'date' in the name. If FALSE, no date conversion is attempted.
Some other common time zones include the following: America/New_York, America/Chicago, America/Denver, America/Los_Angeles, America/Anchorage, America/Honolulu, America/Jamaica, America/Managua, America/Phoenix, America/Metlakatla
A brief table reporting the results of the import is printed.
Note that columns containing only F, T, FALSE, TRUE are stored as logical fields.
Value
Returns data frame
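Examples
A hedged sketch of typical usage; the file pattern and primary-key columns are illustrative.
## Not run:
# load the most recently modified csv file in the working directory,
# enforcing station/date as the primary key
df <- loadData(file='*.csv', folder='.', pk=c('station', 'date'))
## End(Not run)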
Load/Clean Excel sheet
Description
Load and clean one sheet from an Excel file
Usage
loadExcel(
file = NA,
sheet = 1,
folder = ".",
pk = NA,
remDup = TRUE,
remNAcol = TRUE,
remNArow = TRUE,
convDates = TRUE,
tzSel = "America/New_York"
)
Arguments
file |
file (can use wildcards, e.g., "*.xlsx") |
sheet |
worksheet name to load from Excel file |
folder |
folder (i.e., directory to look in, can use relative path ) |
pk |
vector of columns that form the primary key for data set |
remDup |
logical field indicating whether duplicate rows are deleted |
remNAcol |
logical field indicating whether columns with all NA are deleted |
remNArow |
logical field indicating whether rows with all NA are deleted |
convDates |
vector or logical field indicating whether date-like columns should be converted to POSIXct format (see details) |
tzSel |
time zone to use for date conversions (default: "America/New_York") |
Details
This function reads in a single sheet from an Excel file using readxl::read_excel to load the data.
After reading data in with readxl::read_excel, some specific additional steps are implemented:
1. Double quotes are removed from the beginning and end of all fields. The purpose is to maintain leading zeroes (e.g., USGS flow gages). To effectively use this functionality, data maintained in a spreadsheet would be enclosed in quotes (e.g., "01578310"). If exported to csv or txt files the field would be in triple quotes (e.g., """01578310"""). Any column read in as integer is converted to numeric.
2. Rows and columns with no data (i.e., all NA) are deleted unless default settings for remNAcol and remNArow are changed to FALSE.
3. Completely duplicate rows are deleted unless default setting for remDup is changed to FALSE.
4. If a primary key (either single or multiple columns) is selected, the function enforces the primary key by deleting duplicate entries based on the primary key. Columns corresponding to the primary key (when specified) are moved to the first columns.
5. If convDates is a vector (i.e., c('beginDate', 'endDate')), then a date conversion to as.POSIXct is attempted for the corresponding columns found in the input file. If TRUE, then a date conversion is attempted for all columns found in the input file with 'date' in the name. If FALSE, no date conversion is attempted.
Some other common time zones include the following: America/New_York, America/Chicago, America/Denver, America/Los_Angeles, America/Anchorage, America/Honolulu, America/Jamaica, America/Managua, America/Phoenix, America/Metlakatla
A brief table reporting the results of the import is printed.
Note that columns containing only F, T, FALSE, TRUE are stored as logical fields.
Value
Returns data frame
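Examples
A hedged sketch of typical usage; the file pattern and sheet selection are illustrative.
## Not run:
# load the first worksheet of the most recently modified xlsx file
df <- loadExcel(file='*.xlsx', sheet=1, folder='.')
## End(Not run)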
Load Built-in GAM formulas
Description
Returns built-in GAM formulas
Usage
loadModels(gamSelect = "gam4")
Arguments
gamSelect |
character vector of models (Current options include gam0, gam1, gam2, gam3, gam4, gam5) |
Details
By default, the function analysisOrganizeData will store the formulas for gam0-gam4 in the variable analySpec$gamModels as a list. The user can customize this list with the function loadModels (see example).
Value
Returns a list with GAM formulas
Examples
# run analysisOrganizeData function to create the list analySpec
dfr <- analysisOrganizeData (dataCensored, report=NA)
df <- dfr[["df"]]
analySpec <- dfr[["analySpec"]]
# current models in analySpec
analySpec$gamModels
# set models in analySpec to gam0, gam1, and gam2 only
analySpec$gamModels <- loadModels(c('gam0','gam1','gam2'))
Load Built-in GAM formulas for calculating residuals
Description
Load Built-in GAM formulas for calculating residuals
Usage
loadModelsResid(gamSelect = "doy_flw_sal")
Arguments
gamSelect |
character vector of models (Current options include 'doy', 'doy_flw_sal', 'doy_flw_sal_int') |
Value
Returns a list with GAM formulas
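Examples
A short sketch mirroring the loadModels example above; the model names come from the options listed under gamSelect.
# retrieve residual-model formulas for day-of-year and flow/salinity terms
residModels <- loadModelsResid(c('doy', 'doy_flw_sal'))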
Convert dataframe to include survival (Surv) objects
Description
Within a dataframe, paired numeric fields that use a "_lo" and "_hi" suffix naming convention (e.g., "conc_lo", "conc_hi") are combined into a single Surv object (e.g., "conc") using the "interval2" option provided through the survival::Surv(conc_lo, conc_hi, type = "interval2") syntax.
Usage
makeSurvDF(df, suf_lo = "_lo", suf_hi = "_hi")
Arguments
df |
name of data frame |
suf_lo |
Column name suffix for "lo" values. Default = "_lo" |
suf_hi |
Column name suffix for "hi" values. Default = "_hi" |
Details
Converting fields to Surv objects works with field pairs that have a "_lo" and "_hi" suffix naming convention. The numeric value for "_hi" must be greater than or equal to the "_lo" value. Source data that violate this requirement are set to NA with a summary report outputted to the console.
The user can specify their own values for the lo/hi suffixes or use the defaults.
Value
dataframe with Surv fields
See Also
unSurv, unSurvDF, impute, imputeDF, saveDF
Examples
df <- dataCensored[1:20,]
colnames(df)
df1 <- unSurvDF(df)
colnames(df1)
# Default values
df2 <- makeSurvDF(df1)
colnames(df2)
# User values
df3 <- unSurvDF(df, "_LOW", "_HIGH")
colnames(df3)
df4 <- makeSurvDF(df3, "_LOW", "_HIGH")
colnames(df4)
Recode Data
Description
Converts missing values (NAs) to or from a user-specified value. From the smwrBase package.
Usage
na2miss(x, to = -99999)
miss2na(x, from = -99999)
Arguments
x |
a vector. Missing values (NA) are allowed. |
to |
the replacement value for NA |
from |
the target value to match and replace with NA |
Value
An object like x with each target value replaced by the specified value.
Note
The function na2miss converts missing values (NA) to the value to and is useful to prepare a vector for export and subsequent use by software external to R that does not handle NAs. The function miss2na converts the value from to NA and can be used to recode data imported from external software that uses a special value to indicate missing values.
Examples
## Construct simple substitutions
na2miss(c(1, 2, 3, NA, 5, 6))
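## Reverse the substitution (a sketch using the default -99999 code)
miss2na(c(1, 2, 3, -99999, 5, 6))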
Compute the Number of Non-Missing Observations
Description
Compute the number of non-missing observations. Provides a 'default' method to handle vectors, and a method for data frames.
Usage
nobs(object, ...)
Arguments
object |
Target Object |
... |
Optional parameters (currently ignored) |
Details
Calculate the number of observations in 'object'.
* For numeric vectors, this is simply the number of non-NA elements, as computed by 'sum(!is.na(object))'.
* For dataframe objects, the result is a vector containing the number of non-NA elements of each column.
The 'nobs' and 'nobs.lm' functions defined in gdata are simply aliases for the functions in the base R 'stats' package, provided for backwards compatibility.
'baytrends' borrowed 'gdata::nobs' 'as is' to avoid being archived in 2020. https://github.com/tetratech/baytrends/issues/56
Note
The base R package 'stats' now provides an S3 dispatch function for nobs, and methods for objects of classes "lm", "glm", "nls" and "logLik", as well as a default method.
Since they provide a subset of the functionality, the base method dispatch (nobs) function and method for "lm" objects ('nobs.lm') are, as of gdata version 2.10.1, simply aliases for the equivalent functions in the base R 'stats' package.
'gdata''s default method ('nobs.default') processes vectors and hands any other data/object types to 'stats:::nobs.default'.
Author(s)
Gregory R. Warnes greg@warnes.net
Examples
x <- c(1,2,3,5,NA,6,7,1,NA )
length(x)
nobs(x)
df <- data.frame(x=rnorm(100), y=rnorm(100))
df[1,1] <- NA
df[1,2] <- NA
df[2,1] <- NA
nobs(df)
fit <- lm(y ~ x, data=df)
nobs(fit)
Parameter List
Description
A lookup table of water quality parameters
Usage
parameterList
Format
A data frame with 82 rows and 13 variables:
- parm
"preferred" parameter ID
- parmSource
parameter field as may be found in sample data sets
- parmCat
Parameter Group Header
- parmName
Full parameter name
- parmCalc
indicator if parameter is calculated
- parmNamelc
parameter name in lower case
- parmUnits
units
- parmRecensor
numerical value used in log plots if data are reported as <= 0
- parmRO1
Parameter sort order level 1
- parmRO2
Parameter sort order level 2
- parmTrend
TRUE/FALSE for whether parameter should be analyzed for trend
- logTrans
TRUE/FALSE for whether parameter should be analyzed in log-transformed form
- trendIncrease
Should an increase in concentration be interpreted as 'improving', 'degrading', 'increasing', or 'decreasing'
Salinity data
Description
Salinity data, 1984 to 2016, for 8 stations.
Usage
sal
Format
A data frame with 51,092 rows and 4 variables:
- station
CBP Station ID
- date
date, YYYY-MM-DD
- layer
sample layer
- salinity
Measured salinity
Save R object to disk
Description
Saves R object to disk using csv and/or Rdata format.
Usage
saveDF(
rObj,
note = NULL,
rData = FALSE,
csv = TRUE,
attr = FALSE,
timeStamp = TRUE,
folder = "_save_df"
)
Arguments
rObj |
Name of R object to save. |
note |
Suffix to include in file name. |
rData |
Logical field to save rObj as an rData file (FALSE [default]). |
csv |
Logical field to save rObj as an a csv file (TRUE [default]). |
attr |
Logical field to save data frame attributes as a text file (FALSE [default]). |
timeStamp |
Logical field to include date/time stamp in file name (TRUE [default]). |
folder |
Subdirectory for saving file ('_save_df' is the default) |
Details
Output files are saved with an "rObj_note_YYYY_MM_DD_HHMMSS" naming convention. By default, files are saved as csv files to a '_save_df' subdirectory relative to the working directory and include a time stamp in the file name using utils::write.csv. The default folder can be changed with the folder argument. Inclusion of a time stamp in the file name enables saving the same object at multiple steps through an R script, but can be turned off with the timeStamp argument. To also save the object as an rData file, set rData=TRUE.
Value
Nothing returned. Saves R object to disk using csv and/or Rdata format.
Examples
## Not run:
df <- data.frame(x=c(1:100))
saveDF(df, 'test_note')
## End(Not run)
Create Daily Seasonally-adjusted Log Flow Residuals
Description
Create Daily Seasonally-adjusted Log Flow Residuals. The procedure to compute daily seasonally-adjusted log flow residuals is the following: 1) Check to make sure that the raw flow data are in the data set and that seasonally adjusted values have not already been computed; if they have, no additional computations are performed, otherwise proceed with the remaining steps. 2) Add date features if not already in the data set. 3) Compute and store log (ln) flow as 'LogQ...'. 4) Compute a GAM model and store the seasonally adjusted LogQ flow residuals as 'sa0LogQ...'. 5) Smooth the seasonally adjusted log flow residuals by averaging values based on dvAvgWin, dvAvgWgt, and dvAvgSides, and store as 'saxLogQ...'.
Usage
seasAdjflow(
dvFlow = dvFlow,
siteNumber = NULL,
dvAvgWin = c(7, 31),
dvAvgWgt = "weighted",
dvAvgSides = 1,
plotResid = c(1)
)
Arguments
dvFlow |
data frame with daily flow data. The flow and data qualifier code for each site is organized into two columns (e.g., "q01594440", "q01594440cd" for USGS gage 01594440) |
siteNumber |
a single USGS gage ID |
dvAvgWin |
Averaging window (days) for smoothing the residuals of the seasonally adjusted daily flow values. |
dvAvgWgt |
Averaging method ("uniform", "weighted" [default], or "centered") for creating weights. If using "weighted" then use dvAvgSides=1. If using "centered" then use dvAvgSides=2. |
dvAvgSides |
If dvAvgSides=1 only past values are used, if dvAvgSides=2 then values are centered around lag 0. |
plotResid |
plot residuals for selected averaging windows. |
Value
return data frame of flow data with additional seasonally adjusted values
Examples
#Set Retrieval Parameters
yearStart <- 1983
yearEnd <- 2015
siteNumbers <- c("01578310")
# Regular Retrieval (default usage)
df <- getUSGSflow(siteNumbers, yearStart, yearEnd, fill=TRUE)
# Apply default smoothing
df <- seasAdjflow(df,"01578310")
Select data for analysis from a larger data frame
Description
Select data for analysis from a larger data frame based on dependent variable, station, and layer. Removing records with missing values, performing log-transformations, and adding a centering date are performed based on settings.
Usage
selectData(
df,
dep,
stat,
layer = NA,
transform = TRUE,
remMiss = TRUE,
analySpec
)
Arguments
df |
data frame |
dep |
dependent variable |
stat |
station |
layer |
layer (optional) |
transform |
logical field to return log-transformed value (TRUE [default]) |
remMiss |
logical field to remove records where dependent variable, dep, is a missing value (TRUE [default]) |
analySpec |
analytical specifications |
Details
The returned data frame will include dyear and cyear. dyear is the decimal year computed using smwrBase::baseDay2decimal and smwrBase::baseDay. From this, the minimum and maximum 'dyear' are averaged. This averaged value, centerYear, is used to compute the centering date, cyear, using cyear = dyear - centerYear.
The variable identified by dep is copied to the variable name dep+".orig" (e.g., chla.orig) allowing the user to track the original concentrations. A new column, recensor, is added. The value of recensor is FALSE unless the value of dep.orig was <=0. In the cases where dep.orig is <= 0, recensor is set to TRUE and the value of dep is set to "less-than" a small positive value which is stored as iSpec$recensor. If transform=TRUE, the returned data frame will also include a variable "ln"+dep (i.e., "lnchla" for log transformed chla).
The data frame will include a column, intervention, which is a factor identifying different periods of record such as when different laboratory methods were used and is based on the data frame methodsList that is loaded into the global environment. This column is set to "A" with only 1 level if the data frame methodsList has not been loaded into the global environment.
The data frame will include a column, lowCensor, to indicate whether the data record occurs in a year with a low level of censoring over that particular year. The function gamTest uses this column to identify years of record (i.e., when lowCensor==FALSE) that should not be used in analyses.
If remMiss=TRUE, then the returned data frame will be down selected by removing records where the variable identified in 'dep' is missing; otherwise, no down selection is performed.
iSpec contains a large list of information
dep - name of column where dependent variable is stored, could be "ln"+dep for variables that will be analyzed after natural log transformation
depOrig - name of original dependent variable, could be same as dep if no transformation is used
stat - name of station
stationMethodGroup - name of station group that the station belongs to, derived from station list (stationMasterList) and used to identify interventions specified in methodsList table
intervenNum - number of interventions found for this station and dependent variable as derived from methodsList table, a value of 1 is assigned if no methodsList entry is found
intervenList - data frame of interventions identified by beginning and ending date and labeled consecutively starting with "A"
layer - layer
layerName - layer name derived from layerLukup
transform - TRUE/FALSE indicating whether log transformations were taken
trendIncrease - an indicator for interpretation of an increasing concentration
logConst - not currently used
recensor - small value that observations <=0 are recensored to as "less than" the small value
censorFrac - data frame indicating the yearly number of observations and fraction of observations reported as less than, uncensored, interval censored, less than zero, and recensored; also includes a 'lowCensor' field indicating which years will be dropped by gamTest due to high yearly censoring
yearRangeDropped - year range of data that will be dropped due to censoring
censorFracSum - censoring overall summary
centerYear - centering year
parmName - parameter name
parmNamelc - parameter name in lower case
parmUnits - parameter units
statLayer - station/layer label, e.g., "LE3.1 (S)"
usgsGageID - USGS gage used for flow adjustments
usgsGageName - USGS gage used for flow adjustments
numObservations - number of observations
dyearBegin - begin date in decimal form
dyearEnd - end date in decimal form
dyearLength - period of record length
yearBegin - period of record begin year
yearend - period of record end year
dateBegin - begin date
dateEnd - end date
The baseDay and baseDay2decimal functions have been added to this package from the smwrBase package.
Value
A nested list is returned. The first element of the nested list is the down-selected data frame. The second element, the list iSpec, contains specifications for data extraction. See the examples for usage and the details for further discussion of the data processing and components of each element.
Examples
## Not run:
dfr <- analysisOrganizeData(dataCensored)
# retrieve Secchi depth for Station CB5.4, no transformations are applied
dfr1 <- selectData(dfr[["df"]], 'secchi', 'CB5.4', 'S', transform=FALSE,
remMiss=FALSE, analySpec=dfr[["analySpec"]])
df1 <- dfr1[[1]] # data frame of selected data
iSpec1 <- dfr1[[2]] # meta data about selected data
# retrieve surface corrected chlorophyll-a concentrations for Station CB5.4,
# missing values are removed and transformation applied
dfr2 <- selectData(dfr[["df"]], 'chla', 'CB5.4', 'S', analySpec=dfr[["analySpec"]])
df2 <- dfr2[[1]] # data frame of selected data
iSpec2 <- dfr2[[2]] # meta data about selected data
## End(Not run)
Chesapeake Bay Program long-term tidal monitoring stations
Description
Chesapeake Bay Program long-term tidal monitoring stations
Usage
stationMasterList
Format
A data frame with 145 rows and 19 variables:
- station
Water quality station code
- state
State location
- locationType
As identified in the state trend reports, whether this station is in the mainstem or tributary monitoring
- waterbody
Location as identified in the CBP CIMS database
- latitude
From the CBP CIMS database, and verified in a GIS map
- longitude
From the CBP CIMS database, and verified in a GIS map
- cbSeg92
Match to CB 92 Segmentation scheme (for 303d list)
- usgsGageName
Manual match to a USGS fall-line monitoring station (See usgsGages$siteName)
- usgsGageID
USGS station code (See usgsGages$siteNumber)
- usgsGageMatch
Identifies how the USGS-to-TribStation match was made. Direct: If the station falls within a tributary with a USGS station, or in the mainstem. Indirect: If the station is in a small sub-tributary of one of the major tributaries. The USGS station may not be that representative, but it is better than matching to the Susquehanna. SusDefault: If there was no clear match, the tidal station was matched to the Susquehanna River.
- stationRO1
Station sort order level 1
- stationRO2
Station sort order level 2
- stationGrpName
Station group header
- stationMethodGroup
Foreign key to methodsList table
- hydroTerm
"flow" or "salinity" to indicate whether to model flow or salinity effects (if missing, "flow" assumed)
- flwAvgWin
list of averaging windows if flow is selected in hydroTerm (options are: 1, 5, 10, 15, 20, 30, 40, 50, 60, 90, 120, 150, 180, 210)
- flwParms
list of parameters to model with flow (regardless of hydroTerm value), each parameter separated by a space
- salParms
list of parameters to model with salinity (regardless of hydroTerm value), each parameter separated by a space
- notes
Optional note to track updates
Converts Surv object into a 3-column matrix
Description
Converts Surv object into a 3-column matrix
Usage
unSurv(x, col_lo = "lo", col_hi = "hi")
Arguments
x |
vector (Surv object) |
col_lo |
Output column name for "lo" values. Default = "lo" |
col_hi |
Output column name for "hi" values. Default = "hi" |
Details
The third column of the returned matrix (type) has the following meanings:
1 – no censoring
2 – left censored ("less than" in a survival sense, e.g., [-Inf to 10], <10)
3 – interval censored ("less than" in a water quality sense, e.g., "0 - <3", "1 - 3")
NA – missing value
The user can specify the names of the low and high columns in the output. Defaults are "lo" and "hi".
Value
Returns a 3-column matrix: lo, hi, type
See Also
makeSurvDF, unSurvDF, impute, imputeDF, saveDF
Examples
df1 <- dataCensored[dataCensored$station=="CB3.3C"
& dataCensored$date < as.POSIXct("1985-08-01")
, c("station","date","chla")]
colnames(df1)
# Default values
chla_1 <- unSurv(df1$chla)
colnames(chla_1)
# User values
chla_2 <- unSurv(df1$chla, "LOW", "HIGH")
colnames(chla_2)
Converts Surv objects in a dataframe to "lo" and "hi" values
Description
Converts Surv objects in a dataframe to "lo" (i.e., lower) and "hi" (i.e., upper) values. The user can specify their own values or use the defaults.
Usage
unSurvDF(df, suf_lo = "_lo", suf_hi = "_hi")
Arguments
df |
dataframe with Surv objects |
suf_lo |
Column name suffix for "lo" values. Default = "_lo" |
suf_hi |
Column name suffix for "hi" values. Default = "_hi" |
Value
Returns dataframe with censored data converted to lo/hi format
See Also
makeSurvDF, unSurv, impute, imputeDF, saveDF
Examples
df <- dataCensored[dataCensored$station=="CB3.3C", ][1:20,]
colnames(df)
# Default values
df2 <- unSurvDF(df)
colnames(df2)
# User values
df3 <- unSurvDF(df, "_LOW", "_HIGH")
colnames(df3)
USGS Gages
Description
List of core USGS gages for CBP trend analyses
Usage
usgsGages
Format
A data frame with 9 rows and 2 variables:
- usgsGageID
USGS Gage ID
- siteName
USGS Site Name