Dates, times and timezones can be frustrating, especially when working with environmental time series such as those collected by air and water quality sensors.
Environmental time series data often have a strong diurnal signal and are typically plotted with a time axis displaying local time. However, when data are aggregated into larger collections, it is typical to store data with a universal time axis – UTC.
Problems can arise when parsing and formatting dates and times because R defaults to the system timezone, available with Sys.timezone(). Imagine an agency scientist based in Washington, DC, using their laptop to display recent air quality data from Los Angeles while at a conference in Tasmania. The data center processing the data might be in Boulder, but the data processing machine might be set to use UTC. Potential timezones (available with OlsonNames()) include:
America/New_York
America/Los_Angeles
Australia/Tasmania
America/Denver
UTC
Which timezone should be used to convert a request for data from “2019-08-08” to “2018-08-15” into POSIXct datetimes?
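To make the ambiguity concrete, here is a minimal sketch in base R. It parses the same start date as local midnight in each of the candidate timezones and then expresses every result in UTC. The variable names are illustrative only.

```r
# Candidate timezones from the scenario above
timezones <- c(
  "America/New_York",
  "America/Los_Angeles",
  "Australia/Tasmania",
  "America/Denver",
  "UTC"
)

startdate <- "2019-08-08"

# as.POSIXct() interprets the string as wall-clock time in the given timezone
starts <- lapply(timezones, function(tz) as.POSIXct(startdate, tz = tz))
names(starts) <- timezones

# Express each result in UTC to compare the underlying instants
utcStrings <- sapply(
  starts,
  function(x) format(x, "%Y-%m-%d %H:%M", tz = "UTC")
)

print(utcStrings)
```

The same date string maps to a different instant depending on the timezone assumed, so a request for a range of days is not well defined until a timezone is stated.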
To enforce specification of timezones and to help with the common user interface need to specify a range of dates or times, the MazamaCoreUtils package provides the following functions:
dateRange() – parses and returns POSIXct start and end dates representing full days in the specified timezone
timeRange() – parses and returns POSIXct start and end times in the specified timezone
parseDatetime() – parses and returns a vector of POSIXct values in the specified timezone
The parseDatetime() function is intended as a timezone-requiring replacement for lubridate::parse_date_time().
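A short usage sketch of the three functions follows. The argument names (startdate, enddate, starttime, endtime, timezone) reflect my understanding of the package API and should be checked against the package documentation. The dates and the choice of timezone are illustrative.

```r
library(MazamaCoreUtils)

# Assumption: argument names below match the installed package version

# Full days, interpreted in Los Angeles local time
dayLimits <- dateRange(
  startdate = "2019-08-08",
  enddate = "2019-08-15",
  timezone = "America/Los_Angeles"
)

# Specific start and end times, also in Los Angeles local time
timeLimits <- timeRange(
  starttime = "2019-08-08 06:00",
  endtime = "2019-08-08 18:00",
  timezone = "America/Los_Angeles"
)

# A vector of datetimes; the timezone must be stated explicitly
datetimes <- parseDatetime(
  c("2019-08-08 12:00", "2019-08-09 12:00"),
  timezone = "America/Los_Angeles"
)

print(dayLimits)
print(timeLimits)
print(datetimes)
```

In each call the timezone is stated at the point of use, so the result does not depend on the machine where the code runs.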
Enforcing the specification of timezones throughout a body of code is the most robust way to remove timezone-related errors from your code. To help with this type of code review, the package also includes functions for testing whether specific named arguments are used with certain function calls:
lintFunctionArgs_file() – check a single file
lintFunctionArgs_dir() – check an entire directory
To use these functions you must define a set of function:argument rules to be applied, such as:
timezoneLintRules <- list(
"parse_date_time" = "tz",
"with_tz" = "tzone",
"now" = "tzone",
"strftime" = "tz"
)
This is interpreted as:
The parse_date_time() function must use the tz argument explicitly.
The with_tz() function must use the tzone argument explicitly.
While these functions could be used to test for explicit use in any function:argument pair, our concern here is primarily with specification of timezones. As an example, here is the result of linting the dateRange.R source file in this package:
> lintFunctionArgs_file("R/dateRange.R", timezoneLintRules)
# A tibble: 7 x 6
  file        line_number column_number function_name   named_args includes_required
  <chr>             <int>         <int> <chr>           <list>     <lgl>
1 dateRange.R         125            29 with_tz         <chr [1]>  TRUE
2 dateRange.R         128            27 with_tz         <chr [1]>  TRUE
3 dateRange.R         141            18 parse_date_time <chr [2]>  TRUE
4 dateRange.R         142            18 parse_date_time <chr [2]>  TRUE
5 dateRange.R         159            18 parse_date_time <chr [2]>  TRUE
6 dateRange.R         176            18 parse_date_time <chr [2]>  TRUE
7 dateRange.R         188            18 now             <chr [1]>  TRUE
The result shows that the dateRange.R source code is consistent in always explicitly specifying a timezone.
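For readers curious how a function:argument rule can be checked, here is a minimal sketch that uses base R's parser. It is not the package's implementation. The helper name checkNamedArgs() and the sample code being checked are hypothetical, and the sketch reports only a subset of the columns shown above.

```r
# Hypothetical helper, not part of MazamaCoreUtils: for every call to a
# function named in `rules`, report whether the required argument is named.
checkNamedArgs <- function(code, rules) {
  parsed <- parse(text = code, keep.source = TRUE)
  tokens <- utils::getParseData(parsed)

  # Tokens that are calls to one of the functions we have rules for
  calls <- tokens[
    tokens$token == "SYMBOL_FUNCTION_CALL" & tokens$text %in% names(rules),
  ]

  rows <- lapply(seq_len(nrow(calls)), function(i) {
    # The function name sits in its own expr; that expr's parent is the call
    nameExprId <- calls$parent[i]
    callId <- tokens$parent[tokens$id == nameExprId]
    # Named arguments appear as SYMBOL_SUB tokens directly under the call
    namedArgs <- tokens$text[
      tokens$parent == callId & tokens$token == "SYMBOL_SUB"
    ]
    data.frame(
      line_number = calls$line1[i],
      function_name = calls$text[i],
      includes_required = rules[[calls$text[i]]] %in% namedArgs,
      stringsAsFactors = FALSE
    )
  })

  # Returns NULL when no matching calls are found
  do.call(rbind, rows)
}

# Illustrative input: the second line omits the required tzone argument
code <- c(
  'a <- lubridate::now(tzone = "UTC")',
  'b <- lubridate::now()',
  'c <- strftime(a, "%Y-%m-%d", tz = "UTC")'
)
rules <- list("now" = "tzone", "strftime" = "tz")

result <- checkNamedArgs(code, rules)
print(result)
```

Because the check works on parsed tokens and does not evaluate the code, it can be run over source files without lubridate or any other dependency being loaded.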
Hopefully, this attention to timezones will help our code avoid misunderstandings when it comes to date and time requests.