| Type: | Package |
| Title: | A 'dplyr' Back End for Databases |
| Version: | 2.6.0 |
| Description: | A 'dplyr' back end for databases that allows you to work with remote database tables as if they are in-memory data frames. Basic features work with any database that has a 'DBI' back end; more advanced features require 'SQL' translation to be provided by the package author. |
| License: | MIT + file LICENSE |
| URL: | https://dbplyr.tidyverse.org/, https://github.com/tidyverse/dbplyr |
| BugReports: | https://github.com/tidyverse/dbplyr/issues |
| Depends: | R (≥ 4.1) |
| Imports: | blob (≥ 1.2.0), cli (≥ 3.6.1), DBI (≥ 1.1.3), dplyr (≥ 1.1.2), glue (≥ 1.6.2), lifecycle (≥ 1.0.3), magrittr, methods, pillar (≥ 1.9.0), purrr (≥ 1.0.1), R6 (≥ 2.2.2), rlang (≥ 1.1.1), tibble (≥ 3.2.1), tidyr (≥ 1.3.0), tidyselect (≥ 1.2.1), utils, vctrs (≥ 0.6.3), withr (≥ 2.5.0) |
| Suggests: | adbcdrivermanager, adbcsqlite, adbi, bit64, covr, knitr, Lahman, nycflights13, odbc (≥ 1.4.2), RJDBC, RMariaDB (≥ 1.2.2), rmarkdown, RPostgres (≥ 1.4.5), RPostgreSQL, RSQLite (≥ 2.3.8), testthat (≥ 3.1.10) |
| VignetteBuilder: | knitr |
| Config/Needs/website: | tidyverse/tidytemplate |
| Config/testthat/edition: | 3 |
| Config/testthat/parallel: | TRUE |
| Encoding: | UTF-8 |
| Language: | en-gb |
| Collate: | 'verb-copy-inline.R' 'verb-copy-to.R' 'db-sql.R' 'db.R' 'utils-check.R' 'import-standalone-types-check.R' 'import-standalone-obj-type.R' 'utils.R' 'sql.R' 'escape.R' 'translate-sql-cut.R' 'translate-sql-string.R' 'translate-sql-aggregate.R' 'translate-sql-scalar.R' 'translate-sql-helpers.R' 'translate-sql-window.R' 'translate-sql-conditional.R' 'backend-.R' 'backend-access.R' 'backend-adbc.R' 'backend-db2.R' 'backend-hana.R' 'backend-hive.R' 'backend-impala.R' 'backend-jdbc.R' 'backend-mssql.R' 'backend-mysql.R' 'backend-odbc.R' 'backend-oracle.R' 'backend-postgres.R' 'backend-postgres-old.R' 'backend-redshift.R' 'backend-snowflake.R' 'backend-spark-sql.R' 'backend-sqlite.R' 'backend-teradata.R' 'backward-compatibility.R' 'bind-queries.R' 'data-cache.R' 'data-lahman.R' 'data-nycflights13.R' 'db-io.R' 'dbplyr.R' 'ident.R' 'import-standalone-s3-register.R' 'join-by-compat.R' 'join-cols-compat.R' 'lazy-ops.R' 'memdb.R' 'optimise-utils.R' 'progress.R' 'query-base.R' 'query-join.R' 'query-rf-join.R' 'query-select.R' 'query-semi-join.R' 'query-set-op.R' 'query-union.R' 'query.R' 'remote.R' 'rows.R' 'schema.R' 'sql-build.R' 'sql-clause.R' 'sql-dialect.R' 'sql-glue.R' 'sql-quote.R' 'sql-superseded.R' 'src-sql.R' 'src_dbi.R' 'table-name.R' 'tbl-lazy.R' 'tbl-sql.R' 'tidyeval-across.R' 'tidyeval.R' 'translate-sql.R' 'utils-format.R' 'verb-arrange.R' 'verb-collapse.R' 'verb-collect.R' 'verb-compute.R' 'verb-count.R' 'verb-distinct.R' 'verb-do-query.R' 'verb-do.R' 'verb-expand.R' 'verb-explain.R' 'verb-fill.R' 'verb-filter.R' 'verb-group_by.R' 'verb-head.R' 'verb-joins.R' 'verb-mutate.R' 'verb-pivot-longer.R' 'verb-pivot-wider.R' 'verb-pull.R' 'verb-select.R' 'verb-set-ops.R' 'verb-slice.R' 'verb-summarise.R' 'verb-uncount.R' 'verb-window.R' 'with-dialect.R' 'zzz.R' |
| Config/roxygen2/version: | 8.0.0 |
| NeedsCompilation: | no |
| Packaged: | 2026-06-17 12:33:03 UTC; hadleywickham |
| Author: | Hadley Wickham [aut, cre], Maximilian Girlich [aut], Edgar Ruiz [aut], Posit Software, PBC [cph, fnd] |
| Maintainer: | Hadley Wickham <hadley@posit.co> |
| Repository: | CRAN |
| Date/Publication: | 2026-06-17 21:50:07 UTC |
dbplyr: A 'dplyr' Back End for Databases
Description
A 'dplyr' back end for databases that allows you to work with remote database tables as if they are in-memory data frames. Basic features work with any database that has a 'DBI' back end; more advanced features require 'SQL' translation to be provided by the package author.
Author(s)
Maintainer: Hadley Wickham hadley@posit.co
Authors:
Hadley Wickham hadley@posit.co
Maximilian Girlich
Edgar Ruiz
Other contributors:
Posit Software, PBC [copyright holder, funder]
See Also
Useful links:
Report bugs at https://github.com/tidyverse/dbplyr/issues
Flag SQL function usage
Description
Use .sql$foo(x, y) to make it clear that you're calling the SQL
foo() function, not the R foo() function. This also makes it easier to
reduce R CMD check notes in packages; just import .sql from dbplyr with
e.g. @importFrom dbplyr .sql.
Note that .sql itself does nothing and is just NULL; it is automatically
removed when dbplyr translates your R code to SQL.
Usage
.sql
Examples
library(dplyr, warn.conflicts = FALSE)
db <- lazy_frame(x = 1, y = 2)
db |> mutate(z = .sql$CUMULATIVE_SUM(x, 1))
Arrange rows by column values
Description
This is an method for the dplyr arrange() generic. It generates
the ORDER BY clause of the SQL query. It also affects the
window_order() of windowed expressions in mutate.tbl_lazy().
Note that ORDER BY clauses can not generally appear in subqueries, which
means that you should arrange() as late as possible in your pipelines.
Usage
## S3 method for class 'tbl_lazy'
arrange(.data, ..., .by_group = FALSE)
Arguments
.data |
A lazy data frame backed by a database query. |
... |
< |
.by_group |
If |
Value
Another tbl_lazy. Use show_query() to see the generated
query, and use collect() to execute the query
and return data to R.
Missing values
Unlike R, most databases sorts NA (NULLs) at the front. You can
can override this behaviour by explicitly sorting on is.na(x).
Examples
library(dplyr, warn.conflicts = FALSE)
db <- memdb_frame(a = c(3, 4, 1, 2), b = c(5, 1, 2, NA))
db |> arrange(a) |> show_query()
# Note that NAs are sorted first
db |> arrange(b)
# override by sorting on is.na() first
db |> arrange(is.na(b), b)
Convert to sql (deprecated)
Description
Usage
as.sql(x, con)
Arguments
x |
Object to coerce |
con |
Needed when |
MS Access backend
Description
This backend supports Microsoft Access databases, typically accessed via
odbc. Use dialect_access() with lazy_frame() to see simulated SQL without
connecting to a live database.
Key differences for this backend are:
-
SELECTusesTOP, notLIMIT Non-standard types and mathematical functions
String concatenation uses
&No
ANALYZEequivalent-
TRUEandFALSEconverted to 1 and 0
See vignette("translation-function") and vignette("translation-verb") for
details of overall translation technology.
Usage
dialect_access()
simulate_access()
Examples
library(dplyr, warn.conflicts = FALSE)
lf <- lazy_frame(x = 1, y = 2, z = "a", con = dialect_access())
lf |> head()
lf |> mutate(y = as.numeric(y), z = sqrt(x^2 + 10))
lf |> mutate(a = paste0(z, " times"))
ADBC backend
Description
This backend supports databases accessed via AdbiConnection created by
adbi::adbi() and DBI::dbConnect(). dbplyr automatically detects the
underlying database type by querying the ADBC driver's vendor name and
uses the appropriate SQL dialect.
The following vendors are recognized:
PostgreSQL and PostgreSQL-compatible systems (CrateDB, CockroachDB, Citus, Neon, ParadeDB, TimescaleDB, Yellowbrick, YugabyteDB, CedarDB):
dialect_postgres()MySQL and MySQL-compatible systems (MariaDB, TiDB, Vitess): the appropriate dialect (
dialect_mariadb()for MariaDB,dialect_mysql()otherwise)SQLite:
dialect_sqlite()Microsoft SQL Server:
dialect_mssql()Snowflake:
dialect_snowflake()Amazon Redshift:
dialect_redshift()
If your database is not recognized, dbplyr will fall back to a generic ODBC
dialect. In this case, or if dbplyr guesses wrong, you can use
with_dialect() to choose a specific dialect.
See vignette("translation-function") and vignette("translation-verb") for
details of overall translation technology.
Examples
# ADBC connections require the adbi package and an ADBC driver.
# Once connected, dbplyr automatically detects the database type:
#
# library(adbi)
# library(dplyr, warn.conflicts = FALSE)
#
# con <- DBI::dbConnect(adbi::adbi("adbcsqlite"), uri = ":memory:")
# tbl(con, "my_table") |> filter(x > 1)
ANSI SQL backend
Description
This is the base dialect for ANSI compliant SQL, forming the foundation
of all other dialects. Use dialect_ansi() with lazy_frame() to see
simulated SQL without connecting to a live database.
See vignette("translation-function") for a list of functions that are
translated.
Usage
dialect_ansi()
simulate_dbi(class = character(), ...)
Arguments
class, ... |
No longer used. |
Examples
library(dplyr, warn.conflicts = FALSE)
lf <- lazy_frame(a = TRUE, b = 1, c = 2, d = "z", con = dialect_ansi())
lf |> transmute(x = mean(b, na.rm = TRUE))
lf |> transmute(x = log(b), y = log(b, base = 2))
DB2 backend
Description
This backend supports IBM DB2 databases, typically accessed via ODBC.
Use dialect_db2() with lazy_frame() to see simulated SQL without
connecting to a live database.
Key differences for this backend are:
Uses
FETCH FIRST n ROWS ONLYinstead ofLIMIT nUses double quotes for identifier quoting
-
paste()uses|| DB2-specific data type names for casts (e.g.
VARCHAR(255),DOUBLE)Date component extraction via
YEAR(),MONTH(), ...,DAYOFYEAR(),DAYOFWEEK(),QUARTER(),WEEK()-
str_flatten()usesLISTAGG Statistical summaries
sd(),var(),cor(),cov()-
runif()translates toRAND() Regular expression functions (DB2 11.1+)
See vignette("translation-function") and vignette("translation-verb") for
details of overall translation technology.
Usage
dialect_db2()
Examples
library(dplyr, warn.conflicts = FALSE)
lf <- lazy_frame(a = TRUE, b = 1, c = 2, d = "z", con = dialect_db2())
lf |> head()
lf |> transmute(x = paste0(d, " times"))
lf |> summarise(x = sd(b, na.rm = TRUE))
SAP HANA backend
Description
This backend supports SAP HANA databases, typically accessed via
HDBConnection created by DBI::dbConnect(). Use dialect_hana() with
lazy_frame() to see simulated SQL without connecting to a live database.
Key differences for this backend are:
Temporary tables get
#prefix and useLOCAL TEMPORARY COLUMN.No table analysis performed in
copy_to().-
paste()uses|| Note that you can't create new boolean columns from logical expressions; you need to wrap with explicit
ifelse:ifelse(x > y, TRUE, FALSE).
See vignette("translation-function") and vignette("translation-verb") for
details of overall translation technology.
Usage
dialect_hana()
simulate_hana()
Examples
library(dplyr, warn.conflicts = FALSE)
lf <- lazy_frame(a = TRUE, b = 1, c = 2, d = "z", con = dialect_hana())
lf |> transmute(x = paste0(d, " times"))
Hive backend
Description
This backend supports Apache Hive, typically accessed via odbc. Use
dialect_hive() with lazy_frame() to see simulated SQL without connecting
to a live database.
Key differences for this backend are a scattering of custom translations provided by users.
See vignette("translation-function") and vignette("translation-verb") for
details of overall translation technology.
Usage
dialect_hive()
simulate_hive()
Examples
library(dplyr, warn.conflicts = FALSE)
lf <- lazy_frame(a = TRUE, b = 1, d = 2, c = "z", con = dialect_hive())
lf |> transmute(x = cot(b))
lf |> transmute(x = bitwShiftL(c, 1L))
lf |> transmute(x = str_replace_all(c, "a", "b"))
lf |> summarise(x = median(d, na.rm = TRUE))
lf |> summarise(x = var(c, na.rm = TRUE))
Impala backend
Description
This backend supports Apache Impala, typically accessed via odbc. Use
dialect_impala() with lazy_frame() to see simulated SQL without
connecting to a live database.
Key differences for this backend are a scattering of custom translations provided by users, mostly focussed on bitwise operations.
See vignette("translation-function") and vignette("translation-verb") for
details of overall translation technology.
Usage
dialect_impala()
simulate_impala()
Examples
library(dplyr, warn.conflicts = FALSE)
lf <- lazy_frame(a = TRUE, b = 1, c = 2, d = "z", con = dialect_impala())
lf |> transmute(X = bitwNot(bitwOr(b, c)))
JDBC backend
Description
This backend supports databases accessed via JDBCConnection created by
RJDBC::JDBC() and DBI::dbConnect(). dbplyr automatically detects the
underlying database type from the JDBC connection class and uses the
appropriate SQL dialect.
The following databases are recognized via their JDBC connection class:
PostgreSQL (
org.postgresql.*):dialect_postgres()MySQL (
com.mysql.*):dialect_mysql()MariaDB (
org.mariadb.*):dialect_mariadb()SQLite (
org.sqlite.*):dialect_sqlite()Oracle (
oracle.*):dialect_oracle()SQL Server (
com.microsoft.sqlserver.*):dialect_mssql()IBM DB2 (
com.ibm.db2.*):dialect_db2()SAP HANA (
com.sap.db.*):dialect_hana()Teradata (
com.teradata.*):dialect_teradata()Apache Hive (
org.apache.hive.*):dialect_hive()Apache Spark (
org.apache.spark.*orcom.simba.spark.*):dialect_spark_sql()Snowflake (
net.snowflake.*):dialect_snowflake()Impala (
com.cloudera.impala.*):dialect_impala()Amazon Redshift (
com.amazon.redshift.*):dialect_redshift()
If your database is not recognized, dbplyr will fall back to a generic ODBC dialect. Please file an issue if you'd like support for additional databases.
See vignette("translation-function") and vignette("translation-verb") for
details of overall translation technology.
See Also
with_dialect() to use a different dialect if dbplyr guesses
incorrectly, or a more specific translation is available.
Examples
# JDBC connections require the RJDBC package and a JDBC driver JAR file.
# Once connected, dbplyr automatically detects the database type:
#
# library(RJDBC)
# library(dplyr, warn.conflicts = FALSE)
#
# drv <- JDBC("org.postgresql.Driver", "postgresql.jar")
# con <- dbConnect(drv, "jdbc:postgresql://localhost/mydb", "user", "password")
# tbl(con, "my_table") |> filter(x > 1)
SQL Server backend
Description
This backend supports Microsoft SQL Server, typically accessed via odbc. Use
dialect_mssql() with lazy_frame() to see simulated SQL without connecting
to a live database.
Key differences for this backend are:
-
SELECTusesTOPnotLIMIT Automatically prefixes
#to create temporary tables. Add the prefix yourself to avoid the message.String basics:
paste(),substr(),nchar()Custom types for
as.*functionsLubridate extraction functions,
year(),month(),day()etcSemi-automated bit <-> boolean translation (see below)
stringr functions
str_detect(),str_starts(),str_ends()withfixed()patterns work on all versions; regular expression patterns require SQL Server 2025+ (version 17.0)stringr functions
str_replace(),str_replace_all(),str_remove(),str_remove_all(),str_extract(), andstr_count()require SQL Server 2025+ (version 17.0)
See vignette("translation-function") and vignette("translation-verb") for
details of overall translation technology.
Usage
dialect_mssql(version = "15.0")
simulate_mssql(version = "15.0")
Arguments
version |
Version of MS SQL to simulate. Currently, 11.0 and above
will use |
Bit vs boolean
SQL server uses two incompatible types to represent TRUE and FALSE
values:
The
BOOLEANtype is the result of logical comparisons (e.g.x > y) and can be usedWHEREbut not to create new columns inSELECT. https://learn.microsoft.com/en-us/sql/t-sql/language-elements/comparison-operators-transact-sqlThe
BITtype is a special type of numeric column used to storeTRUEandFALSEvalues, but can't be used inWHEREclauses. https://learn.microsoft.com/en-us/sql/t-sql/data-types/bit-transact-sql?view=sql-server-ver15
dbplyr does its best to automatically create the correct type when needed, but can't do it 100% correctly because it does not have a full type inference system. This means that you many need to manually do conversions from time to time.
To convert from bit to boolean use
x == 1To convert from boolean to bit use
as.logical(if(x, 0, 1))
Examples
library(dplyr, warn.conflicts = FALSE)
lf <- lazy_frame(a = TRUE, b = 1, c = 2, d = "z", con = dialect_mssql())
lf |> head()
lf |> transmute(x = paste(b, c, d))
# Can use boolean as is:
lf |> filter(c > d)
# Need to convert from boolean to bit:
lf |> transmute(x = c > d)
# Can use boolean as is:
lf |> transmute(x = ifelse(c > d, "c", "d"))
MySQL/MariaDB backend
Description
This backend supports MySQL and MariaDB databases, typically accessed via
MySQLConnection or MariaDBConnection created by DBI::dbConnect(). Use
dialect_mysql() with lazy_frame() to see simulated SQL without connecting
to a live database.
Key differences for this backend are:
-
paste()usesCONCAT_WS() String translations for
str_detect(),str_locate(), andstr_replace_all()Clear error message for unsupported full joins
See vignette("translation-function") and vignette("translation-verb") for
details of overall translation technology.
Usage
dialect_mariadb()
dialect_mysql()
simulate_mysql()
simulate_mariadb()
Examples
library(dplyr, warn.conflicts = FALSE)
lf <- lazy_frame(a = TRUE, b = 1, c = 2, d = "z", con = dialect_mysql())
lf |> transmute(x = paste0(d, " times"))
ODBC backend
Description
This backend supports databases accessed via OdbcConnection created by
DBI::dbConnect(). Use dialect_odbc() with lazy_frame() to see simulated
SQL without connecting to a live database.
Key differences for this backend are minor translations for common data types.
See vignette("translation-function") and vignette("translation-verb") for
details of overall translation technology.
Usage
dialect_odbc()
simulate_odbc()
See Also
with_dialect() to use a different dialect if dbplyr guesses
incorrectly, or a more specific translation is available.
Examples
library(dplyr, warn.conflicts = FALSE)
lf <- lazy_frame(a = TRUE, b = 1, d = 2, c = "z", con = dialect_odbc())
lf |> transmute(x = as.numeric(b))
lf |> transmute(x = as.integer(b))
lf |> transmute(x = as.character(b))
Oracle backend
Description
This backend supports Oracle databases, typically accessed via
OraConnection created by DBI::dbConnect(). Use dialect_oracle() with
lazy_frame() to see simulated SQL without connecting to a live database.
Key differences for this backend are:
Use
FETCH FIRSTinstead ofLIMITCustom types
-
paste()uses|| Custom subquery generation (no
AS)-
setdiff()usesMINUSinstead ofEXCEPT
Note that versions of Oracle prior to 23c have limited supported for
TRUE and FALSE and you may need to use 1 and 0 instead.
See https://oracle-base.com/articles/23/boolean-data-type-23 for
more details.
See vignette("translation-function") and vignette("translation-verb") for
details of overall translation technology.
Usage
dialect_oracle()
simulate_oracle()
Examples
library(dplyr, warn.conflicts = FALSE)
lf <- lazy_frame(a = TRUE, b = 1, c = 2, d = "z", con = dialect_oracle())
lf |> transmute(x = paste0(c, " times"))
lf |> setdiff(lf)
PostgreSQL backend
Description
This backend supports PostgreSQL databases, typically accessed via a
PqConnection created by DBI::dbConnect(). Use dialect_postgres() with
lazy_frame() to see simulated SQL without connecting to a live database.
Key differences for this backend are:
Many stringr functions
lubridate date-time extraction functions
More standard statistical summaries
See vignette("translation-function") and vignette("translation-verb") for
details of overall translation technology.
Usage
dialect_postgres()
simulate_postgres()
Examples
library(dplyr, warn.conflicts = FALSE)
lf <- lazy_frame(a = TRUE, b = 1, c = 2, d = "z", con = dialect_postgres())
lf |> summarise(x = sd(b, na.rm = TRUE))
lf |> summarise(y = cor(b, c), z = cov(b, c))
Redshift backend
Description
This backend supports Amazon Redshift databases, typically accessed via
a RedshiftConnection created by DBI::dbConnect(). Use dialect_redshift()
with lazy_frame() to see simulated SQL without connecting to a live
database.
Base translations come from PostgreSQL backend. There are generally few differences, apart from string manipulation.
See vignette("translation-function") and vignette("translation-verb") for
details of overall translation technology.
Usage
dialect_redshift()
simulate_redshift()
Examples
library(dplyr, warn.conflicts = FALSE)
lf <- lazy_frame(a = TRUE, b = 1, c = 2, d = "z", con = dialect_redshift())
lf |> transmute(x = paste(c, " times"))
lf |> transmute(x = substr(c, 2, 3))
lf |> transmute(x = str_replace_all(c, "a", "z"))
Snowflake backend
Description
This backend supports Snowflake databases, typically accessed via odbc. Use
dialect_snowflake() with lazy_frame() to see simulated SQL without
connecting to a live database.
See vignette("translation-function") and vignette("translation-verb") for
details of overall translation technology.
Usage
dialect_snowflake()
simulate_snowflake()
Examples
library(dplyr, warn.conflicts = FALSE)
lf <- lazy_frame(a = TRUE, b = 1, c = 2, d = "z", con = dialect_snowflake())
lf |> transmute(x = paste0(d, " times"))
Databricks Spark SQL backend
Description
This backend supports Databricks Spark SQL, typically accessed via the
Databricks ODBC or JDBC connector. Use dialect_spark_sql() with
lazy_frame() to see simulated SQL without connecting to a live database.
Key differences for this backend are better translation of statistical
aggregate functions (e.g. var(), median()) and use of temporary views
instead of temporary tables when copying data.
See vignette("translation-function") and vignette("translation-verb") for
details of overall translation technology.
Usage
dialect_spark_sql()
simulate_spark_sql()
Examples
library(dplyr, warn.conflicts = FALSE)
lf <- lazy_frame(a = TRUE, b = 1, d = 2, c = "z", con = dialect_spark_sql())
lf |> summarise(x = median(d, na.rm = TRUE))
lf |> summarise(x = var(c, na.rm = TRUE), .by = d)
lf |> mutate(x = first(c))
lf |> mutate(x = first(c), .by = d)
SQLite backend
Description
This backend supports SQLite databases, typically accessed via
a SQLiteConnection created by DBI::dbConnect(). Use dialect_sqlite()
with lazy_frame() to see simulated SQL without connecting to a live
database.
Key differences for this backend are:
Uses non-standard
LOG()functionDate-time extraction functions from lubridate
Custom median translation
Right and full joins are simulated using left joins
See vignette("translation-function") and vignette("translation-verb") for
details of overall translation technology.
Usage
dialect_sqlite()
simulate_sqlite()
Examples
library(dplyr, warn.conflicts = FALSE)
lf <- lazy_frame(a = TRUE, b = 1, c = 2, d = "z", con = dialect_sqlite())
lf |> transmute(x = paste(c, " times"))
lf |> transmute(x = log(b), y = log(b, base = 2))
Teradata backend
Description
This backend supports Teradata databases, typically accessed via odbc. Use
dialect_teradata() with lazy_frame() to see simulated SQL without
connecting to a live database.
Key differences for this backend are:
Uses
TOPinstead ofLIMITSelection of user supplied translations
See vignette("translation-function") and vignette("translation-verb") for
details of overall translation technology.
Usage
dialect_teradata()
simulate_teradata()
Examples
library(dplyr, warn.conflicts = FALSE)
lf <- lazy_frame(a = TRUE, b = 1, c = 2, d = "z", con = dialect_teradata())
lf |> head()
Combine multiple lazy queries
Description
Combine multiple lazy queries into a single query using UNION ALL.
This is a convenient wrapper around purrr::reduce(tables, union_all).
Like dplyr::bind_rows() (and unlike UNION ALL), bind_queries() will
automatically align columns based on their name, and fill in any missing
columns with missing values.
Usage
bind_queries(...)
Arguments
... |
<dynamic-dots> Lazy tables to combine. |
Value
A lazy query.
Examples
lf1 <- lazy_frame(x = 1, y = "a")
lf2 <- lazy_frame(x = 2, y = "b")
bind_queries(lf1, lf2)
lf3 <- lazy_frame(y = "c", x = 3, z = 10)
bind_queries(lf2, lf3)
# If you already have a list, you can use splice operator
queries <- list(lf1, lf2)
bind_queries(!!!queries)
Build a SQL string.
Description
build_sql() is superseded in favor of sql_glue2().
This is a convenience function that should prevent sql injection attacks (which in the context of dplyr are most likely to be accidental not deliberate) by automatically escaping all expressions in the input, while treating bare strings as sql. This is unlikely to prevent any serious attack, but should make it unlikely that you produce invalid sql.
This function should be used only when generating SELECT clauses,
other high level queries, or for other syntax that has no R equivalent.
For individual function translations, prefer sql_expr().
Usage
build_sql(..., .env = parent.frame(), con = sql_current_con())
Arguments
... |
input to convert to SQL. Use |
.env |
the environment in which to evaluate the arguments. Should not be needed in typical use. |
con |
database connection; used to select correct quoting characters. |
Examples
con <- dialect_ansi()
# Old:
build_sql("SELECT * FROM ", ident("table"), con = con)
# New:
sql_glue2(con, "SELECT * FROM {.tbl 'table'}")
# Old:
name <- "Robert"
build_sql("INSERT INTO students (name) VALUES (", name, ")", con = con)
# New:
sql_glue2(con, "INSERT INTO students (name) VALUES ({name})")
Collapse a query into a subquery
Description
collapse() forces computation of a lazy query by wrapping it in a subquery.
This is not generally needed, but can be useful if you need to work around
database/dbplyr limitations.
Usage
## S3 method for class 'tbl_sql'
collapse(x, ...)
Arguments
x |
A lazy data frame backed by a database query. |
... |
Ignored. |
Examples
library(dplyr, warn.conflicts = FALSE)
db <- memdb_frame(a = c(3, 4, 1, 2), b = c(5, 1, 2, NA))
db |> filter(a <= 2) |> show_query()
db |> filter(a <= 2) |> collapse() |> show_query()
Collect results into a local data frame
Description
collect() executes the query and retrieves the results into a local tibble.
This brings all the data from the database into R's memory, which is useful
once you've done as much as possible in the database, and now need to use
R functions.
Usage
## S3 method for class 'tbl_sql'
collect(
x,
...,
n = Inf,
warn_incomplete = TRUE,
sql_options = NULL,
cte = deprecated()
)
Arguments
x |
A lazy data frame backed by a database query. |
... |
Ignored. |
n |
Number of rows to fetch. Defaults to |
warn_incomplete |
Warn if |
sql_options |
SQL rendering options generated by |
cte |
Examples
library(dplyr, warn.conflicts = FALSE)
db <- memdb_frame(a = c(3, 4, 1, 2), b = c(5, 1, 2, NA))
db |> filter(a <= 2) |> collect()
Complete a SQL table with missing combinations of data
Description
Turns implicit missing values into explicit missing values. This is a method
for the tidyr::complete() generic.
Usage
## S3 method for class 'tbl_lazy'
complete(data, ..., fill = list())
Arguments
data |
A lazy data frame backed by a database query. |
... |
Specification of columns to expand. See tidyr::expand for more details. |
fill |
A named list that for each variable supplies a single value to use instead of NA for missing combinations. |
Value
Another tbl_lazy. Use show_query() to see the generated
query, and use collect() to execute the query
and return data to R.
Examples
df <- memdb_frame(
group = c(1:2, 1),
item_id = c(1:2, 2),
item_name = c("a", "b", "b"),
value1 = 1:3,
value2 = 4:6
)
df |> tidyr::complete(group, nesting(item_id, item_name))
# You can also choose to fill in missing values
df |> tidyr::complete(group, nesting(item_id, item_name), fill = list(value1 = 0))
Save results into a new remote table
Description
compute() executes the query and stores the results in a new remote table.
This is useful when you want to cache intermediate results for reuse or to
improve performance by avoiding repeated computation of complex queries.
Usage
## S3 method for class 'tbl_sql'
compute(
x,
name = NULL,
temporary = TRUE,
overwrite = FALSE,
unique_indexes = list(),
indexes = list(),
analyze = TRUE,
...,
sql_options = NULL,
cte = deprecated()
)
Arguments
x |
A lazy data frame backed by a database query. |
name |
Name of new remote table. Use a string to create the table
in the current catalog/schema. Use |
temporary |
if |
overwrite |
If |
unique_indexes |
a list of character vectors. Each element of the list will create a new unique index over the specified column(s). Duplicate rows will result in failure. |
indexes |
a list of character vectors. Each element of the list will create a new index. |
analyze |
if |
... |
Ignored. |
sql_options |
SQL rendering options generated by |
cte |
Examples
library(dplyr, warn.conflicts = FALSE)
db <- memdb_frame(a = c(3, 4, 1, 2), b = c(5, 1, 2, NA))
db |> filter(a <= 2) |> show_query()
db |> filter(a <= 2) |> compute() |> show_query()
Use a local data frame in a dbplyr query
Description
This is an alternative to copy_to() that does not need write access and
is faster for small data.
Usage
copy_inline(con, df, types = NULL)
Arguments
con |
A database connection. |
df |
A local data frame. The data is written directly in the SQL query so it should be small. |
types |
A named character vector of SQL data types to use for the columns.
The data types are backend specific. For example for Postgres this could
be |
Details
It writes the data directly in the SQL query via the VALUES clause.
Value
A tbl_lazy.
See Also
copy_to() to copy the data into a new database table.
Examples
df <- data.frame(x = 1:3, y = c("a", "b", "c"))
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
copy_inline(con, df)
copy_inline(con, df) |> dplyr::show_query()
Copy a local data frame to a remote database
Description
This is an implementation of the dplyr copy_to() generic and it mostly
a wrapper around DBI::dbWriteTable().
It is useful for copying small amounts of data to a database for examples, experiments, and joins. By default, it creates temporary tables which are only visible within the current connection to the database.
Usage
## S3 method for class 'src_sql'
copy_to(
dest,
df,
name = deparse(substitute(df)),
overwrite = FALSE,
types = NULL,
temporary = TRUE,
unique_indexes = NULL,
indexes = NULL,
analyze = TRUE,
...,
in_transaction = TRUE
)
Arguments
dest |
remote data source |
df |
A local data frame, a |
name |
Name of new remote table. Use a string to create the table
in the current catalog/schema. Use |
overwrite |
If |
types |
a character vector giving variable types to use for the columns. See https://www.sqlite.org/datatype3.html for available types. |
temporary |
if |
unique_indexes |
a list of character vectors. Each element of the list will create a new unique index over the specified column(s). Duplicate rows will result in failure. |
indexes |
a list of character vectors. Each element of the list will create a new index. |
analyze |
if |
... |
other parameters passed to methods. |
in_transaction |
Should the table creation be wrapped in a transaction? This typically makes things faster, but you may want to suppress if the database doesn't support transactions, or you're wrapping in a transaction higher up (and your database doesn't support nested transactions.) |
Value
Another tbl_lazy. Use show_query() to see the generated
query, and use collect() to execute the query
and return data to R.
See Also
copy_inline() to use small data in an SQL query without actually
writing to a table.
Examples
library(dplyr, warn.conflicts = FALSE)
df <- data.frame(x = 1:5, y = letters[5:1])
db <- copy_to(memdb(), df)
db
df2 <- data.frame(y = c("a", "d"), fruit = c("apple", "date"))
# copy_to() is called automatically if you set copy = TRUE
# in the join functions
db |> left_join(df2, copy = TRUE)
Count observations by group
Description
These are methods for the dplyr dplyr::count() and tally() generics. They
wrap up group_by.tbl_lazy(), summarise.tbl_lazy() and, optionally,
arrange.tbl_lazy().
Usage
## S3 method for class 'tbl_lazy'
count(x, ..., wt = NULL, sort = FALSE, name = NULL)
## S3 method for class 'tbl_lazy'
add_count(x, ..., wt = NULL, sort = FALSE, name = NULL, .drop = NULL)
## S3 method for class 'tbl_lazy'
tally(x, wt = NULL, sort = FALSE, name = NULL)
Arguments
x |
A data frame, data frame extension (e.g. a tibble), or a lazy data frame (e.g. from dbplyr or dtplyr). |
... |
< |
wt |
<
|
sort |
If |
name |
The name of the new column in the output. If omitted, it will default to |
.drop |
Not supported for lazy tables. |
Examples
library(dplyr, warn.conflicts = FALSE)
db <- memdb_frame(g = c(1, 1, 1, 2, 2), x = c(4, 3, 6, 9, 2))
db |> count(g) |> show_query()
db |> count(g, wt = x) |> show_query()
db |> count(g, wt = x, sort = TRUE) |> show_query()
Database I/O generics
Description
These generics are responsible for getting data into and out of the database. They should be used a last resort - only use them when you can't make a backend work by providing methods for DBI generics, or for dbplyr's SQL generation generics. They tend to be most needed when a backend has special handling of temporary tables.
-
db_copy_to()implementscopy_to.src_sql()by callingdb_write_table()(which callsDBI::dbWriteTable()) to transfer the data, then optionally adds indexes (viasql_table_index()) and analyses (viasql_table_analyze()). -
db_compute()implementscompute.tbl_sql()by callingsql_query_save()to create the table, then optionally adds indexes (viasql_table_index()) and analyses (viasql_table_analyze()). -
db_collect()implementscollect.tbl_sql()usingDBI::dbSendQuery()andDBI::dbFetch(). -
sql_table_temporary()is used for databases that have special naming schemes for temporary tables (e.g. SQL server and SAP HANA require temporary tables to start with#) -
db_table_drop_if_exists()is used to drop a table if it exists. This is used whenoverwrite = TRUEincopy_to()andcompute().
Usage
db_copy_to(
con,
table,
values,
...,
overwrite = FALSE,
types = NULL,
temporary = TRUE,
unique_indexes = NULL,
indexes = NULL,
analyze = TRUE,
in_transaction = TRUE
)
db_compute(
con,
table,
sql,
...,
overwrite = FALSE,
temporary = TRUE,
unique_indexes = list(),
indexes = list(),
analyze = TRUE,
in_transaction = TRUE
)
db_collect(con, sql, n = -1, warn_incomplete = TRUE, ...)
sql_table_temporary(con, table, temporary, ...)
db_table_drop_if_exists(con, table, ...)
See Also
Other generic:
db-sql,
db_connection_describe(),
escape()
Miscellaneous database generics
Description
These are used when creating a new dbplyr backend and should generally not be called directly.
-
db_connection_describe()provides a short string describing the database connection, helping users tell which database a table comes from. It should be a single line, and ideally less than 60 characters wide. -
dbplyr_edition()declares which version of the dbplyr API you want. -
db_col_types()introspects an existing table and returns a named character vector mapping column names to their database-native SQL types. The result is suitable for use as thefield.typesargument ofDBI::dbWriteTable(), allowing dbplyr to preserve column types when copying data withrows_*().Returns
NULLif the backend does not implement introspection.
Usage
db_connection_describe(con, ...)
sql_join_suffix(con, suffix, ...)
db_sql_render(con, sql, ..., cte = FALSE, sql_options = NULL)
db_col_types(con, table, call)
dbplyr_edition(con)
Arguments
con |
A database connection. |
table |
A table identifier, or |
call |
The execution environment of a currently running function, used to report errors. |
See Also
Other generic:
db-sql,
db_copy_to(),
escape()
SQL generation generics
Description
SQL translation:
-
sql_translation(con)generates a SQL translation environment. Deprecated:
sql_expr_matches(con, x, y)previously generated an alternative tox = ywhen a pair ofNULLs should match. This is now handled by theis_not_distinct_from()translation; provide a backend override there instead.Deprecated:
sql_random(con)generates SQL to get a random number which can be used to select random rows inslice_sample(). This is now replaced by adding a translation forrunif(n()).
Tables:
-
sql_table_analyze(con, table)generates SQL that "analyzes" the table, ensuring that the database has up-to-date statistics for use in the query planner. It called fromcopy_to()whenanalyze = TRUE. -
sql_table_index()generates SQL for adding an index to table.
Query manipulation:
-
sql_query_explain(con, sql)generates SQL that "explains" a query, i.e. generates a query plan describing what indexes etc that the database will use. It can return character vector, in which case the first n elements are used to generate the plan and the final element is used to return the query plan. -
sql_query_fields()generates SQL for a 0-row result that is used to capture field names intbl_sql() -
sql_query_save(con, sql)generates SQL for saving a query into a (temporary) table. -
sql_query_wrap(con, from)generates SQL for wrapping a query into a subquery.
Query indentation:
-
sql_indent_subquery(from, con, lvl)helps indenting a subquery.
Query generation:
-
sql_query_select()generates SQL for aSELECTquery -
sql_query_join()generates SQL for joins -
sql_query_semi_join()generates SQL for semi- and anti-joins -
sql_query_set_op()generates SQL forUNION,INTERSECT, andEXCEPTqueries.
Query generation for manipulation:
-
sql_query_insert()andsql_query_append()generate SQL for anINSERT FROMquery. -
sql_query_update_from()generates SQL for anUPDATE FROMquery. -
sql_query_upsert()generates SQL for anUPSERTquery. -
sql_query_delete()generates SQL for anDELETE FROMquery -
sql_returning_cols()generates SQL for aRETURNINGclause
Usage
sql_expr_matches(con, x, y, ...)
sql_translation(con)
sql_random(con)
sql_table_analyze(con, table, ...)
sql_table_index(
con,
table,
columns,
name = NULL,
unique = FALSE,
...,
call = caller_env()
)
sql_query_explain(con, sql, ...)
sql_query_fields(con, sql, ...)
sql_query_save(con, sql, name, temporary = TRUE, ...)
sql_query_wrap(con, from, name = NULL, ..., lvl = 0)
sql_indent_subquery(from, con, lvl = 0)
sql_query_rows(con, sql, ...)
sql_returning_cols(con, cols, table, ...)
sql_query_multi_join(
con,
x,
joins,
table_names,
by_list,
select,
where = NULL,
...,
distinct = FALSE,
lvl = 0
)
sql_query_join(
con,
x,
y,
select,
type = "inner",
by = NULL,
na_matches = FALSE,
...,
lvl = 0
)
sql_query_select(
con,
select,
from,
where = NULL,
group_by = NULL,
having = NULL,
window = NULL,
order_by = NULL,
limit = NULL,
distinct = FALSE,
...,
subquery = FALSE,
lvl = 0
)
sql_query_semi_join(con, x, y, anti, by, where, vars, ..., lvl = 0)
sql_set_op_method(con, op, ...)
sql_query_set_op(con, x, y, method, ..., lvl = 0)
sql_query_union(con, x, unions, ..., lvl = 0)
See Also
Other generic:
db_connection_describe(),
db_copy_to(),
escape()
Subset rows using their positions
Description
These are methods for the dplyr generics slice_min(), slice_max(), and
slice_sample(). They are translated to SQL using dplyr::filter() and
window functions (ROWNUMBER, MIN_RANK, or CUME_DIST depending on
arguments). slice(), slice_head(), and slice_tail() are not supported
since database tables have no intrinsic order.
If data is grouped, the operation will be performed on each group so that
(e.g.) slice_min(db, x, n = 3) will select the three rows with the smallest
value of x in each group.
Usage
## S3 method for class 'tbl_lazy'
slice_min(
.data,
order_by,
...,
n,
prop,
by = NULL,
with_ties = TRUE,
na_rm = TRUE
)
## S3 method for class 'tbl_lazy'
slice_max(
.data,
order_by,
...,
n,
by = NULL,
prop,
with_ties = TRUE,
na_rm = TRUE
)
## S3 method for class 'tbl_lazy'
slice_sample(.data, ..., n, prop, by = NULL, weight_by = NULL, replace = FALSE)
Arguments
.data |
A lazy data frame backed by a database query. |
order_by |
Variable or function of variables to order by. |
... |
Not used. |
n, prop |
Provide either If |
by |
< |
with_ties |
Should ties be kept together? The default, |
na_rm |
Should missing values in |
weight_by, replace |
Not supported for database backends. |
Examples
library(dplyr, warn.conflicts = FALSE)
db <- memdb_frame(x = 1:3, y = c(1, 1, 2))
db |> slice_min(x) |> show_query()
db |> slice_max(x) |> show_query()
db |> slice_sample() |> show_query()
db |> group_by(y) |> slice_min(x) |> show_query()
# By default, ties are includes so you may get more rows
# than you expect
db |> slice_min(y, n = 1)
db |> slice_min(y, n = 1, with_ties = FALSE)
# Non-integer group sizes are rounded down
db |> slice_min(x, prop = 0.5)
"Uncount" a database table
Description
This is a method for the tidyr uncount() generic. It uses a temporary
table, so your database user needs permissions to create one.
Usage
dbplyr_uncount(data, weights, .remove = TRUE, .id = NULL)
Arguments
data |
A lazy data frame backed by a database query. |
weights |
A vector of weights. Evaluated in the context of |
.remove |
If |
.id |
Supply a string to create a new variable which gives a unique identifier for each created row. |
Examples
df <- memdb_frame(x = c("a", "b"), n = c(1, 2))
dbplyr_uncount(df, n)
dbplyr_uncount(df, n, .id = "id")
# You can also use constants
dbplyr_uncount(df, 2)
# Or expressions
dbplyr_uncount(df, 2 / n)
Subset distinct/unique rows
Description
This is a method for the dplyr distinct() generic. It adds the
DISTINCT clause to the SQL query.
Usage
## S3 method for class 'tbl_lazy'
distinct(.data, ..., .keep_all = FALSE)
Arguments
.data |
A lazy data frame backed by a database query. |
... |
< |
.keep_all |
If |
Value
Another tbl_lazy. Use show_query() to see the generated
query, and use collect() to execute the query
and return data to R.
Examples
library(dplyr, warn.conflicts = FALSE)
db <- memdb_frame(x = c(1, 1, 2, 2), y = c(1, 2, 1, 1))
db |> distinct() |> show_query()
db |> distinct(x) |> show_query()
Perform arbitrary computation on remote backend
Description
do() is deprecated. Instead of do() you should use collect() and then
your favourite combination of purrr and dplyr functions.
Usage
## S3 method for class 'tbl_sql'
do(.data, ..., .chunk_size = 10000L)
Arguments
.data |
a tbl |
... |
Expressions to apply to each group. If named, results will be
stored in a new column. If unnamed, must return a data frame. You can
use |
.chunk_size |
The size of each chunk to pull into R. If this number is too big, the process will be slow because R has to allocate and free a lot of memory. If it's too small, it will be slow, because of the overhead of talking to the database. |
Escape/quote a value
Description
escape() turns R values into SQL literals. It implements double dispatch
via two sets of generics: first escape() dispatches on the class of x,
then that method calls sql_escape_ident(), sql_escape_logical(), etc,
which dispatch on con.
These generics translate individual values into SQL. The core
generics are DBI::dbQuoteIdentifier() and DBI::dbQuoteString()
for quoting identifiers and strings, but dbplyr needs additional
tools for inserting logical, date, date-time, and raw values into
queries.
Usage
escape(x, parens = NA, collapse = " ", con = NULL)
sql_escape_ident(con, x)
sql_escape_logical(con, x)
sql_escape_date(con, x)
sql_escape_datetime(con, x)
sql_escape_string(con, x)
sql_escape_raw(con, x)
sql_vector(x, parens = NA, collapse = " ", con = NULL)
Arguments
x |
An object to escape. Existing sql vectors will be left as is,
character vectors are escaped with single quotes, numeric vectors have
trailing |
parens, collapse |
Controls behaviour when multiple values are supplied.
Default behaviour: lists are always wrapped in parens and separated by commas, identifiers are separated by commas and never wrapped, atomic vectors are separated by spaces and wrapped in parens if needed. |
con |
A sql_dialect object or database connection. Connections are supported for backward compatibility. |
Value
A sql vector.
See Also
Other generic:
db-sql,
db_connection_describe(),
db_copy_to()
Examples
con <- dialect_ansi()
# Doubles vs. integers
escape(1:5, con = con)
escape(c(1, 5.4), con = con)
# String vs known sql vs. sql identifier
escape("X", con = con)
escape(sql("X"), con = con)
escape(ident("X"), con = con)
# Escaping is idempotent
escape("X", con = con)
escape(escape("X", con = con), con = con)
# Database specific generics
sql_escape_logical(con, c(TRUE, FALSE, NA))
sql_escape_date(con, Sys.Date())
sql_escape_date(con, Sys.time())
sql_escape_raw(con, charToRaw("hi"))
Escape ANSI characters
Description
Use escape() instead.
Usage
escape_ansi(x, parens = NA, collapse = "")
Arguments
x |
Object to escape. |
parens, collapse |
Controls parens and collapsing. Passed on to
|
Expand SQL tables to include all possible combinations of values
Description
This is a method for the tidyr::expand generics. It doesn't sort the
result explicitly, so the order might be different to what expand()
returns for data frames.
Usage
## S3 method for class 'tbl_lazy'
expand(data, ..., .name_repair = "check_unique")
Arguments
data |
A lazy data frame backed by a database query. |
... |
Specification of columns to expand. See tidyr::expand for more details. |
.name_repair |
Treatment of problematic column names:
This argument is passed on as |
Value
Another tbl_lazy. Use show_query() to see the generated
query, and use collect() to execute the query
and return data to R.
Examples
fruits <- memdb_frame(
type = c("apple", "orange", "apple", "orange", "orange", "orange"),
year = c(2010, 2010, 2012, 2010, 2010, 2012),
size = c("XS", "S", "M", "S", "S", "M"),
weights = rnorm(6)
)
# All possible combinations ---------------------------------------
fruits |> tidyr::expand(type)
fruits |> tidyr::expand(type, size)
# Only combinations that already appear in the data ---------------
fruits |> tidyr::expand(nesting(type, size))
Fill in missing values with previous or next value
Description
Fill in missing values with previous or next value
Usage
## S3 method for class 'tbl_lazy'
fill(.data, ..., .direction = c("down", "up", "updown", "downup"))
Arguments
.data |
A lazy data frame backed by a database query. |
... |
Columns to fill. |
.direction |
Direction in which to fill missing values. Currently
either "down" (the default) or "up". Note that "up" does not work when
|
Examples
library(dplyr, warn.conflicts = FALSE)
squirrels <- tibble::tribble(
~group, ~name, ~role, ~n_squirrels, ~ n_squirrels2,
1, "Sam", "Observer", NA, 1,
1, "Mara", "Scorekeeper", 8, NA,
1, "Jesse", "Observer", NA, NA,
1, "Tom", "Observer", NA, 4,
2, "Mike", "Observer", NA, NA,
2, "Rachael", "Observer", NA, 6,
2, "Sydekea", "Scorekeeper", 14, NA,
2, "Gabriela", "Observer", NA, NA,
3, "Derrick", "Observer", NA, NA,
3, "Kara", "Scorekeeper", 9, 10,
3, "Emily", "Observer", NA, NA,
3, "Danielle", "Observer", NA, NA
)
squirrels$id <- 1:12
squirrels_db <- copy_to(memdb(), squirrels)
squirrels_db |>
window_order(id) |>
tidyr::fill(n_squirrels, n_squirrels2)
Keep or drop rows that match a condition
Description
These are methods for the dplyr dplyr::filter() and dplyr::filter_out()
generics. They generate the WHERE clause of the SQL query.
filter() is translated directly to WHERE, which already matches dplyr's
behaviour of treating NA like FALSE (SQL's three-valued logic drops
NULL rows from WHERE).
filter_out() requires an additional step, where the combined condition
is wrapped in is_distinct_from(., TRUE), which is then translated using
the backend (e.g. to IS DISTINCT FROM on PostgreSQL, IS NOT on SQLite).
This ensures that the SQL translation matches dplyr's semantics.
Usage
## S3 method for class 'tbl_lazy'
filter(.data, ..., .by = NULL, .preserve = FALSE)
## S3 method for class 'tbl_lazy'
filter_out(.data, ..., .by = NULL, .preserve = FALSE)
Arguments
.data |
A lazy data frame backed by a database query. |
... |
< |
.by |
< |
.preserve |
Not supported by this method. |
Value
Another tbl_lazy. Use show_query() to see the generated
query, and use collect() to execute the query
and return data to R.
Examples
library(dplyr, warn.conflicts = FALSE)
db <- memdb_frame(x = c(2, NA, 5, NA, 10), y = 1:5)
db |> filter(x < 5) |> show_query()
db |> filter_out(x < 5) |> show_query()
db |> filter(is.na(x)) |> show_query()
Extract and check the RETURNING rows
Description
get_returned_rows() extracts the RETURNING rows produced by
rows_insert(), rows_append(), rows_update(), rows_upsert(),
or rows_delete() if these are called with the returning argument.
An error is raised if this information is not available.
has_returned_rows() checks if x has stored RETURNING rows produced by
rows_insert(), rows_append(), rows_update(), rows_upsert(),
or rows_delete().
Usage
get_returned_rows(x)
has_returned_rows(x)
Arguments
x |
A lazy tbl. |
Value
For get_returned_rows(), a tibble.
For has_returned_rows(), a scalar logical.
Examples
library(dplyr)
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
DBI::dbExecute(con, "CREATE TABLE Info (
id INTEGER PRIMARY KEY AUTOINCREMENT,
number INTEGER
)")
info <- tbl(con, "Info")
rows1 <- copy_inline(con, data.frame(number = c(1, 5)))
rows_insert(info, rows1, conflict = "ignore", in_place = TRUE)
info
# If the table has an auto incrementing primary key, you can use
# the returning argument + `get_returned_rows()` its value
rows2 <- copy_inline(con, data.frame(number = c(13, 27)))
info <- rows_insert(
info,
rows2,
conflict = "ignore",
in_place = TRUE,
returning = id
)
info
get_returned_rows(info)
Group by one or more variables
Description
This is a method for the dplyr group_by() generic. It is translated to
the GROUP BY clause of the SQL query when used with
summarise() and to the PARTITION BY clause of
window functions when used with mutate().
Usage
## S3 method for class 'tbl_lazy'
group_by(.data, ..., .add = FALSE, .drop = TRUE)
Arguments
.data |
A lazy data frame backed by a database query. |
... |
< |
.add |
When |
.drop |
Not supported by this method. |
Examples
library(dplyr, warn.conflicts = FALSE)
db <- memdb_frame(g = c(1, 1, 1, 2, 2), x = c(4, 3, 6, 9, 2))
db |>
group_by(g) |>
summarise(n()) |>
show_query()
db |>
group_by(g) |>
mutate(x2 = x / sum(x, na.rm = TRUE)) |>
show_query()
Subset the first rows
Description
This is a method for the head() generic. It is usually translated to the
LIMIT clause of the SQL query. Because LIMIT is not an official part of
the SQL specification, some database use other clauses like TOP or
FETCH ROWS.
Note that databases don't really have a sense of row order, so what "first"
means is subject to interpretation. Most databases will respect ordering
performed with arrange(), but it's not guaranteed. tail() is not
supported at all because the situation is even murkier for the "last" rows.
Additionally, LIMIT clauses can not generally appear in subqueries, which
means that you should use head() as late as possible in your pipelines.
Usage
## S3 method for class 'tbl_lazy'
head(x, n = 6L, ...)
Arguments
x |
A lazy data frame backed by a database query. |
n |
Number of rows to return |
... |
Not used. |
Value
Another tbl_lazy. Use show_query() to see the generated
query, and use collect() to execute the query
and return data to R.
Examples
library(dplyr, warn.conflicts = FALSE)
db <- memdb_frame(x = 1:100)
db |> head() |> show_query()
# Pretend we have data in a SQL server database
db2 <- lazy_frame(x = 1:100, con = dialect_mssql())
db2 |> head() |> show_query()
Flag a character vector as SQL identifiers
Description
ident() marks strings as database identifiers (e.g. table or column names)
quoting them using the identifier rules for your database. It is used
primarily in translate_sql() to label variables as identifiers; use
elsewhere should be regarded with suspicion.
ident() is for internal use only; if you need to supply an table name that
is qualified with schema or catalog use I().
Usage
ident(...)
is.ident(x)
Arguments
... |
A character vector, or name-value pairs. |
x |
An object. |
Examples
con <- dialect_ansi()
# SQL92 quotes strings with '
escape("x", con = con)
# And identifiers with "
escape(ident("x"), con = con)
Declare a identifier as being pre-quoted.
Description
No longer needed; please use sql() instead.
Usage
ident_q(...)
Refer to a table in another schema/catalog
Description
in_schema() and in_catalog() can be used to refer to tables outside of
the current catalog/schema. However, we now recommend using I() as it's
typically less typing.
Usage
in_schema(schema, table)
in_catalog(catalog, schema, table)
Arguments
catalog, schema, table |
Names of catalog, schema, and table.
These will be automatically quoted; use |
Examples
# Previously:
in_schema("my_schema", "my_table")
in_catalog("my_catalog", "my_schema", "my_table")
in_schema(sql("my_schema"), sql("my_table"))
# Now
I("my_schema.my_table")
I("my_catalog.my_schema.my_table")
I("my_schema.my_table")
# Example using schemas with SQLite
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
# Add auxiliary schema
tmp <- tempfile()
DBI::dbExecute(con, paste0("ATTACH '", tmp, "' AS aux"))
library(dplyr, warn.conflicts = FALSE)
copy_to(con, iris, "df", temporary = FALSE)
copy_to(con, mtcars, I("aux.df"), temporary = FALSE)
con |> tbl("df")
con |> tbl(I("aux.df"))
SQL set operations
Description
These are methods for the dplyr generics dplyr::intersect(),
dplyr::union(), and dplyr::setdiff(). They are translated to
INTERSECT, UNION, and EXCEPT respectively.
Usage
## S3 method for class 'tbl_lazy'
intersect(x, y, copy = "none", ..., all = FALSE)
## S3 method for class 'tbl_lazy'
union(x, y, copy = "none", ..., all = FALSE)
## S3 method for class 'tbl_lazy'
union_all(x, y, copy = "none", ...)
## S3 method for class 'tbl_lazy'
setdiff(x, y, copy = "none", ..., all = FALSE)
Arguments
x, y |
A pair of lazy data frames backed by database queries. |
copy |
If
|
... |
Must be empty. |
all |
If |
Table paths
Description
dbplyr standardises all the ways of referring to a table (i.e. a single
string, a string wrapped in I(), a DBI::Id() and the results of
in_schema() and in_catalog()) into a table "path" of the form
table, schema.table, or catalog.schema.path. A table path is
always suitable for inlining into a query, so user input is quoted unless
it is wrapped in I().
This is primarily for internal usage, but you may need to work with it if you're implementing a backend, and you need to compute with the table path, not just pass it on unchanged to some other dbplyr function.
-
is_table_path()returnsTRUEif the object is atable_path. -
as_table_path()coerces known table identifiers to atable_path. -
check_table_path()throws an error if the object is not atable_path. -
table_path_name()returns the last component of the table path (i.e. the name of the table). -
table_path_components()returns a list containing the components of each table path.
A table_path object can technically be a vector of table paths, but
you will never see this in table paths constructed from user inputs.
Usage
is_table_path(x)
table_path_name(x, con)
table_path_components(x, con)
check_table_path(x, error_arg = caller_arg(x), error_call = caller_env())
as_table_path(x, con, error_arg = caller_arg(x), error_call = caller_env())
Join SQL tables
Description
These are methods for the dplyr join generics. They are translated to the following SQL queries:
-
inner_join(x, y):SELECT * FROM x JOIN y ON x.a = y.a -
left_join(x, y):SELECT * FROM x LEFT JOIN y ON x.a = y.a -
right_join(x, y):SELECT * FROM x RIGHT JOIN y ON x.a = y.a -
full_join(x, y):SELECT * FROM x FULL JOIN y ON x.a = y.a -
semi_join(x, y):SELECT * FROM x WHERE EXISTS (SELECT 1 FROM y WHERE x.a = y.a) -
anti_join(x, y):SELECT * FROM x WHERE NOT EXISTS (SELECT 1 FROM y WHERE x.a = y.a)
Usage
## S3 method for class 'tbl_lazy'
inner_join(
x,
y,
by = NULL,
copy = "none",
suffix = NULL,
...,
keep = NULL,
na_matches = c("never", "na"),
multiple = NULL,
unmatched = "drop",
relationship = NULL,
sql_on = NULL,
auto_index = FALSE,
x_as = NULL,
y_as = NULL
)
## S3 method for class 'tbl_lazy'
left_join(
x,
y,
by = NULL,
copy = "none",
suffix = NULL,
...,
keep = NULL,
na_matches = c("never", "na"),
multiple = NULL,
unmatched = "drop",
relationship = NULL,
sql_on = NULL,
auto_index = FALSE,
x_as = NULL,
y_as = NULL
)
## S3 method for class 'tbl_lazy'
right_join(
x,
y,
by = NULL,
copy = "none",
suffix = NULL,
...,
keep = NULL,
na_matches = c("never", "na"),
multiple = NULL,
unmatched = "drop",
relationship = NULL,
sql_on = NULL,
auto_index = FALSE,
x_as = NULL,
y_as = NULL
)
## S3 method for class 'tbl_lazy'
full_join(
x,
y,
by = NULL,
copy = "none",
suffix = NULL,
...,
keep = NULL,
na_matches = c("never", "na"),
multiple = NULL,
relationship = NULL,
sql_on = NULL,
auto_index = FALSE,
x_as = NULL,
y_as = NULL
)
## S3 method for class 'tbl_lazy'
cross_join(
x,
y,
...,
copy = "none",
suffix = c(".x", ".y"),
x_as = NULL,
y_as = NULL
)
## S3 method for class 'tbl_lazy'
semi_join(
x,
y,
by = NULL,
copy = "none",
...,
na_matches = c("never", "na"),
sql_on = NULL,
auto_index = FALSE,
x_as = NULL,
y_as = NULL
)
## S3 method for class 'tbl_lazy'
anti_join(
x,
y,
by = NULL,
copy = "none",
...,
na_matches = c("never", "na"),
sql_on = NULL,
auto_index = FALSE,
x_as = NULL,
y_as = NULL
)
Arguments
x, y |
A pair of lazy data frames backed by database queries. |
by |
A join specification created with If To join on different variables between To join by multiple variables, use a
For simple equality joins, you can alternatively specify a character vector
of variable names to join by. For example, To perform a cross-join, generating all combinations of |
copy |
If
|
suffix |
If there are non-joined duplicate variables in |
... |
Other parameters passed onto methods. |
keep |
Should the join keys from both
|
na_matches |
Should NA (NULL) values match one another?
The default, "never", is how databases usually work. |
multiple, unmatched |
Unsupported in database backends. As a workaround for multiple use a unique key and for unmatched a foreign key constraint. |
relationship |
Unsupported in database backends. |
sql_on |
A custom join predicate as an SQL expression.
Usually joins use column equality, but you can perform more complex
queries by supplying |
auto_index |
if |
x_as, y_as |
Alias to use for |
Value
Another tbl_lazy. Use show_query() to see the generated
query, and use collect() to execute the query
and return data to R.
Examples
library(dplyr, warn.conflicts = FALSE)
band_db <- copy_to(memdb(), dplyr::band_members)
instrument_db <- copy_to(memdb(), dplyr::band_instruments)
band_db |> left_join(instrument_db) |> show_query()
# Can join with local data frames by setting copy = TRUE
band_db |>
left_join(dplyr::band_instruments, copy = TRUE)
# Unlike R, joins in SQL don't usually match NAs (NULLs)
db <- memdb_frame(x = c(1, 2, NA))
label <- memdb_frame(x = c(1, NA), label = c("one", "missing"))
db |> left_join(label, by = "x")
# But you can activate R's usual behaviour with the na_matches argument
db |> left_join(label, by = "x", na_matches = "na")
# By default, joins are equijoins, but you can use `sql_on` to
# express richer relationships
db1 <- memdb_frame(x = 1:5)
db2 <- memdb_frame(x = 1:3, y = letters[1:3])
db1 |> left_join(db2) |> show_query()
db1 |> left_join(db2, sql_on = "LHS.x < RHS.x") |> show_query()
Cache and retrieve an src_sqlite of the Lahman baseball database.
Description
This creates an interesting database using data from the Lahman baseball data source, provided by Sean Lahman, and made easily available in R through the Lahman package by Michael Friendly, Dennis Murphy and Martin Monkman. See the documentation for that package for documentation of the individual tables.
Usage
lahman_sqlite(path = NULL)
lahman_postgres(dbname = "lahman", host = "localhost", ...)
lahman_mysql(dbname = "lahman", ...)
copy_lahman(con, ...)
has_lahman(type, ...)
lahman_srcs(..., quiet = NULL)
Arguments
... |
Other arguments passed to |
type |
src type. |
quiet |
if |
Examples
# Connect to a local sqlite database, if already created
library(dplyr)
if (has_lahman("sqlite")) {
lahman_sqlite()
batting <- tbl(lahman_sqlite(), "Batting")
batting
}
# Connect to a local postgres database with lahman database, if available
if (has_lahman("postgres")) {
lahman_postgres()
batting <- tbl(lahman_postgres(), "Batting")
}
Retrieve the last SQL query generated
Description
This is a helper function that retrieves the most recent SQL query generated by dbplyr, which can be useful for debugging.
Usage
last_sql()
Value
A SQL string, or NULL if no query has been generated yet.
Examples
library(dplyr, warn.conflicts = FALSE)
df <- lazy_frame(x = 1:3)
df |> filter(x > 1)
last_sql()
Create a base lazy query
Description
lazy_base_query() is a constructor for base lazy query objects. A base
lazy query represents the root of a lazy query tree, i.e. a database table or
query.
Usage
lazy_base_query(x, vars, class = character(), ...)
Arguments
x |
A data source, typically a table identifier created by |
vars |
A character vector of column names. |
class |
A character vector of additional subclasses to add. The
resulting object will have class |
... |
Additional arguments passed to |
Value
A lazy query object.
Build and render SQL from a sequence of lazy operations
Description
sql_build() creates a select_query S3 object, that is rendered
to a SQL string by sql_render(). The output from sql_build() is
designed to be easy to test, as it's database agnostic, and has
a hierarchical structure. Outside of testing, however, you should
always call sql_render().
Usage
lazy_multi_join_query(
x,
joins,
table_names,
vars,
distinct = FALSE,
where = NULL,
group_vars = op_grps(x),
order_vars = op_sort(x),
frame = op_frame(x),
call = caller_env()
)
lazy_rf_join_query(
x,
y,
type,
by,
table_names,
vars,
group_vars = op_grps(x),
order_vars = op_sort(x),
frame = op_frame(x),
call = caller_env()
)
rf_join_query(
x,
y,
select,
...,
type = "inner",
by = NULL,
suffix = c(".x", ".y"),
na_matches = FALSE
)
lazy_select_query(
x,
select = NULL,
where = NULL,
group_by = NULL,
having = NULL,
order_by = NULL,
limit = NULL,
distinct = FALSE,
group_vars = NULL,
order_vars = NULL,
frame = NULL,
select_operation = c("select", "mutate", "summarise")
)
select_query(
from,
select = sql("*"),
where = sql(),
group_by = sql(),
having = sql(),
window = sql(),
order_by = sql(),
limit = NULL,
distinct = FALSE,
from_alias = NULL
)
lazy_semi_join_query(
x,
y,
vars,
anti,
by,
where,
group_vars = op_grps(x),
order_vars = op_sort(x),
frame = op_frame(x),
call = caller_env()
)
semi_join_query(
x,
y,
vars,
anti = FALSE,
by = NULL,
where = NULL,
na_matches = FALSE
)
lazy_set_op_query(x, y, type, all, call = caller_env())
set_op_query(x, y, type)
lazy_union_query(x, unions, call = caller_env())
union_query(x, unions)
lazy_query(
query_type,
x,
...,
group_vars = op_grps(x),
order_vars = op_sort(x),
frame = op_frame(x)
)
sql_build(op, con = NULL, ..., sql_options = NULL)
sql_render(
query,
con = NULL,
...,
sql_options = NULL,
subquery = FALSE,
lvl = 0
)
Arguments
... |
Other arguments passed on to the methods. Not currently used. |
op |
A sequence of lazy operations |
con |
A sql_dialect object or database connection. Connections are
supported for backward compatibility. The default |
sql_options |
SQL rendering options generated by |
subquery |
Is this SQL going to be used in a subquery? This is important because you can place a bare table name in a subquery and ORDER BY does not work in subqueries. |
Details
sql_build() is generic over the lazy operations, lazy_ops,
and generates an S3 object that represents the query. sql_render()
takes a query object and then calls a function that is generic
over the database. For example, sql_build.op_mutate() generates
a select_query, and sql_render.select_query() calls
sql_select(), which has different methods for different databases.
The default methods should generate ANSI 92 SQL where possible, so you
backends only need to override the methods if the backend is not ANSI
compliant.
Lazy operations
Description
This set of S3 classes describe the action of dplyr verbs. These are currently used for SQL sources to separate the description of operations in R from their computation in SQL. This API is very new so is likely to evolve in the future.
op_vars() and op_grps() compute the variables and groups from
a sequence of lazy operations. op_sort() and op_frame() tracks the
order and frame for use in window functions.
Usage
op_grps(op)
op_vars(op)
op_sort(op)
op_frame(op)
A temporary in-memory database
Description
memdb() creates a temporary in-memory database that disappears when the
R session ends. It's a convenient way to learn about and experiment with
dbplyr without having to connect to a "real" database.
memdb_frame() works like tibble::tibble(), but instead of creating a new
data frame in R, it creates a table in memdb(). local_memdb_frame()
is like memdb_frame() but the table will be automatically deleted when
the current scope ends. It's useful for tests. But beware: this function
will overwrite an existing table of the same name.
Usage
memdb()
memdb_frame(.name = unique_table_name(), ...)
local_memdb_frame(.name = unique_table_name(), ..., frame = caller_env())
Arguments
.name |
Name of table in database: defaults to a random name that's unlikely to conflict with an existing table. |
... |
< Arguments are evaluated sequentially.
You can refer to previously created elements directly or using the rlang::.data
pronoun.
To refer explicitly to objects in the calling environment, use |
frame |
The created table is bound to this execution frame and will be deleted when it ends. For expert use only. |
Examples
library(dplyr)
# use memdb_frame() to create a new database table
df <- memdb_frame(x = runif(100), y = runif(100))
df |> arrange(x)
df |> arrange(x) |> show_query()
# Use memdb() + copy_to() to copy an existing data frame
iris_db <- copy_to(memdb(), iris)
iris_db
Create, modify, and delete columns
Description
These are methods for the dplyr mutate() and transmute() generics.
They are translated to computed expressions in the SELECT clause of
the SQL query.
Usage
## S3 method for class 'tbl_lazy'
mutate(
.data,
...,
.by = NULL,
.order = NULL,
.frame = NULL,
.keep = c("all", "used", "unused", "none"),
.before = NULL,
.after = NULL
)
Arguments
.data |
A lazy data frame backed by a database query. |
... |
< |
.by |
< |
.order |
< |
.frame |
A length-2 numeric vector specifying the bounds for
window function frames. The first element is the lower bound (use |
.keep |
Control which columns from
|
.before, .after |
< |
Value
Another tbl_lazy. Use show_query() to see the generated
query, and use collect() to execute the query
and return data to R.
Examples
library(dplyr, warn.conflicts = FALSE)
db <- memdb_frame(x = 1:5, y = 5:1)
db |>
mutate(a = (x + y) / 2, b = sqrt(x^2L + y^2L)) |>
show_query()
# dbplyr automatically creates subqueries as needed
db |>
mutate(x1 = x + 1, x2 = x1 * 2) |>
show_query()
# `.order` and `.frame` control window functions
db <- memdb_frame(g = c(1, 1, 2, 2, 2), x = c(5, 3, 1, 4, 2))
db |>
mutate(rolling_sum = sum(x), .by = g, .order = x, .frame = c(-2, 2)) |>
show_query()
Provides comma-separated string out of the parameters
Description
Provides comma-separated string out of the parameters
Usage
named_commas(x)
Database versions of the nycflights13 data
Description
These functions cache the data from the nycflights13 database in
a local database, for use in examples and vignettes. Indexes are created
to making joining tables on natural keys efficient.
Usage
nycflights13_sqlite(path = NULL)
nycflights13_postgres(dbname = "nycflights13", ...)
has_nycflights13(type = c("sqlite", "postgres"), ...)
copy_nycflights13(con, ...)
Arguments
path |
location of SQLite database file |
dbname, ... |
Arguments passed on to |
Partially evaluate an expression.
Description
This function partially evaluates a quosure yielding an expression. It
uses information from the current tbl and the local environment to yield
a standalone expression. This simplifies SQL translation because we can just
pass around expressions rather than expressions + environments
(i.e. quosures).
Usage
partial_eval(call, data, env = caller_env(), error_call = caller_env())
Arguments
call |
an unevaluated expression, as produced by |
data |
A lazy data frame backed by a database query. |
env |
environment in which to search for local values |
Symbol substitution
partial_eval() needs to guess if you're referring to a variable on the
server (remote), or in the current environment (local). It's not possible to
do this 100% perfectly. partial_eval() uses the following heuristic:
If the tbl variables are known, and the symbol matches a tbl variable, then remote.
If the symbol is defined locally, local.
Otherwise, remote.
You can override the guesses using local() and remote() to force
computation, by using the .data and .env pronouns of tidy evaluation,
or by using dbplyr's own .sql pronoun.
Examples
lf <- lazy_frame(year = 1980, id = 1)
partial_eval(quote(year > 1980), data = lf)
ids <- c("ansonca01", "forceda01", "mathebo01")
partial_eval(quote(id %in% ids), lf)
# cf.
partial_eval(quote(id == .data$id), lf)
# You can use local() or .env to disambiguate between local and remote
# variables: otherwise remote is always preferred
year <- 1980
partial_eval(quote(year > year), lf)
partial_eval(quote(year > local(year)), lf)
partial_eval(quote(year > .env$year), lf)
# Functions are always assumed to be remote. Use local to force evaluation
# in R.
f <- function(x) x + 1
partial_eval(quote(year > f(1980)), lf)
partial_eval(quote(year > local(f(1980))), lf)
# You can use `.sql` to make it clear that the function comes from SQL,
# and inside a package, reduce the number of globalVariables() directives
# needed
partial_eval(quote(.sql$EXTRACT_YEAR(year)), lf)
Pivot data from wide to long
Description
pivot_longer() "lengthens" data, increasing the number of rows and
decreasing the number of columns. The inverse transformation is
tidyr::pivot_wider().
Learn more in vignette("pivot", "tidyr").
While most functionality is identical there are some differences to
pivot_longer() on local data frames:
the output is sorted differently/not explicitly,
the coercion of mixed column types is left to the database,
-
values_ptypesNOT supported.
Note that build_longer_spec() and pivot_longer_spec() do not work with
remote tables.
Usage
## S3 method for class 'tbl_lazy'
pivot_longer(
data,
cols,
...,
cols_vary,
names_to = "name",
names_prefix = NULL,
names_sep = NULL,
names_pattern = NULL,
names_ptypes = NULL,
names_transform = NULL,
names_repair = "check_unique",
values_to = "value",
values_drop_na = FALSE,
values_ptypes,
values_transform = NULL
)
Arguments
data |
A data frame to pivot. |
cols |
Columns to pivot into longer format. |
... |
Additional arguments passed on to methods. |
cols_vary |
Unsupported; included for compatibility with the generic. |
names_to |
A string specifying the name of the column to create
from the data stored in the column names of |
names_prefix |
A regular expression used to remove matching text from the start of each variable name. |
names_sep, names_pattern |
If |
names_ptypes |
A list of column name-prototype pairs. |
names_transform, values_transform |
A list of column name-function pairs. |
names_repair |
What happens if the output has invalid column names? |
values_to |
A string specifying the name of the column to create
from the data stored in cell values. If |
values_drop_na |
If |
values_ptypes |
Not supported. |
Details
The SQL translation basically works as follows:
split the specification by its key columns i.e. by variables crammed into the column names.
for each part in the split specification
transmute()datainto the following columns
id columns i.e. columns that are not pivotted
key columns
value columns i.e. columns that are pivotted
combine all the parts with
union_all()
Examples
# See vignette("pivot") for examples and explanation
# Simplest case where column names are character data
memdb_frame(
id = c("a", "b"),
x = 1:2,
y = 3:4
) |>
tidyr::pivot_longer(-id)
Pivot data from long to wide
Description
pivot_wider() "widens" data, increasing the number of columns and
decreasing the number of rows. The inverse transformation is
pivot_longer(). Learn more in vignette("pivot", "tidyr").
pivot_wider() on database tables comes with some caveats, please make
sure to read below for details.
Usage
## S3 method for class 'tbl_lazy'
pivot_wider(
data,
...,
id_cols = NULL,
id_expand = FALSE,
names_from = name,
names_prefix = "",
names_sep = "_",
names_glue = NULL,
names_sort = FALSE,
names_vary = "fastest",
names_expand = FALSE,
names_repair = "check_unique",
values_from = value,
values_fill = NULL,
values_fn = ~max(.x, na.rm = TRUE),
unused_fn = NULL
)
dbplyr_pivot_wider_spec(
data,
spec,
...,
names_repair = "check_unique",
id_cols = NULL,
id_expand = FALSE,
values_fill = NULL,
values_fn = ~max(.x, na.rm = TRUE),
unused_fn = NULL,
error_call = current_env()
)
Arguments
data |
A lazy data frame backed by a database query. |
... |
Unused; included for compatibility with generic. |
id_cols |
A set of columns that uniquely identifies each observation. |
id_expand |
Unused; included for compatibility with the generic. |
names_from, values_from |
A pair of
arguments describing which column (or columns) to get the name of the
output column ( If |
names_prefix |
String added to the start of every variable name. |
names_sep |
If |
names_glue |
Instead of |
names_sort |
Should the column names be sorted? If |
names_vary |
When
|
names_expand |
Should the values in the |
names_repair |
What happens if the output has invalid column names? |
values_fill |
Optionally, a (scalar) value that specifies what each
|
values_fn |
A function, the default is |
unused_fn |
Optionally, a function applied to summarize the values from
the unused columns (i.e. columns not identified by The default drops all unused columns from the result. This can be a named list if you want to apply different aggregations to different unused columns.
This is similar to grouping by the |
spec |
A specification data frame. This is useful for more complex pivots because it gives you greater control on how metadata stored in the columns become column names in the result. Must be a data frame containing character |
error_call |
The execution environment of a currently
running function, e.g. |
Caveats
pivot_wider() is eager
Note that pivot_wider() cannot be lazy because we need to look
at the data to figure out what the new column names will be.
If you have a long-running query you have two options:
Temporarily store the result of the query via
compute().Create a spec before and use
dbplyr_pivot_wider_spec()- dbplyr's version oftidyr::pivot_wider_spec().
You must supply values_fn
The big difference to pivot_wider() for local data frames is that
values_fn must not be NULL. By default it is max() which yields
the same results as for local data frames if three conditions are true:
The combination of
id_colsandvalueuniquely identify an observation.The column has a comparable type (e.g. numeric, date-time, or (for most databases) string).
-
values_fillisNULL.
If either the second or third condition is not met, you must supply a
custom values_fn. Unfortunately there is no generally available alternative
and you'll need to look for something database specific, like FIRST()
or ANY_VALUE().
How does it work?
The translation to SQL code basically works as follows:
Get unique keys in
names_fromcolumn.For each key value generate an expression of the form:
values_fn( CASE WHEN (`names from column` == `key value`) THEN (`value column`) END ) AS `output column`
Group data by id columns.
Summarise the grouped data with the expressions from step 2.
Examples
memdb_frame(
id = 1,
key = c("x", "y"),
value = 1:2
) |>
tidyr::pivot_wider(
id_cols = id,
names_from = key,
values_from = value
)
Extract a single column
Description
This is a method for the dplyr pull() generic. It evaluates the query
retrieving just the specified column.
Usage
## S3 method for class 'tbl_sql'
pull(.data, var = -1, name = NULL, ...)
Arguments
.data |
A lazy data frame backed by a database query. |
var |
A variable specified as:
The default returns the last column (on the assumption that's the column you've created most recently). This argument is taken by expression and supports quasiquotation (you can unquote column names and column locations). |
name |
An optional parameter that specifies the column to be used
as names for a named vector. Specified in a similar manner as |
... |
< |
Value
A vector of data.
Examples
library(dplyr, warn.conflicts = FALSE)
db <- memdb_frame(x = 1:5, y = 5:1)
db |>
mutate(z = x + y * 2) |>
pull()
Metadata about a remote table
Description
remote_name() gives the unescaped name of the remote table, or NULL if it
is a query (created by sql()) or already escape (created by ident_q()).
remote_table() gives the remote table or the query.
remote_query() gives the text of the query, and remote_query_plan()
the query plan (as computed by the remote database). remote_src() and
remote_con() give the dplyr source and DBI connection respectively.
Usage
remote_name(x, null_if_local = TRUE)
remote_table(x, null_if_local = TRUE)
remote_src(x)
remote_con(x)
remote_query(x, sql_options = NULL, cte = deprecated())
remote_query_plan(x, ...)
Arguments
x |
Remote table, currently must be a |
null_if_local |
Return |
sql_options |
SQL rendering options generated by |
cte |
|
... |
Additional arguments passed on to methods. |
Value
-
remote_name()returns a string, orNULLif not a remote table, or not applicable. For example, computed queries do not have a "name". -
remote_table()returns asql()object containing the (correctly quoted) table identifier, or the custom query if the table was created fromsql(). ReturnsNULLif applied to a lazy query that no longer corresponds to a single table (e.g. after afilter()or a join), or if applied to a local lazy table.
Examples
mf <- memdb_frame(x = 1:5, y = 5:1, .name = "blorp")
remote_name(mf)
remote_src(mf)
remote_con(mf)
remote_query(mf)
mf2 <- dplyr::filter(mf, x > 3)
remote_name(mf2)
remote_src(mf2)
remote_con(mf2)
remote_query(mf2)
Replace NAs with specified values
Description
This is a method for the tidyr::replace_na() generic.
Usage
## S3 method for class 'tbl_lazy'
replace_na(data, replace = list(), ...)
Arguments
data |
A pair of lazy data frame backed by database queries. |
replace |
A named list of values, with one value for each column that has NA values to be replaced. |
... |
Unused; included for compatibility with generic. |
Value
Another tbl_lazy. Use show_query() to see the generated
query, and use collect() to execute the query
and return data to R.
Examples
df <- memdb_frame(x = c(1, 2, NA), y = c("a", NA, "b"))
df |> tidyr::replace_na(list(x = 0, y = "unknown"))
Edit individual rows in the underlying database table
Description
These are methods for the dplyr rows_insert(), rows_append(),
rows_update(), rows_patch(), rows_upsert(), and rows_delete()
generics.
When in_place = TRUE these verbs do not generate SELECT queries, but
instead directly modify the underlying data using INSERT, UPDATE, or
DELETE operators. This will require that you have write access to
the database: the connection needs permission to insert, modify or delete
rows, but not to alter the structure of the table.
The default, in_place = FALSE, generates equivalent lazy tables (using
SELECT queries) that allow previewing the result without actually
modifying the underlying table on the database.
Usage
## S3 method for class 'tbl_lazy'
rows_insert(
x,
y,
by = NULL,
...,
conflict = c("error", "ignore"),
copy = "none",
in_place = FALSE,
returning = NULL,
method = NULL
)
## S3 method for class 'tbl_lazy'
rows_append(x, y, ..., copy = "none", in_place = FALSE, returning = NULL)
## S3 method for class 'tbl_lazy'
rows_update(
x,
y,
by = NULL,
...,
unmatched = c("error", "ignore"),
copy = "none",
in_place = FALSE,
returning = NULL
)
## S3 method for class 'tbl_lazy'
rows_patch(
x,
y,
by = NULL,
...,
unmatched = c("error", "ignore"),
copy = "none",
in_place = FALSE,
returning = NULL
)
## S3 method for class 'tbl_lazy'
rows_upsert(
x,
y,
by = NULL,
...,
copy = "none",
in_place = FALSE,
returning = NULL,
method = NULL
)
## S3 method for class 'tbl_lazy'
rows_delete(
x,
y,
by = NULL,
...,
unmatched = c("error", "ignore"),
copy = "none",
in_place = FALSE,
returning = NULL
)
Arguments
x |
A lazy table.
For |
y |
A lazy table, data frame, or data frame extensions (e.g. a tibble). |
by |
An unnamed character vector giving the key columns. The key columns
must exist in both By default, we use the first column in |
... |
Other parameters passed onto methods. |
conflict |
For One of:
|
copy |
If
|
in_place |
Should |
returning |
Columns to return. See |
method |
A string specifying the method to use. This is only relevant for
|
unmatched |
For One of:
|
Value
A new tbl_lazy of the modified data.
With in_place = FALSE, the result is a lazy query that prints visibly,
because the purpose of this operation is to preview the results.
With in_place = TRUE, x is returned invisibly,
because the purpose of this operation is the side effect of modifying rows
in the table behind x.
Examples
library(dplyr)
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
DBI::dbExecute(con, "CREATE TABLE Ponies (
id INTEGER PRIMARY KEY AUTOINCREMENT,
name TEXT,
cutie_mark TEXT
)")
ponies <- tbl(con, "Ponies")
applejack <- copy_inline(con, data.frame(
name = "Apple Jack",
cutie_mark = "three apples"
))
# The default behavior is to generate a SELECT query
rows_insert(ponies, applejack, conflict = "ignore")
# And the original table is left unchanged:
ponies
# You can also choose to modify the table with in_place = TRUE:
rows_insert(ponies, applejack, conflict = "ignore", in_place = TRUE)
# In this case `rows_insert()` returns nothing and the underlying
# data is modified
ponies
Subset, rename, and reorder columns using their names
Description
These are methods for the dplyr select(), rename(), and relocate()
generics. They generate the SELECT clause of the SQL query.
These functions do not support predicate functions, i.e. you can
not use where(is.numeric) to select all numeric variables.
Usage
## S3 method for class 'tbl_lazy'
select(.data, ...)
## S3 method for class 'tbl_lazy'
rename(.data, ...)
## S3 method for class 'tbl_lazy'
rename_with(.data, .fn, .cols = everything(), ...)
## S3 method for class 'tbl_lazy'
relocate(.data, ..., .before = NULL, .after = NULL)
Arguments
.data |
A lazy data frame backed by a database query. |
... |
< |
.fn |
A function used to transform the selected |
.cols |
< |
.before, .after |
< |
Examples
library(dplyr, warn.conflicts = FALSE)
db <- memdb_frame(x = 1, y = 2, z = 3)
db |> select(-y) |> show_query()
db |> relocate(z) |> show_query()
db |> rename(first = x, last = z) |> show_query()
Show generated SQL and query plan
Description
show_sql() displays the SQL query that will be dispatched to the database;
explain() displays both the SQL query and the query plan.
Usage
## S3 method for class 'tbl_lazy'
show_query(x, ..., use_colour = TRUE, sql_options = NULL, cte = deprecated())
## S3 method for class 'tbl_sql'
explain(x, ...)
Arguments
x |
An object to explain |
... |
For |
use_colour |
Should the output be coloured? |
sql_options |
SQL rendering options generated by |
cte |
Literal SQL escaping
Description
Use sql() to declare that a string is literal SQL and should be used
as is, without quoting.
Usage
sql(...)
is.sql(x)
Arguments
... |
Character vectors that will be combined into a single SQL vector. |
x |
Object to check if it is an sql object. |
Examples
library(dplyr, warn.conflicts = FALSE)
# sql() just adds a class
sql("x + 1")
is.sql(sql("x + 1"))
# You can use it when you need to insert some literal SQL in a query
db <- memdb_frame(x = 1:3)
db |> mutate(y = sql("CAST(x as VARCHAR)"))
SQL dialects
Description
The dialect system allows multiple database connection classes to share SQL generation code. A dialect object encapsulates the SQL syntax rules for a particular database, independent of the connection mechanism.
-
sql_dialect()returns the dialect for a connection. For connections that haven't implemented a dialect method, returns the connection itself for backward compatibility. -
new_sql_dialect()creates a new dialect object. This is primarily intended for dbplyr backend authors.
Usage
sql_dialect(con)
new_sql_dialect(
dialect,
quote_identifier,
has_window_clause = FALSE,
has_table_alias_with_as = TRUE,
has_star_table_prefix = FALSE
)
Arguments
con |
A database connection. |
dialect |
A string giving the dialect name (e.g., "postgres", "mysql"). |
quote_identifier |
A function that quotes identifiers. Should accept a character vector and return a sql vector. |
has_window_clause |
Does the backend support named window
definitions (the |
has_table_alias_with_as |
Does the backend support using |
has_star_table_prefix |
Does the backend require table prefixes when
selecting all columns in single-table queries (e.g., |
Value
-
sql_dialect()returns a dialect object (classsql_dialect) or the connection itself for backward compatibility. -
new_sql_dialect()returns a dialect object with classc("sql_dialect_{name}", "sql_dialect").
Dispatching on dialect
For backward compatibility, all sql_ generics (and a handful of others)
call sql_dialect() on the con argument in order to dispatch further on
the dialect object, if possible:
sql_generic <- function(con, arg1, arg2, ...) {
UseMethod("sql_generic", sql_dialect(con))
}
Unfortunately, due to the way that UseMethod() works, this uses
sql_dialect(con) to control which method is selected, but still passes
the original con to the method. This means that if you are implementing
a method for a dialect and need to access dialect properties, you must call
sql_dialect(con) again inside the method.
Examples
# Create a custom dialect
my_dialect <- new_sql_dialect(
"custom",
quote_identifier = function(x) sql_quote(x, "`"),
has_window_clause = TRUE
)
class(my_dialect)
Generate SQL from R expressions
Description
sql_expr() and sql_call2() are superseded in favour of sql_glue().
Low-level building block for generating SQL from R expressions.
Strings are escaped; names become bare SQL identifiers. User infix
functions have % stripped.
Using sql_expr() in package will require use of globalVariables()
to avoid R CMD check NOTES. This is a small amount of additional pain,
which I think is worthwhile because it leads to more readable translation
code.
Usage
sql_expr(x, con = sql_current_con())
sql_call2(.fn, ..., con = sql_current_con())
Arguments
x |
A quasiquoted expression |
con |
Connection to use for escaping. Will be set automatically when called from a function translation. |
.fn |
Function name (as string, call, or symbol) |
... |
Arguments to function |
Examples
con <- dialect_ansi() # not necessary when writing translations
sql_expr(f(x + 1), con = con)
sql_expr(f("x", "y"), con = con)
sql_expr(f(x, y), con = con)
x <- ident("x")
sql_expr(f(!!x, y), con = con)
sql_expr(cast("x" %as% DECIMAL), con = con)
sql_expr(round(x) %::% numeric, con = con)
sql_call2("+", quote(x), 1, con = con)
sql_call2("+", "x", 1, con = con)
Build SQL strings with glue syntax
Description
sql_glue() and sql_glue2() are designed to help dbplyr extenders
generate custom SQL. They differ only in whether or not they require
a connection. sql_glue() retrieves the ambient connection, making it
suitable for use inside sql_translation() methods; sql_glue2() requires
a connection, making it suitable for use inside all other sql_ methods.
As the name suggests, these functions use glue syntax to make it
easy to mix fixed SQL with varying user inputs. The default glue syntax, {x},
will escape x using the database connection. If there are multiple values
in x, they'll be collapsed into a single string with ,. If you want them
to be wrapped in (), use a * suffix, e.g. {x*}.
You can also use type markers to control how the value is treated:
-
{.sql x}:xis literal SQL that should be interpolated as is, without additional escaping.xmust be a string. -
{.tbl x}:xis a table identifier like a string,I(), or one of the older forms likeDBI::Id()orin_schema(). -
{.id x}:xis a generic identifier, e.g. for a column or index.xmust be a character vector
Usage
sql_glue(sql, envir = parent.frame())
sql_glue2(con, sql, envir = parent.frame())
Arguments
sql |
A string to interpolate. |
envir |
Environment to evaluate |
con |
A sql_dialect object or database connection. Connections are supported for backward compatibility. |
Value
An SQL string.
Examples
con <- dialect_ansi()
tbl <- "my_table"
sql_glue2(con, "SELECT * FROM {.tbl tbl}")
# Values are properly escaped
name <- "Robert'); DROP TABLE students;--"
sql_glue2(con, "INSERT INTO students (name) VALUES ({name})")
# Control wrapping with *
x <- c("name", "age", "grade")
sql_glue2(con, "SELECT {.id x} FROM students")
sql_glue2(con, "SELECT * WHERE variable IN {x*}")
Optimise a SQL query
Description
No longer used.
Usage
sql_optimise(x, con = NULL, ..., subquery = FALSE)
Options for generating SQL
Description
Customise SQL generation with these three options.
Usage
sql_options(cte = FALSE, use_star = TRUE, qualify_all_columns = FALSE)
Arguments
cte |
If |
use_star |
If |
qualify_all_columns |
If |
Value
A <dbplyr_sql_options> object.
Examples
library(dplyr, warn.conflicts = FALSE)
lf1 <- lazy_frame(key = 1, a = 1, b = 2)
lf2 <- lazy_frame(key = 1, a = 1, c = 3)
result <- left_join(lf1, lf2, by = "key") |>
filter(c >= 3)
show_query(result)
sql_options <- sql_options(cte = TRUE, qualify_all_columns = TRUE)
show_query(result, sql_options = sql_options)
Generate SQL for Insert, Update, Upsert, and Delete
Description
These functions generate the SQL used in rows_*(in_place = TRUE).
Usage
sql_query_insert(
con,
table,
from,
insert_cols,
by,
...,
conflict = c("error", "ignore"),
returning_cols = NULL,
method = NULL
)
sql_query_append(con, table, from, insert_cols, ..., returning_cols = NULL)
sql_query_update_from(
con,
table,
from,
by,
update_values,
...,
returning_cols = NULL
)
sql_query_upsert(
con,
table,
from,
by,
update_cols,
...,
returning_cols = NULL,
method = NULL
)
sql_query_delete(con, table, from, by, ..., returning_cols = NULL)
Arguments
con |
A sql_dialect object or database connection. Connections are supported for backward compatibility. |
table |
Table to update. Must be a table identifier.
Use a string to refer to tables in the current schema/catalog or
|
from |
Table or query that contains the new data. Either a table identifier or SQL. |
insert_cols |
Names of columns to insert. |
by |
An unnamed character vector giving the key columns. The key columns
must exist in both By default, we use the first column in |
... |
Other parameters passed onto methods. |
conflict |
For One of:
|
returning_cols |
Optional. Names of columns to return. |
method |
Optional. The method to use. |
update_values |
A named SQL vector that specify how to update the columns. |
update_cols |
Names of columns to update. |
Details
Insert Methods
"where_not_exists"
The default for most databases.
INSERT INTO x_name SELECT * FROM y WHERE NOT EXISTS <match on by columns>
"on_conflict"
Supported by:
Postgres
SQLite
This method uses the ON CONFLICT clause and therefore requires a unique
index on the columns specified in by.
Upsert Methods
"merge"
The upsert method according to the SQL standard. It uses the MERGE statement
MERGE INTO x_name USING y ON <match on by columns> WHEN MATCHED THEN UPDATE SET ... WHEN NOT MATCHED THEN INSERT ...
"on_conflict"
Supported by:
Postgres
SQLite
This method uses the ON CONFLICT clause and therefore requires a unique
index on the columns specified in by.
"cte_update"
Supported by:
Postgres
SQLite
Oracle
The classical way to upsert in Postgres and SQLite before support for
ON CONFLICT was added. The update is done in a CTE clause and the unmatched
values are then inserted outside of the CTE.
Value
A SQL query.
Examples
sql_query_upsert(
con = dialect_postgres(),
table = "airlines",
from = "df",
by = "carrier",
update_cols = "name"
)
Helper function for quoting sql elements
Description
If the quote character is present in the string, it will be doubled.
NAs will be replaced with NULL.
Usage
sql_quote(x, quote)
Arguments
x |
Character vector to escape. |
quote |
Quote character. Either a length 1 character vector for
symmetric quotes (e.g., |
Value
A vector of sql.
Examples
sql_quote("abc", "'")
sql_quote("I've had a good day", "'")
sql_quote(c("abc", NA), "'")
sql_quote(c("abc", NA), c("[", "]"))
SQL helpers for aggregate functions
Description
These functions help you create custom aggregate SQL translations when
implementing a new backend. They are typically used within sql_translator()
to define how R aggregate functions should be translated to SQL.
-
sql_aggregate()creates a SQL aggregate function translator for functions with a single argument and an optionalna.rmparameter (e.g.,SUM(),AVG()). -
sql_aggregate_2()creates a SQL aggregate function translator for functions with exactly two arguments (e.g.,CORR(),COVAR_SAMP()). -
sql_aggregate_n()creates a SQL aggregate function translator for functions with any number of arguments and an optionalna.rmparameter (e.g.,LEAST(),GREATEST()). -
sql_not_supported()creates a function that throws an informative error when a function is not supported in SQL. -
sql_check_na_rm()is a helper that you can use in aggregate functions to direct the user towards settingna.rm = TRUE.
Usage
sql_aggregate(f, f_r = f)
sql_aggregate_2(f)
sql_aggregate_n(f, f_r = f)
sql_check_na_rm(na.rm)
sql_not_supported(f)
Arguments
f |
The name of the SQL function as a string. |
f_r |
The name of the R function being translated as a string. |
na.rm |
Logical indicating whether missing values should be removed.
In SQL, missing values are always removed in aggregate functions, so this
function will warn if |
See Also
Other SQL translation helpers:
sql_translation_scalar,
sql_translation_string,
sql_translation_window,
sql_variant()
SQL helpers for scalar functions
Description
These functions help you create custom scalar SQL translations when
implementing a new backend. They are typically used within sql_translator()
to define how R functions should be translated to SQL.
-
sql_infix()creates SQL infix operators like+,-,*,/. -
sql_prefix()creates SQL prefix functions likeABS(),SQRT(). -
sql_cast()creates SQL cast expressions likeCAST(x AS type). -
sql_try_cast()creates SQL try_cast expressions (for safe casting). -
sql_log()creates a SQL logarithm function with optional base. -
sql_cot()creates a SQL cotangent function (as1 / TAN(x)). -
sql_runif()creates a SQL expression for generating uniform random numbers.
Usage
sql_infix(f, pad = TRUE)
sql_prefix(f, n = NULL)
sql_cast(type)
sql_try_cast(type)
sql_log()
sql_cot()
sql_runif(rand_expr, n = n(), min = 0, max = 1)
Arguments
f |
The name of the SQL function as a string. |
pad |
If |
n |
For |
type |
SQL type name as a string. |
rand_expr |
A string giving an SQL expression that generates a
random number between 0 and 1, e.g. |
min, max |
Range of random values. |
See Also
Other SQL translation helpers:
sql_translation_agg,
sql_translation_string,
sql_translation_window,
sql_variant()
SQL helpers for string functions
Description
These functions help you create custom string SQL translations when
implementing a new backend. They are typically used within sql_translator()
to define how R string functions should be translated to SQL.
-
sql_substr()creates a SQL substring function translator that converts R'ssubstr(x, start, stop)to SQL'sSUBSTR(x, start, length). -
sql_str_sub()creates a SQL substring function translator that handles stringr'sstr_sub()with support for negative indices. -
sql_paste()creates a SQL paste function usingCONCAT_WS()or similar. -
sql_paste_infix()creates a SQL paste function using an infix operator like||.
Usage
sql_substr(f = "SUBSTR")
sql_str_sub(subset_f = "SUBSTR", length_f = "LENGTH", optional_length = TRUE)
sql_paste(default_sep, f = "CONCAT_WS")
sql_paste_infix(default_sep, op, cast = sql_cast("text"))
Arguments
f |
The name of the SQL function as a string. |
subset_f |
The name of the SQL substring function. |
length_f |
The name of the SQL string length function. |
optional_length |
Whether the length argument is optional in the SQL substring function. |
default_sep |
The default separator for paste operations. |
op |
The SQL operator to use for infix paste operations. |
cast |
A function to cast values to strings. |
See Also
Other SQL translation helpers:
sql_translation_agg,
sql_translation_scalar,
sql_translation_window,
sql_variant()
SQL helpers for window functions
Description
These functions help you create custom window SQL translations when
implementing a new backend. They are typically used within sql_translator()
to define how R window functions should be translated to SQL.
-
win_over()makes it easy to generate the window function specification. -
win_absent(),win_rank(),win_aggregate(), andwin_cumulative()provide helpers for constructing common types of window functions. -
win_current_group()andwin_current_order()allow you to access the grouping and order context set up bygroup_by()andarrange().
Usage
win_over(
expr,
partition = NULL,
order = NULL,
frame = NULL,
con = sql_current_con()
)
win_rank(f, empty_order = FALSE)
win_aggregate(f)
win_aggregate_2(f)
win_cumulative(f)
win_absent(f)
win_current_group()
win_current_order()
win_current_frame()
Arguments
expr |
The window expression. |
partition |
Variables to partition over. |
order |
Variables to order by. |
frame |
A numeric vector of length two defining the frame. |
con |
A sql_dialect object or database connection. Connections are supported for backward compatibility. |
f |
The name of an SQL function as a string. |
empty_order |
A logical value indicating whether to order by NULL if
|
See Also
Other SQL translation helpers:
sql_translation_agg,
sql_translation_scalar,
sql_translation_string,
sql_variant()
Examples
con <- dialect_ansi()
win_over(sql("avg(x)"), con = con)
win_over(sql("avg(x)"), "y", con = con)
win_over(sql("avg(x)"), order = "y", con = con)
win_over(sql("avg(x)"), order = c("x", "y"), con = con)
win_over(sql("avg(x)"), frame = c(-Inf, 0), order = "y", con = con)
Create an SQL translator
Description
sql_variant() creates a SQL variant, a list of translators for scalar,
aggregate, and window functions. sql_translator() creates a translator,
an environment containing R to SQL translations. When creating a backend,
you'll use these functions to customize how R functions are converted to
SQL.
Learn more in vignette("new-backend").
Usage
sql_variant(
scalar = sql_translator(),
aggregate = sql_translator(),
window = sql_translator()
)
sql_translator(..., .funs = list(), .parent = new.env(parent = emptyenv()))
base_scalar
base_agg
base_win
base_no_win
base_odbc_scalar
base_odbc_agg
base_odbc_win
Arguments
scalar, aggregate, window |
The three families of functions that an SQL variant can supply. |
..., .funs |
Named functions, used to add custom converters from standard
R functions to SQL functions. Specify individually in |
.parent |
The SQL variant that this variant should inherit from.
Defaults to |
Base translators
dbplyr provides the following base translators that implement standard SQL semantics:
-
base_scalar- scalar functions and operators -
base_agg- aggregate functions -
base_win- window functions -
base_no_win- versions of window functions that throw errors
See Also
Other SQL translation helpers:
sql_translation_agg,
sql_translation_scalar,
sql_translation_string,
sql_translation_window
Examples
# An example of adding some mappings for the statistical functions that
# postgresql provides: http://bit.ly/K5EdTn
postgres_agg <- sql_translator(.parent = base_agg,
cor = sql_aggregate_2("CORR"),
cov = sql_aggregate_2("COVAR_SAMP"),
sd = sql_aggregate("STDDEV_SAMP", "sd"),
var = sql_aggregate("VAR_SAMP", "var")
)
# Next we have to simulate a connection that uses this variant
con <- new_sql_dialect("test", quote = \(x) sql_quote(x, '"'))
sql_translation.sql_dialect_test <- function(x) {
sql_variant(
base_scalar,
postgres_agg,
base_no_win
)
}
translate_sql(cor(x, y), con = con, window = FALSE)
translate_sql(sd(income / years, na.rm = TRUE), con = con, window = FALSE)
Database src
Description
Since can generate a tbl() directly from a DBI connection we no longer
recommend using src_dbi().
Usage
src_dbi(con, auto_disconnect = FALSE)
Arguments
con |
An object that inherits from DBI::DBIConnection, typically generated by DBI::dbConnect |
auto_disconnect |
Should the connection be automatically closed when
the src is deleted? Set to |
Value
An S3 object with class src_dbi, src_sql, src.
Deprecated
Description
-
src_memdb()is deprecated; usememdb()instead. -
tbl_memdb(df)is deprecated; usecopy_to(memdb(), df)instead.
Usage
src_memdb()
tbl_memdb(df, name = deparse(substitute(df)))
Arguments
df |
Data frame to copy. |
name |
Name of table in database. |
Create a "sql src" object
Description
Deprecated: please use directly use a DBIConnection object instead.
Usage
src_sql(subclass, con, ...)
Arguments
subclass |
Name of subclass. |
con |
The connection object. |
... |
Other arguments passed on to individual methods. |
Summarise each group to one row
Description
This is a method for the dplyr summarise() generic. It generates the
SELECT clause of the SQL query, and generally needs to be combined with
group_by().
Usage
## S3 method for class 'tbl_lazy'
summarise(.data, ..., .by = NULL, .groups = NULL)
Arguments
.data |
A lazy data frame backed by a database query. |
... |
< |
.by |
< |
.groups |
When In addition, a message informs you of that choice, unless the result is ungrouped,
the option "dplyr.summarise.inform" is set to |
Value
Another tbl_lazy. Use show_query() to see the generated
query, and use collect() to execute the query
and return data to R.
Examples
library(dplyr, warn.conflicts = FALSE)
db <- memdb_frame(g = c(1, 1, 1, 2, 2), x = c(4, 3, 6, 9, 2))
db |>
summarise(n()) |>
show_query()
db |>
group_by(g) |>
summarise(n()) |>
show_query()
Create a lazy query backed by a database
Description
Use tbl() to create a SQL query backed by a database. Manipulating this
object with dplyr verbs then builds up a SQL query that will only be executed
when you explicitly ask for it, either by printing the object, calling
collect() to bring the data back to R or calling compute() to create a
new table in the database. You can see the query without executing it with
show_query().
Learn more in vignette("dbplyr").
Usage
## S3 method for class 'src_dbi'
tbl(src, from, vars = NULL, ...)
Arguments
src |
A |
from |
Either a table identifier or a literal Use a string to identify a table in the current schema/catalog or
|
vars |
Optionally, provide a character vector of column names. If
not supplied, will be retrieved from the database by running a simple
query. This argument is mainly useful for better performance when creating
many |
... |
Passed on to |
Examples
library(dplyr)
# Connect to a temporary in-memory SQLite database and add some data
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
copy_to(con, mtcars)
# To retrieve a single table from a source, use `tbl()`
mtcars_db <- con |> tbl("mtcars")
mtcars_db
# Use `I()` for qualified table names
con |> tbl(I("temp.mtcars")) |> head(1)
# You can also pass raw SQL if you want a more sophisticated query
con |> tbl(sql("SELECT * FROM mtcars WHERE cyl = 8")) |> head(1)
# But in most cases, you'll rely on dbplyr to construct the SQL:
mtcars_db |>
filter(vs == 1) |>
summarise(mpg = mean(mpg, na.rm = TRUE), .by = cyl) |>
show_query()
Create a local lazy tibble
Description
These functions are useful for testing SQL generation without having to have an active database connection.
Usage
tbl_lazy(df, con = NULL, ..., name = "df")
lazy_frame(..., con = NULL, .name = "df")
Examples
library(dplyr)
df <- data.frame(x = 1, y = 2)
df_sqlite <- tbl_lazy(df, con = dialect_sqlite())
df_sqlite |> summarise(x = sd(x, na.rm = TRUE))
Create an SQL tbl (abstract)
Description
This function creates a lazy tbl object from a table in a database.
It's primarily intended for backend authors who need to create custom
subclasses; most users should use tbl() instead.
Usage
tbl_sql(subclass, src, from, ..., vars = NULL, check_from = deprecated())
Arguments
subclass |
name of subclass |
src |
A |
from |
Either a table identifier or a literal Use a string to identify a table in the current schema/catalog or
|
... |
needed for agreement with generic. Not otherwise used. |
vars |
Optionally, provide a character vector of column names. If
not supplied, will be retrieved from the database by running a simple
query. This argument is mainly useful for better performance when creating
many |
check_from |
Translate an expression to SQL
Description
dbplyr translates commonly used base functions including logical
(!, &, |), arithmetic (^), and comparison (!=) operators, as well
as common summary (mean(), var()), and transformation (log())
functions. All other functions will be preserved as is. R's infix functions
(e.g. %like%) will be converted to their SQL equivalents (e.g. LIKE).
Learn more in vignette("translation-function").
Usage
translate_sql(
...,
con,
vars_group = NULL,
vars_order = NULL,
vars_frame = NULL,
window = TRUE
)
translate_sql_(
dots,
con,
vars_group = NULL,
vars_order = NULL,
vars_frame = NULL,
window = TRUE,
context = list()
)
Arguments
..., dots |
Expressions to translate. |
con |
A sql_dialect object or database connection. Connections are supported for backward compatibility. |
vars_group, vars_order, vars_frame |
Parameters used in the |
window |
Use |
context |
Use to carry information for special translation cases. For example, MS SQL needs a different conversion for is.na() in WHERE vs. SELECT clauses. Expects a list. |
Examples
con <- dialect_ansi()
# Regular maths is translated in a very straightforward way
translate_sql(x + 1, con = con)
translate_sql(sin(x) + tan(y), con = con)
# Note that all variable names are escaped
translate_sql(like == "x", con = con)
# In ANSI SQL: "" quotes variable _names_, '' quotes strings
# Logical operators are converted to their sql equivalents
translate_sql(x < 5 & !(y >= 5), con = con)
# xor() doesn't have a direct SQL equivalent
translate_sql(xor(x, y), con = con)
# If is translated into case when
translate_sql(if (x > 5) "big" else "small", con = con)
# Infix functions are passed onto SQL with % removed
translate_sql(first %like% "Had%", con = con)
translate_sql(first %is% NA, con = con)
translate_sql(first %in% c("John", "Roger", "Robert"), con = con)
# And be careful if you really want integers
translate_sql(x == 1, con = con)
translate_sql(x == 1L, con = con)
# If you have an already quoted object, use translate_sql_:
x <- quote(y + 1 / sin(t))
translate_sql_(list(x), con = dialect_ansi())
# Windowed translation --------------------------------------------
# Known window functions automatically get OVER()
translate_sql(mpg > mean(mpg), con = con)
# Suppress this with window = FALSE
translate_sql(mpg > mean(mpg), window = FALSE, con = con)
# vars_group controls partition:
translate_sql(mpg > mean(mpg), vars_group = "cyl", con = con)
# and vars_order controls ordering for those functions that need it
translate_sql(cumsum(mpg), con = con)
translate_sql(cumsum(mpg), vars_order = "mpg", con = con)
Override window order and frame
Description
These allow you to override the PARTITION BY and ORDER BY clauses
of window functions generated by grouped mutates.
Usage
window_order(.data, ...)
window_frame(.data, from = -Inf, to = Inf)
Arguments
.data |
A lazy data frame backed by a database query. |
... |
Variables to order by |
from, to |
Bounds of the frame. |
Examples
library(dplyr, warn.conflicts = FALSE)
db <- memdb_frame(g = rep(1:2, each = 5), y = runif(10), z = 1:10)
db |>
window_order(y) |>
mutate(z = cumsum(y)) |>
show_query()
db |>
group_by(g) |>
window_frame(-3, 0) |>
window_order(z) |>
mutate(z = sum(y)) |>
show_query()
Override the SQL dialect for a connection
Description
with_dialect() overrides the default dialect assigned to a connection.
This is useful when dbplyr guesses the dialect incorrectly, which is most
likely to occur with ODBC/JDBC/ADBC backends.
Usage
with_dialect(con, dialect)
Arguments
con |
A database connection (class |
dialect |
A dialect object created by a |
Value
A connection object that uses the specified dialect for SQL generation.
Examples
# Wrap an in-memory SQLite connection to use Postgres dialect
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
pg_con <- with_dialect(con, dialect_postgres())
# SQL generation uses Postgres syntax
lf <- lazy_frame(x = 1, con = pg_con)
lf |> dplyr::mutate(y = sd(x))
DBI::dbDisconnect(con)