Skip to content

Releases: tidyverse/dbplyr

dbplyr 2.5.0

01 Apr 21:24
Compare
Choose a tag to compare

Improved tools for qualified table names

  • Specification of table names with schema/catalogs has been overhauled to
    make it simpler. This includes the following features and fixes:

    • The simplest way to refer to a qualified table is now to wrap it in
      I(), e.g. I("schema_name.table_name").

    • Use of sql() and ident_q() inside in_catalog() and in_schema()
      is once again supported (#1388).

    • It's ok to use ident_q() once again (#1413) and you should no longer
      see unsuppressable warnings about using in_schema() (#1408).

    • The names of the arguments to Id() no longer matter, only their order
      (#1416). Additionally, thanks to changes to the DBI package, you no
      longer need to name each argument.

    • If you accidentally pass a named vector to any of the database identifer
      functions, those names will be automatically stripped (#1404).

    • tbl_sql(check_from) is now deprecated.

  • dbplyr now exports some tools to work with the internal table_path class
    which is useful for certain backends that need to work with this
    data structure (#1300).

Improved SQL

  • New translations for clock functions add_years(), add_days(),
    date_build(), get_year(), get_month(), get_day(),
    and base::difftime() on SQL server, Redshift, Snowflake, and Postgres.

  • select() will keep computed columns used to arrange() subqueries that are
    eliminated by a subsequent select (@ejneer, #1437).

  • semi_join() will no longer inline away an aggregate filter (i.e. HAVING
    clause) that was followed by a select() (@ejneer, #1474)

  • Improved function translations:

    • Functions qualified with the base namespace are now also translated, e.g.
      base::paste0(x, "_1") is now translated (@mgirlich, #1022).

    • -1 + x now generates a translation instead erroring (#1420).

    • x$name never attempts to evaluate name (#1368).

    • You can once again use NULL on the LHS of an infix operator in order
      to generate SQL with unusual syntax (#1345).

    • Namespaced calls now error if the function doesn't exist, or a translation
      is not available (#1426).

    • lead() translation coerces n to an integer.

  • Databricks: now supports creating non-temporary tables too (#1418).

  • Oracle:

    • db_explain() now works (@thomashulst, #1353).

    • as.Date() works when applied to a string (#1389).

    • head() is once again translated to FETCH FIRST. This does require Oracle
      12c or newer, but it actually works, compared to the approach using
      ROWNUM from #1292 (#1436).

    • Added support for str_replace() and str_replace_all() via
      REGEXP_REPLACE() (@thomashulst, #1402).

  • Snowflake (@nathanhaigh, #1406)

    • Added support for str_starts() and str_ends() via REGEXP_INSTR()

    • Refactored str_detect() to use REGEXP_INSTR() so now supports
      regular expressions.

    • Refactored grepl() to use REGEXP_INSTR() so now supports
      case-insensitive matching through grepl(..., ignore.case = TRUE)

  • SQL server:

    • Now products a clear error if you attempt to use n_distinct() in
      mutate() or filter() (#1366).

    • filter() does a better job of converting logical vectors
      from bit to boolean (@ejneer, #1288).

  • MySQL: as.integer() gets correct translation (@krlmlr, #1375).

Minor improvements and bug fixes

  • Deprecation status of functions deprecated in previous versions (at least
    2 years old) have been advanced. In particular, src_sql() is now defunct,
    as is the use of partial_eval() with character data.

  • Database errors now show the generated SQL, which hopefully will make it
    faster to track down problems (#1401).

  • When dbplyr creates an index on a table in a schema (e.g. schema.table),
    it now only includes the table name in the index name, not the schema name.

  • The class of remote sources now includes all S4 class names, not just
    the first (#918).

  • compute() passes additional arguments all the way to
    sql_query_save()-methods (@rsund).

  • db_sql_render() correctly passes on ... when re-calling with
    sql_options set (#1394).

  • reframe() now gives an informative error that it isn't supported (#1148).

  • rows_patch(in_place = FALSE) now works when more than one column should be
    patched (@gorcha, #1443).

  • New simulate_mariadb() (@krlmlr, #1375).

  • sql_translator() now checks for duplicated definitions (@krlmlr, #1374).

dbplyr 2.4.0

26 Oct 12:25
Compare
Choose a tag to compare

Breaking changes

  • Using compute(temporary = FALSE) without providing a name is now
    deprecated (@mgirlich, #1154).

  • ntile()'s first argument has been renamed from order_by to x to
    match the interface of dplyr::ntile() (@mgirlich, #1242).

  • simulate_vars() and simulate_vars_is_typed() were removed as they weren't
    used and tidyselect now offers tidyselect_data_proxy() and
    tidyselect_data_has_predicates() (@mgirllich, #1199).

  • sql_not_supported() now expects a function name without parentheses.

  • sql_query_append(), sql_query_insert(), sql_query_update(),
    sql_query_upsert(), and sql_query_delete() changed their arguments to
    make them more consistent to the other sql_query_*() functions:

    • x_name was renamed to table.
    • y was renamed to from and must now be a table identifier or SQL instead
      of a lazy table.
    • sql_query_append() and sql_query_insert() have gained the argument cols.
  • remote_name() now returns a string with the name of the table. To get the
    qualified identifier use the newly added remote_table() (@mgirlich, #1280).

  • tbl_lazy() loses src argument after it has been deprecated for years
    (@mgirlich, #1208).

  • translate_sql() now requires the con argument (@mgirlich, #1311).
    The vars argument has been removed after it threw an error for the last 7
    years (@mgirlich).

Improved SQL

  • Preliminary databricks Spark SQL backend (#1377).

  • Joins

    • *_join() after full_join() works again (@mgirlich, #1178).

    • *_join() now allows specifying the relationship argument. It must be
      NULL or "many-to-many" (@bairdj, #1305).

    • Queries now qualify * with the table alias for better compatibility
      (@mgirlich, #1003).

    • full_join() can now handle column names that only differ in case
      (@ejneer, #1255).

    • The na_matches argument of semi_join() and anti_join() works again
      (@mgirlich, #1211).

    • A semi/anti_join() on fitlered y is inlined when possible (@mgirlich, #884).

    • Joins now work again for Pool and Oracle connections (@mgirlich, #1177, #1181).

  • A sequence of union() resp. union_all() now produces a flat query
    instead of subqueries (@mgirlich, #1269).

  • Added translations for:

  • if_any() and if_all() translations are now wrapped in parentheses.
    This makes sure it can be combined via & with other conditions
    (@mgirlich, #1153).

  • nth(), first(), and last() now support the na_rm argument
    (@mgirlich, #1193).

Minor improvements and bug fixes

  • across() now supports namespaced functions, e.g.
    across(x, dplyr::dense_rank) (@mgirlich, #1231).

  • db_copy_to(overwrite = TRUE) now actually works.

  • db_copy_to()'s ... are now passed to db_write_table() (@mgirlich, #1237).

  • Added db_supports_table_alias_with_as() to customise whether a backend
    supports specifying a table alias with AS or not (@mgirlich).

  • db_write_table() and db_save_query() gain the overwrite argument.

  • dbplyr_pivot_wider_spec() is now exported. Unlike pivot_wider() this can
    be lazy. Note that this will be removed soon after pivot_wider_spec()
    becomes a generic (@mgirlich).

  • filter()ing with window functions now generates columns called col01
    rather than q01 (@mgirlich, #1258).

  • pivot_wider() now matches tidyr NA column handling (@ejneer #1238).

  • select() can once again be used after arrange(desc(x)) (@ejneer, #1240).

  • show_query() and remote_query() gain the argument sql_options that allows
    to control how the SQL is generated. It can be created via sql_options()
    which has the following arguments:

    • cte: use common table expressions?
    • use_star: use SELECT * or explicitly select every column?
    • qualify_all_columns: qualify all columns in a join or only the ambiguous ones?
      (@mgirlich, #1146).

    Consequently the cte argument of show_query() and remote_query() has
    been deprecated (@mgirlich, #1146).

  • slice_min/max() can now order by multiple variables like dplyr, e.g. use
    slice_min(lf, tibble(x, y)) (@mgirlich, #1167).

  • slice_*() now supports the data masking pronouns .env and .data (@mgirlich, #1294).

  • sql_join_suffix() gains the argument suffix so that methods can check
    whether the suffix is valid for the backend (@mgirlich).

  • sql_random() is now deprecated. It was used to power slice_sample() which
    is now done via the translation for runif() (@mgirlich, #1200).

  • tbl() now informs when the user probably forgot to wrap the table identifier
    with in_schema() or sql() (@mgirlich, #1287).

Backend specific improvements

  • Access

  • DuckDB

    • now supports the returning argument of rows_*().
  • MySQL/MariaDB:

    • rows_update() and rows_patch() now give an informative error when the
      unsupported returning argument is used (@mgirlich, #1279).
    • rows_upsert() now gives an informative error that it isn't supported
      (@mgirlich, #1279).
    • rows_*() use the column types of x when auto copying y (@mgirlich, #1327).
    • copy_inline() now works (@mgirlich, #1188).
    • Fix translation of as.numeric(), as.POSIXct(), as_datetime(), and
      as.integer64() (@avsdev-cw, #1189).
  • MS SQL:

  • Oracle:

  • Postgres

    • The rows_*() functions now also work inside a transaction (@mgirlich, #1183).
  • SQLite

    • Subqueries now also get an alias. This makes it consistent with other
      backends and simplifies the implementation.
  • SQL Server

  • Snowflake:

    • na.rm = TRUE is now respected in pmin() and pmax() instead of being silently ignored (@fh-mthomson, #1329)
    • row_number() now works when no order is specified (@fh-mthomson, #1332)
  • Teradata

    • distinct() + head() now work (@mgirlich, #685).
    • as.Date(x) is now translate to CAST(x AS DATE) again unless x is a
      string (@mgirlich, #1285).
    • row_number() no longer defaults to partitioning by groups (now aligned with other databases when no order is specified: ROW_NUMBER() defaults to ORDER BY (SELECT NULL)) (@fh-mthomson, #1331)

dbplyr 2.3.4

27 Sep 12:25
Compare
Choose a tag to compare
  • Hot patch release to resolve R CMD check failures.

dbplyr 2.3.3

07 Jul 15:32
Compare
Choose a tag to compare
  • Hot patch release to resolve R CMD check failures.

dbplyr 2.3.2

21 Mar 20:32
Compare
Choose a tag to compare
  • Hot patch release to resolve R CMD check failures.

dbplyr 2.3.1

24 Feb 19:46
Compare
Choose a tag to compare

Breaking changes

  • window_order() now only accepts bare symbols or symbols wrapped in desc().
    This breaking change is necessary to allow select() to drop and rename
    variables used in window_order() (@mgirlich, #1103).

Improved error messages

  • quantile() and median() now error for SQL Server when used in summarise()
    and for PostgreSQL when used in mutate() as they can't be properly
    translated (@mgirlich, #1110).

  • Added an informative error for unsupported join arguments unmatched and
    multiple (@mgirlich).

  • Using predicates, e.g. where(is.integer), in across() now produces an
    error as they never worked anyway (@mgirlich, #1169).

  • Catch unsupported argument pivot_wider(id_expand = TRUE) and
    pivot_longer(cols_vary) (@mgirlich, #1109).

Bug fixes in SQL generation

  • Fixed an issue when using a window function after a summarise() and
    select() (@mgirlich, #1104).

  • Fixed an issue when there where at least 3 joins and renamed variables
    (@mgirlich, #1101).

  • mutate() and select() after distinct() now again produce a subquery to
    generate the correct translation (@mgirlich, #1119, #1141).

  • Fixed an issue when using filter() on a summarised variable (@mgirlich, #1128).

  • mutate() + filter() now again produces a new query if the mutate()
    uses a window function or SQL (@mgirlich, #1135).

  • across() and pick() can be used (again) in distinct() (@mgirlich, #1125).

  • The rows_*() function work again for tables in a schema in PostgreSQL
    (@mgirlich, #1133).

Minor improvements and bug fixes

  • sql() now evaluates its arguments locally also when used in across() (@mgirlich, #1039).

  • The rank functions (row_number(), min_rank(), rank(), dense_rank(),
    percent_rank(), and cume_dist()) now support multiple variables by
    wrapping them in tibble(), e.g. rank(tibble(x, y)) (@mgirlich, #1118).

  • pull() now supports the argument name (@mgirlich, #1136).

  • Added support for join_by() added in dplyr 1.1.0 (@mgirlich, #1074).

  • Using by = character() to perform a cross join is now soft-deprecated in
    favor of cross_join().

  • full_join() and right_join() are now translated directly to FULL JOIN
    and RIGHT JOIN for SQLite as native support was finally added (@mgirlich, #1150).

  • case_match() now works with strings on the left hand side (@mgirlich, #1143).

  • The rank functions (row_number(), min_rank(), rank(), dense_rank(),
    percent_rank(), and cume_dist()) now work again for variables wrapped in
    desc(), e.g. row_number(desc(x)) (@mgirlich, #1118).

  • Moved argument auto_index after ... in *_join() (@mgirlich, #1115).

  • Removed dependency on assertthat (@mgirlich, #1112).

  • across() now uses the original value when a column is overriden to match
    the behaviour of dplyr. For example mutate(df, across(c(x, y), ~ .x / x))
    now produces

    SELECT `x` / `x` AS `x`, `y` / `x` AS `y`
    FROM `df`
    

    instead of

    SELECT `x`, `y` / `x` AS `y`
    FROM (
      SELECT `x` / `x` AS `x`, `y`
      FROM `df`
    ) 
    

    (@mgirlich, #1015).

  • Restricted length of table aliases to avoid truncation on certain backends (e.g., Postgres) (@fh-mthomson, #1096)

dbplyr 2.3.0

16 Jan 19:06
Compare
Choose a tag to compare

New features

  • stringr::str_like() (new in 1.5.0) is translated to the closest LIKE
    equivalent (@rjpat, #509)

  • In preparation for dplyr 1.1.0:

    • The .by argument is supported (@mgirlich, #1051).
    • Passing ... to across() is deprecated because the evaluation timing
      of ... is ambiguous. Now instead of (e.g.)
      across(a:b, mean, na.rm = TRUE) use
    • pick() is translated (@mgirlich, #1044).
    • case_match() is translated (@mgirlich, #1020).
    • case_when() now supports the .default argument (@mgirlich, #1017).
  • Variables that aren't found in either the data or in the environment now
    produce an error (@mgirlich, #907).

SQL optimisation

  • dbplyr now produces fewer subqueries resulting in shorter, more readable, and,
    in some cases, faster SQL. The following combination of verbs now avoids a
    subquery if possible:

  • dbplyr now uses SELECT * after a join instead of explicitly selecting every
    column, where possible (@mgirlich, #898).

  • Joins only use the table aliases ("LHS" and "RHS") if necessary (@mgirlich).

  • When using common table expressions, the results of joins and set operations
    are now reused (@mgirlich, #978).

Improved error messages

  • Many errors have been improved and now show the function where the error
    happened instead of a helper function (@mgirlich, #907).

  • Errors produced by the database, e.g. in collect() or rows_*(), now show
    the verb where the error happened (@mgirlich).

  • window_order() now produces a better error message when applied to a data
    frame (@mgirlich, #947).

  • Using a named across() now gives a clear error message (@mgirlich, #761).

Minor improvements and bug fixes

  • Keyword highlighting can now be customised via the option dbplyr_highlight.
    Turn it off via options(dbplyr_highlight = FALSE) or pass a custom ansi
    style, e.g. options(dbplyr_highlight = cli::combine_ansi_styles("bold", "cyan"))
    (@mgirlich, #974).

  • The rank functions (row_number(), min_rank(), rank(), dense_rank(),
    percent_rank(), and cume_dist()) now give missing values the rank NA to
    match the behaviour of dplyr (@mgirlich, #991).

  • NAs in blob()s are correctly translated to NULL (#983).

  • copy_inline() gains a types argument to specify the SQL column types
    (@mgirlich, #963).

  • cur_column() is now supported (@mgirlich, #951).

  • distinct() returns columns ordered the way you request, not the same
    as the input data (@mgirlich).

  • fill() can now fill "downup" and "updown" (@mgirlich, #1057), and
    now order by non-numeric columns also in the up direction (@mgirlich, #1057).

  • filter() now works when using a window function and an external vector
    (#1048).

  • group_by() + renamed columns works once again (@mgirlich, #928).

  • last() is correctly translated when no window frame is specified
    (@mgirlich, #1063).

  • setOldClass() uses a namespace, fixing an installation issue (@mgirlich, #927).

  • sql() is now translated differently. The ... are now evaluated locally
    instead of being translated with translate_sql() (@mgirlich, #952).

Backend specific improvements

  • HANA:

    • Correctly translates as.character() (#1027).
    • copy_inline() now works for Hana (#950)
  • MySQL:

  • Oracle:

    • slice_sample() now works for Oracle (@mgirlich, #986).
    • copy_inline() now works for Oracle (#972)
  • PostgreSQL:

    • Generates correct literals for Dates (#727).
    • str_flatten() uses collapse = "" by default (@fh-afrachioni, #993)
    • rows_*() use the column types of x when auto copying (@mgirlich, #909).
  • Redshift:

  • Snowflake:

  • numeric functions: all(), any(), log10(), round(), cor(), cov()
    and sd().

  • date functions: day(), mday(), wday(), yday(), week(),
    isoweek(), month(), quarter(), isoyear(), seconds(), minutes(),
    hours(), days(), weeks(), months(), years() and floor_date().

  • string functions: grepl(), paste(), paste0(), str_c(), str_locate(),
    str_detect(), str_replace(), str_replace_all(), str_remove(),
    str_remove_all(), str_trim(), str_squish() and str_flatten()
    (@fh-afrachioni, #860).

  • str_flatten() uses collapse = "" by default (@fh-afrachioni, #993)

  • SQLite:

    • quantile() gives a better error saying that it is not supported
      (@mgirlich, #1000).
  • SQL server:

    • as.POSIXct() now translated correctly (@krlmlr, #1011).
    • median() now translated correctly (#1008).
    • pivot_wider() works again for MS SQL (@mgirlich, #929).
    • Always use 1 and 0 as literals for logicals (@krlmlr, #934).
  • Teradata:

    • Querying works again. Unfortunately, the fix requires every column to
      once again by explicitly selected (@mgirlich, #966).
    • New translations for as.Date(), week(), quarter(), paste(),
      startsWith(), row_number(), weighted.mean(), lead(), lag(), and
      cumsum() (@overmar, #913).

dbplyr 2.2.1

27 Jun 16:45
c71736e
Compare
Choose a tag to compare
  • Querying Oracle databases works again. Unfortunately, the fix requires every
    column to be explicitly selected again (@mgirlich, #908).

  • semi_join() and anti_join() work again for Spark (@mgirlich, #915).

  • str_c() is now translated to || in Oracle (@mgirlich, #921).

  • sd(), var(), cor() and cov() now give clear error messages on
    databases that don't support them.

  • any() and all() gain default translations for all backends.

dbplyr 2.2.0

06 Jun 16:00
Compare
Choose a tag to compare

dbplyr 2.2.0

New features

  • SQL formatting has been considerably improved with new wrapping and indenting.
    show_query() creates more readable queries by printing the keywords in blue
    (@mgirlich, #644). When possible dbplyr now uses SELECT * instead of
    explicitly selecting every column (@mgirlich).

  • Added support for rows_insert(), rows_append(), rows_update(),
    rows_patch(), rows_upsert(), and rows_delete() (@mgirlich, #736).

  • Added copy_inline() as a copy_to() equivalent that does not need write
    access (@mgirlich, #628).

  • remote_query(), show_query(), compute() and collect() have an
    experimental cte argument. If TRUE the SQL query will use common table
    expressions instead of nested queries (@mgirlich, #638).

  • New in_catalog(), which works like in_schema(), but allows creation of
    table identifiers consisting of three components: catalog, schema, name
    (#806, @krlmlr).

Improvements to SQL generation

  • When possible, dbplyr now uses SELECT * instead of explicitly selecting
    every column (@mgirlich).

  • New translation for cut() (@mgirlich, #697).

  • Improved translations for specific backends:

  • The backend function dbplyr_fill0() (used for databases that lack
    IGNORE NULLS support) now respects database specific translations
    (@rsund, #753).

  • Calls of the form stringr::foo() or lubridate::foo() are now evaluated in
    the database, rather than locally (#197).

  • Unary plus (e.g. db %>% filter(x == +1)) now works (@mgirlich, #674).

  • is.na(), ifelse(), if_else(), case_when(), and if()
    generate slightly more compact SQL (@mgirlich, #738).

  • if_else() now supports the missing argument (@mgirlich, #641).

  • n() now respects the window frame (@mgirlich, #700).

  • quantile() no longer errors when using the na.rm argument (@mgirlich, #600).

  • remote_name() now returns a name in more cases where it makes sense
    (@mgirlich, #850).

  • The partial evaluation code is now more aligned with dtplyr. This makes it
    easier to transfer bug fixes and new features from one package to the other.
    In this process the second argument of partial_eval() was changed to a lazy
    frame instead of a character vector of variables (@mgirlich, #766).
    Partially evaluated expressions with infix operations are now correctly
    translated. For example translate_sql(!!expr(2 - 1) * x) now works
    (@mgirlich, #634).

Minor improvements and bug fixes

  • New pillar::tbl_format_header() method for lazy tables: Printing a lazy
    table where all rows are displayed also shows the exact number of rows in the
    header. The threshold is controlled by getOption("pillar.print_min"),
    with a default of 10 (#796, @krlmlr).

  • The 1st edition extension mechanism is formally deprecated (#507).

  • across(), if_any() and if_all() now defaults to .cols = everything()
    (@mgirlich, #760). If .fns is not provided if_any() and if_all() work
    like a parallel version of any()/any() (@mgirlich, #734).

  • across(), if_any(), and if_all() can now translate evaluated lists
    and functions (@mgirlich, #796), and accept the name of a list of functions
    (@mgirlich, #817).

  • Multiple across() calls in mutate() and transmute() can now access
    freshly created variables (@mgirlich, #802).

  • add_count() now doesn't change the groups of the input (@mgirlich, #614).

  • compute() can now handle when name is named by unnaming it first
    (@mgirlich, #623), and now works when temporary = TRUE for Oracle
    (@mgirlich, #621).

  • distinct() now supports .keep_all = TRUE (@mgirlich, #756).

  • expand() now works in DuckDB (@mgirlich, #712).

  • explain() passes ... to methods (@mgirlich, #783), and
    works for Redshift (@mgirlich, #740).

  • filter() throws an error if you supply a named argument (@mgirlich, #764).

  • Joins disambiguates columns that only differ in case (@mgirlich, #702).
    New arguments x_as and y_as allow you to control the table alias
    used in SQL query (@mgirlich, #637). Joins with na_matches = "na" now work
    for DuckDB (@mgirlich, #704).

  • mutate() and transmute() use named windows if a window definition is
    used at least twice and the backend supports named windows (@mgirlich, #624).

  • mutate() now supports the arguments .keep, .before, and .after
    (@mgirlich, #802).

  • na.rm = FALSE only warns once every 8 hours across all functions (#899).

  • nesting() now supports the .name_repair argument (@mgirlich, #654).

  • pivot_longer() can now pivot a column named name (@mgirlich, #692),
    can repair names (@mgirlich, #694), and can work with multiple names_from
    columns (@mgirlich, #693).

  • pivot_wider(values_fn = ) and pivot_longer(values_transform = )
    can now be formulas (@mgirlich, #745).

  • pivot_wider() now supports the arguments names_vary, names_expand, and
    unused_fn (@mgirlich, #774).

  • remote_name() now returns a name in more cases where it makes sense
    (@mgirlich, #850).

  • sql_random() is now exported.

  • ungroup() removes variables in ... from grouping (@mgirlich, #689).

  • transmute() now keeps grouping variables (@mgirlich, #802).

dbplyr 2.1.1

06 Apr 12:56
Compare
Choose a tag to compare
  • New support for Snowflake (@edgararuiz)

  • compute(), sql_table_index(), and sql_query_wrap() now work with
    schemas (@mgirlich, #595).

  • if_any() and if_all() are now translated.

  • group_by() now ungroups when the dots argument is empty and .add is FALSE
    (@mgirlich, #615).

  • sql_escape_date() and sql_escape_datetime gain methods for MS Access
    (@erikvona, #608).