
Releases: sparklyr/sparklyr

sparklyr 1.8.6

01 May 17:19
286a7d7


  • Addresses issues with R 4.4.0. The root cause was that version-checking
    functions changed how they work.

    • package_version() no longer accepts numeric_version() output. sparklyr now
      wraps package_version() to coerce the argument when it is of class
      numeric_version
    • Comparison operators (<, >=, etc.) for packageVersion() no longer accept
      numeric values. The fix was to pass the version as a character string
  • Adds support for Databricks "autoloader" (format: cloudFiles) for streaming
    ingestion of files via stream_read_cloudfiles() (@zacdav-db #3432). Also adds:

    • stream_write_table()
    • stream_read_table()
  • Made changes to stream_write_generic (@zacdav-db #3432):

    • The toTable method does not allow calling start; added a to_table parameter
      that adjusts the logic accordingly
    • The path option is not propagated when to_table is TRUE
  • Upgrades to Roxygen version 7.3.1
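
For context, a minimal sketch of the R 4.4.0 behavior the fixes above address (the version strings are illustrative):

```r
# packageVersion() comparisons must now use character versions;
# comparing against a bare numeric errors in R 4.4.0:
# packageVersion("sparklyr") >= 1.8   # no longer accepted
packageVersion("sparklyr") >= "1.8"   # pass a character string instead

# package_version() no longer accepts numeric_version() output,
# so the argument is coerced first:
v <- numeric_version("1.8.6")
package_version(as.character(v))
```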

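A hypothetical sketch of the new Auto Loader reader, assuming it follows the argument pattern of sparklyr's other stream_read_*() functions; the path, table name, and exact signature are illustrative and may differ:

```r
library(sparklyr)

sc <- spark_connect(method = "databricks")

# Incrementally ingest files from cloud storage using
# Databricks Auto Loader (format: cloudFiles)
events <- stream_read_cloudfiles(sc, path = "/mnt/landing/events/")

# Persist the stream into a table with the new writer
stream_write_table(events, "events_bronze")
```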
sparklyr 1.8.5

26 Mar 12:02

Fixes

  • Fixes quoting issue with dbplyr 2.5.0 (#3429)

  • Fixes Windows OS identification (#3426)

Package improvements

  • Removes dependency on tibble, all calls are now redirected to dplyr (#3399)

  • Removes dependency on rappdirs (#3401):

    • Backwards compatibility with sparklyr 0.5 is no longer needed
    • Replicates selection of cache directory
  • Converts spark_apply() to a method (#3418)

Spark improvements

  • Spark 2.3 is no longer considered maintained as of September 2019

    • Removes Java folder for versions 2.3 and below
    • Merges Scala file sets into Spark version 2.4
    • Re-compiles JARs for version 2.4 and above
  • Updates Delta-to-Spark version matching when using delta as one of the
    packages when connecting (#3414)

sparklyr 1.8.4

30 Oct 15:17
9fe4405


Compatibility with new dbplyr version

  • Fixes db_connection_describe() S3 consistency error (@t-kalinowski)

  • Addresses a new dbplyr error raised when accessing components of a remote
    tbl using $

  • Bumps the version of dbplyr to switch between the two methods to create
    temporary tables

  • Addresses translate_sql()'s new hard requirement of a con object. Done by
    passing the current connection or simulate_hive()
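
For illustration, the requirement can be met with dbplyr's built-in connection simulator when no live connection is available:

```r
library(dbplyr)

# translate_sql() now hard-requires a con object;
# simulate_hive() stands in for a real Hive connection.
translate_sql(mean(x, na.rm = TRUE), con = simulate_hive())
```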

Fixes

  • Small fix to spark_connect_method() arguments. Removes 'hadoop_version'

  • Improvements to handling pysparklyr load (@t-kalinowski)

  • Fixes 'subscript out of bounds' issue found by pysparklyr (@t-kalinowski)

  • Updates available Spark download links

Improvements

  • Removes dependency on the following packages:

    • digest
    • base64enc
    • ellipsis
  • Converts ml_fit() into an S3 method for pysparklyr compatibility

Test improvements

  • Improvements and fixes to tests (@t-kalinowski)

  • Fixes test jobs that should have included Arrow but did not

  • Updates to the Spark versions to be tested

  • Re-adds tests for development dbplyr

sparklyr 1.8.3

05 Sep 13:12


Improvements

  • Spark error messages are now cached instead of being displayed in full as an
    R error. The full relay used to overwhelm the interactive session's console
    or notebook because of the number of lines returned by Spark. By default,
    sparklyr now returns only the top of the Spark error message, which is
    typically the most relevant part. The full error can still be accessed using
    a new function called spark_last_error()

  • Reduces redundancy on several tests

  • Handles SQL quoting when the table reference contains multiple levels. The
    most common case is when a table name is passed using in_catalog() or
    in_schema().
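
A short sketch of the new error workflow (the connection and table reference are illustrative):

```r
library(sparklyr)
library(dplyr)
library(dbplyr)

sc <- spark_connect(master = "local")

# A failing command now surfaces only the top of the Spark error...
try(collect(tbl(sc, in_catalog("main", "sales", "orders"))))

# ...while the complete message remains available on demand:
spark_last_error()
```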

Java

  • Adds Scala scripts to handle changes in the upcoming version of Spark (3.5)
  • Adds new JAR file to handle Spark 3.0 to 3.4
  • Adds new JAR file to handle Spark 3.5 and above

Fixes

  • Prevents an error when na.rm = TRUE is explicitly set within pmax() and
    pmin(). These functions will now also purposely fail if na.rm is set to
    FALSE. In base R their default is na.rm = FALSE, but ever since these
    translations were released they have emitted no warning or error about it.
    For now, we will keep that behavior until a better approach can be figured
    out. (#3353)

  • spark_install() will now properly match when a partial version is passed to
    the function. Previously, passing '2.3' would match '3.2.3' instead of
    '2.3.x' (#3370)
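
The pmax()/pmin() change can be sketched as follows (the data and column names are illustrative):

```r
library(sparklyr)
library(dplyr)

sc  <- spark_connect(master = "local")
sdf <- copy_to(sc, data.frame(a = c(1, NA), b = c(2, 3)))

# Explicitly setting na.rm = TRUE no longer errors:
sdf %>% mutate(mx = pmax(a, b, na.rm = TRUE))

# na.rm = FALSE now fails on purpose, since the translation has
# always behaved as if na.rm were TRUE.
```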

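For example, partial version matching now resolves as intended:

```r
library(sparklyr)

# Resolves to the latest 2.3.x release rather than 3.2.3:
spark_install(version = "2.3")
```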
Package integration

  • Adds functionality to allow other packages to provide sparklyr additional
    back-ends. This effort is mainly focused on adding the ability to integrate
    with Spark Connect and Databricks Connect through a new package.

  • New exported functions to integrate with the RStudio IDE. They all have the
    same spark_ide_ prefix

  • Modifies several read functions to become exported methods, such as
    sdf_read_column().

  • Adds the spark_integ_test_skip() function, which allows other packages to
    use sparklyr's test suite. It gives the external package a way to indicate
    whether a given test should run or be skipped.

  • If installed, sparklyr will load the pysparklyr package

sparklyr 1.8.2

01 Jul 18:44

New Features

  • Adds Azure Synapse Analytics connectivity (@Bob-Chou, #3336)

  • Adds support for "parameterized" queries now available in Spark 3.4 (@gregleleu #3335)

  • Adds new DBI methods: dbValid and dbDisconnect (@alibell, #3296)

  • Adds overwrite parameter to dbWriteTable() (@alibell, #3296)

  • Adds database parameter to dbListTables() (@alibell, #3296)

  • Adds the ability to turn off predicate support (where(), across()) using
    options("sparklyr.support.predicates" = FALSE). Defaults to TRUE. Turning it
    off should accelerate dplyr commands because sparklyr won't need to process
    column types for every single piped command
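
For example, disabling the option (the option name is taken from the note above):

```r
# Turn off where()/across() predicate support so sparklyr skips
# re-reading column types on every piped dplyr command:
options("sparklyr.support.predicates" = FALSE)
```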

Fixes

  • Fixes Spark download locations (#3331)

  • Fix various rlang deprecation warnings (@mgirlich, #3333).

Misc

  • Switches upper version of Spark to 3.4, and updates JARs (#3334)

sparklyr 1.8.1

22 Mar 14:45
38f8bcf

Bug Fixes

  • Fixes consistency issues with dplyr's sample_n(), slice(), op_vars(), and sample_frac()

Internal functionality

  • Adds R-devel to GHA testing

sparklyr 1.8.0

21 Mar 19:07

Bug Fixes

  • Addresses Warning from CRAN checks

  • Addresses option(stringsAsFactors) usage

  • Fixes root cause of issue processing pivot wider and distinct (#3317 & #3320)

  • Updates local Spark download sources

sparklyr 1.7.8

16 Aug 20:45
2cc7e04

New features

  • Adds new metric extraction functions: ml_metrics_binary(),
    ml_metrics_regression() and ml_metrics_multiclass(). They work similarly to
    yardstick's metric extraction functions: they expect a table with the
    predictions and actual values, and return a concise tibble with the
    metrics. (#3281)

  • Adds new spark_insert_table() function. This allows one to insert data into
    an existing table definition without redefining the table, even when overwriting
    the existing data. (#3272 @jimhester)
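
A sketch of the yardstick-style metrics workflow; the truth argument name follows yardstick conventions and the exact signature may differ:

```r
library(sparklyr)

sc <- spark_connect(master = "local")
mtcars_tbl <- copy_to(sc, mtcars)

# Fit a model and score the data, then extract metrics
model <- ml_linear_regression(mtcars_tbl, mpg ~ wt + cyl)
preds <- ml_predict(model, mtcars_tbl)

# Returns a concise tibble of regression metrics
ml_metrics_regression(preds, truth = mpg)
```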

Bug Fixes

  • Restores "validator" functions to regression models. Removing them in a previous
    version broke ml_cross_validator() for regression models. (#3273)

Spark

  • Adds support to Spark 3.3 local installation. This includes the ability to
    enable and setup log4j version 2. (#3269)

  • Updates the JSON file that sparklyr uses to find and download Spark for
    local use. Worth noting: starting with Spark 3.3, the Hadoop version number
    in the download link no longer includes a minor version. So, instead of
    requesting 3.2, the version to request is 3.
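
For example, installing the newly supported version locally (note the major-only Hadoop version):

```r
library(sparklyr)

# Starting with Spark 3.3, request Hadoop "3" rather than "3.2":
spark_install(version = "3.3", hadoop_version = "3")
```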

Internal functionality

  • Removes workaround for older versions of arrow. Bumps arrow version
    dependency, from 0.14.0 to 0.17.0 (#3283 @nealrichardson)

  • Removes code related to backwards compatibility with dbplyr. sparklyr
    requires dbplyr version 2.2.1 or above, so the code is no longer needed.
    (#3277)

  • Begins centralizing ML parameter validation into a single function that will
    run the proper cast function for each Spark parameter. It also starts using
    S3 methods, instead of searching for a concatenated function name, to find the
    proper parameter validator. Regression models are the first ones to use this
    new method. (#3279)

  • sparklyr compilation routines have been improved and simplified.
    spark_compile() now provides more informative output when used. Tests for
    compilation were also added, along with a step to install Scala in the
    corresponding GHAs so that the new JAR build tests are able to run.
    (#3275)

  • Stops using package environment variables directly. Any package-level
    variable is now handled by a genv-prefixed function to set and retrieve
    values. This avoids the risk of having the exact same variable initialized
    in more than one R script. (#3274)

  • Adds more tests to improve coverage.

Misc

  • Addresses new CRAN HTML check NOTEs. It also adds a new GHA action to run the
    same checks to make sure we avoid new issues with this in the future.

sparklyr 1.7.6

27 May 14:17
  • Ensures compatibility with Spark version 3.2 (#3261)
  • Compatibility with new dbplyr version (@mgirlich)
  • Removes stringr dependency
  • Fixes augment() when the model was fitted via parsnip (#3233)

sparklyr 1.7.5

03 Feb 15:35

Misc

  • Addresses both CRAN Check Results warnings:
    • Un-exported object rlang::is_env()
    • pivot_wider() S3 consistency issue