Releases: sparklyr/sparklyr
sparklyr 1.8.6
- Addresses issues with R 4.4.0. The root cause was that version-checking
  functions changed how they work:
  - `package_version()` no longer accepts `numeric_version()` output. The
    `package_version()` function is now wrapped to coerce the argument if it
    is of class `numeric_version`.
  - Comparison operators (`<`, `>=`, etc.) for `packageVersion()` no longer
    accept numeric values. The fix was to pass the version as a character.
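The R 4.4.0 changes above can be illustrated with a short base-R sketch; this is illustrative only, not sparklyr's actual internal code:

```r
# Illustrative sketch (not sparklyr's internals): coerce a numeric_version
# through character before calling package_version(), and compare versions
# against character strings rather than bare numerics.
v  <- numeric_version("1.8.6")
pv <- package_version(as.character(v))  # coercion sidesteps the R 4.4.0 error
pv >= "1.8.0"                           # compare against a character string
```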
- Adds support for the Databricks "autoloader" (format: `cloudFiles`) for
  streaming ingestion of files (`stream_read_cloudfiles`) (@zacdav-db #3432):
  - `stream_write_table()`
  - `stream_read_table()`
- Makes changes to `stream_write_generic` (@zacdav-db #3432):
  - The `toTable` method doesn't allow calling `start`; added a `to_table`
    parameter that adjusts the logic
  - The `path` option is not propagated when `to_table` is `TRUE`
- Upgrades to Roxygen version 7.3.1
sparklyr 1.8.5
Package improvements
- Removes dependency on `tibble`; all calls are now redirected to `dplyr`
  (#3399)
- Removes dependency on `rappdirs` (#3401):
  - Backwards compatibility with `sparklyr` 0.5 is no longer needed
  - Replicates selection of the cache directory
- Converts `spark_apply()` to a method (#3418)
Spark improvements
- Spark 2.3 is no longer considered maintained as of September 2019:
  - Removes the Java folder for versions 2.3 and below
  - Merges Scala file sets into Spark version 2.4
  - Re-compiles JARs for version 2.4 and above
- Updates Delta-to-Spark version matching when using `delta` as one of the
  `packages` when connecting (#3414)
sparklyr 1.8.4
Compatibility with new dbplyr version
- Fixes `db_connection_describe()` S3 consistency error (@t-kalinowski)
- Addresses a new error from `dbplyr` that fails when you try to access
  components from a remote `tbl` using `$`
- Bumps the version of `dbplyr` to switch between the two methods to create
  temporary tables
- Addresses the new `translate_sql()` hard requirement to pass a `con`
  object. Done by passing the current connection or `simulate_hive()`
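The `translate_sql()` requirement can be seen directly in dbplyr; a minimal sketch using the simulated Hive connection:

```r
library(dbplyr)

# translate_sql() now has a hard requirement for a `con` object;
# simulate_hive() supplies a simulated Hive connection so SQL can be
# generated without a live cluster.
translate_sql(mean(x, na.rm = TRUE), con = simulate_hive())
```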
Fixes
- Small fix to `spark_connect_method()` arguments. Removes `hadoop_version`
- Improvements to handling `pysparklyr` load (@t-kalinowski)
- Fixes 'subscript out of bounds' issue found by `pysparklyr` (@t-kalinowski)
- Updates available Spark download links
Improvements
- Removes dependency on the following packages: `digest`, `base64enc`,
  `ellipsis`
- Converts `ml_fit()` into an S3 method for `pysparklyr` compatibility
Test improvements
- Improvements and fixes to tests (@t-kalinowski)
- Fixes test jobs that should have included Arrow but did not
- Updates the Spark versions to be tested
- Re-adds tests for development `dbplyr`
sparklyr 1.8.3
Improvements
- Spark error message relays are now cached instead of the entire content
  being displayed as an R error. The old behavior used to overwhelm the
  interactive session's console or notebook because of the number of lines
  returned in the Spark message. Now, by default, only the top of the Spark
  error message is returned, which is typically the most relevant part. The
  full error can still be accessed using a new function called
  `spark_last_error()`
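A hedged sketch of the new error workflow, assuming a local Spark installation and a deliberately failing query:

```r
library(sparklyr)

sc <- spark_connect(master = "local")
# The failure below now surfaces only the top of the Spark error message
try(sdf_sql(sc, "SELECT * FROM table_that_does_not_exist"))
spark_last_error()  # retrieves the full, uncut Spark error
spark_disconnect(sc)
```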
- Reduces redundancy in several tests
- Handles SQL quoting when the table reference contains multiple levels. The
  most common time someone would encounter this issue is when a table name is
  passed using `in_catalog()` or `in_schema()`.
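For example, multi-level references like the ones below exercise that quoting path; the connection `sc` and the catalog/schema/table names are illustrative assumptions:

```r
library(dplyr)
library(dbplyr)

# Table references with multiple levels, as described above; all names
# here are placeholders.
tbl(sc, in_schema("my_schema", "my_table"))
tbl(sc, in_catalog("my_catalog", "my_schema", "my_table"))
```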
Java
- Adds Scala scripts to handle changes in the upcoming version of Spark (3.5)
- Adds new JAR file to handle Spark 3.0 to 3.4
- Adds new JAR file to handle Spark 3.5 and above
Fixes
- Prevents an error when `na.rm = TRUE` is explicitly set within `pmax()` and
  `pmin()`. They will now also purposely fail if `na.rm` is set to `FALSE`.
  The default of these functions in base R is for `na.rm` to be `FALSE`, but
  ever since these functions were released there has been no warning or error.
  For now, we will keep that behavior until a better approach can be figured
  out. (#3353)
- `spark_install()` now properly matches when a partial version is passed to
  the function. The issue was that passing '2.3' would match '3.2.3' instead
  of '2.3.x' (#3370)
Package integration
- Adds functionality to allow other packages to provide `sparklyr` with
  additional back-ends. This effort is mainly focused on adding the ability
  to integrate with Spark Connect and Databricks Connect through a new
  package.
- New exported functions to integrate with the RStudio IDE. They all share
  the `spark_ide_` prefix.
- Modifies several read functions to become exported methods, such as
  `sdf_read_column()`.
- Adds the `spark_integ_test_skip()` function. This allows other packages to
  use `sparklyr`'s test suite, giving the external package a way to indicate
  whether a given test should run or be skipped.
- If installed, `sparklyr` will load the `pysparklyr` package
sparklyr 1.8.2
New Features
- Adds Azure Synapse Analytics connectivity (@Bob-Chou, #3336)
- Adds support for "parameterized" queries, now available in Spark 3.4
  (@gregleleu #3335)
- Adds new DBI methods: `dbValid` and `dbDisconnect` (@alibell, #3296)
- Adds an `overwrite` parameter to `dbWriteTable()` (@alibell, #3296)
- Adds the ability to turn off predicate support (`where()`, `across()`)
  using `options("sparklyr.support.predicates" = FALSE)`. Defaults to `TRUE`.
  This should accelerate `dplyr` commands because column types no longer need
  to be processed for every single piped command
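The option can be set as sketched below; the option name is taken from the note above:

```r
# Disable tidyselect predicate support to speed up piped dplyr commands
options(sparklyr.support.predicates = FALSE)
getOption("sparklyr.support.predicates")  # FALSE
```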
Misc
- Switches the upper version of Spark to 3.4 and updates JARs (#3334)
sparklyr 1.8.1
Bug Fixes
- Fixes consistency issues with dplyr's `sample_n()`, `slice()`, `op_vars()`,
  and `sample_frac()`
Internal functionality
- Adds R-devel to GHA testing
sparklyr 1.8.0
sparklyr 1.7.8
New features
- Adds new metric extraction functions: `ml_metrics_binary()`,
  `ml_metrics_regression()` and `ml_metrics_multiclass()`. They work closer
  to how `yardstick` metric extraction functions work: they expect a table
  with the predictions and actual values, and return a concise `tibble` with
  the metrics. (#3281)
- Adds a new `spark_insert_table()` function. This allows one to insert data
  into an existing table definition without redefining the table, even when
  overwriting the existing data. (#3272 @jimhester)
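A hypothetical end-to-end sketch of the metric extraction flow; the connection, the `training_tbl`/`testing_tbl` splits, and the column names are assumptions, only the `ml_metrics_binary()` name comes from the note above:

```r
library(sparklyr)

# Hypothetical: fit a classifier, score a held-out table, then extract
# metrics from the resulting table of predictions and actual values.
model       <- ml_logistic_regression(training_tbl, label ~ .)
predictions <- ml_predict(model, testing_tbl)
ml_metrics_binary(predictions)  # concise tibble of metrics
```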
Bug Fixes
- Restores "validator" functions to regression models. Removing them in a
  previous version broke `ml_cross_validator()` for regression models. (#3273)
Spark
- Adds support for Spark 3.3 local installation. This includes the ability to
  enable and set up log4j version 2. (#3269)
- Updates the JSON file that `sparklyr` uses to find and download Spark for
  local use. It is worth mentioning that, starting with Spark 3.3, the Hadoop
  version number no longer uses a minor version in its download link. So,
  instead of requesting 3.2, the version to request is 3.
Internal functionality
- Removes the workaround for older versions of `arrow`. Bumps the `arrow`
  version dependency from 0.14.0 to 0.17.0 (#3283 @nealrichardson)
- Removes code related to backwards compatibility with `dbplyr`. `sparklyr`
  requires `dbplyr` version 2.2.1 or above, so the code is no longer needed.
  (#3277)
- Begins centralizing ML parameter validation into a single function that
  runs the proper `cast` function for each Spark parameter. It also starts
  using S3 methods, instead of searching for a concatenated function name, to
  find the proper parameter validator. Regression models are the first ones
  to use this new method. (#3279)
- `sparklyr` compilation routines have been improved and simplified.
  `spark_compile()` now provides more informative output when used. It also
  adds compilation tests, and a step to install Scala in the corresponding
  GHAs so that the new JAR build tests are able to run. (#3275)
- Stops using package environment variables directly. Any package-level
  variable is now handled by a `genv`-prefixed function to set and retrieve
  values. This avoids the risk of having the exact same variable initialized
  in more than one R script. (#3274)
- Adds more tests to improve coverage.
Misc
- Addresses new CRAN HTML check NOTEs. It also adds a new GHA action to run the
same checks to make sure we avoid new issues with this in the future.
sparklyr 1.7.6
sparklyr 1.7.5
Misc
- Addresses both CRAN Check Results warnings:
  - Un-exported object `rlang::is_env()`
  - `pivot_wider()` S3 consistency issue