
sparklyr arrange behavior is different than dplyr arrange - order null values after non-null values #3368

Open
mohanrajdtv opened this issue Aug 22, 2023 · 0 comments

sparklyr's arrange behavior differs from dplyr's arrange.

dplyr's arrange places missing (NULL) values after all non-null values (https://dplyr.tidyverse.org/reference/arrange.html#missing-values).

sparklyr's arrange does not place null values at the end, as shown in the generated query below:

index_conditions_sparklyr <- index_conditions %>%
  arrange(PersonId, RecordedDateTime, EncounterId) %>%
  group_by(PersonId) %>%
  mutate(row_number = row_number()) %>%
  filter(row_number == 1) %>%
  show_query()

SELECT *
FROM (
  SELECT
    *,
    ROW_NUMBER() OVER (PARTITION BY PersonId ORDER BY PersonId, RecordedDateTime, EncounterId) AS row_number
  FROM table_1692663744
) q01
WHERE (row_number = 1.0)

Spark SQL supports ordering null values after non-null values via the NULLS LAST clause:
https://spark.apache.org/docs/latest/sql-ref-syntax-qry-select-orderby.html

Example:

SELECT *
FROM (
  SELECT
    *,
    ROW_NUMBER() OVER (PARTITION BY PersonId ORDER BY RecordedDateTime NULLS LAST, EncounterId NULLS LAST) AS row_number
  FROM tbl_index_conditions
) t
WHERE row_number = 1

PySpark also exposes this via Column.asc_nulls_last: https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.Column.asc_nulls_last.html

Requesting that the arrange method support ordering null values after non-null values.
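Until arrange() emits NULLS LAST, one possible workaround is to sort on an explicit is-NULL flag first: dbplyr translates is.na() into a SQL (col IS NULL) expression, and FALSE (0) orders before TRUE (1), so non-null rows come first. A minimal sketch, assuming the index_conditions table and column names from this issue (untested against a live Spark connection):

library(dplyr)

# Workaround sketch: emulate "NULLS LAST" by sorting on an is-NULL flag
# before each column. is.na() translates to (col IS NULL) in the generated
# ORDER BY, so non-null values sort ahead of nulls within each PersonId.
index_conditions_first <- index_conditions %>%
  arrange(PersonId,
          is.na(RecordedDateTime), RecordedDateTime,
          is.na(EncounterId), EncounterId) %>%
  group_by(PersonId) %>%
  mutate(row_number = row_number()) %>%
  filter(row_number == 1) %>%
  ungroup()

This only approximates the dplyr behavior and makes the generated ORDER BY longer; native NULLS LAST support in arrange() would still be preferable.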
