sparklyr's arrange() behavior differs from dplyr's arrange().
dplyr's arrange() places missing (NULL) values after all non-NULL values (https://dplyr.tidyverse.org/reference/arrange.html#missing-values).
sparklyr's arrange() does not place NULL values at the end, as shown in the generated query below:
index_conditions_sparklyr <- index_conditions %>%
arrange(PersonId, RecordedDateTime, EncounterId) %>%
group_by(PersonId) %>%
mutate(row_number = row_number()) %>%
filter(row_number == 1) %>%
show_query()
SELECT *
FROM (
  SELECT
    *,
    ROW_NUMBER() OVER (PARTITION BY PersonId ORDER BY PersonId, RecordedDateTime, EncounterId) AS row_number
  FROM table_1692663744
) q01
WHERE (row_number = 1.0)

Spark SQL supports ordering NULL values after non-NULL values:
https://spark.apache.org/docs/latest/sql-ref-syntax-qry-select-orderby.html
Example:
SELECT * FROM (
SELECT *, ROW_NUMBER() OVER (PARTITION BY PersonId ORDER BY RecordedDateTime NULLS LAST, EncounterId NULLS LAST) AS row_number FROM tbl_index_conditions) t
WHERE row_number = 1
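The NULLS LAST query above can be exercised locally without a Spark cluster — a minimal sketch using Python's built-in sqlite3 module (window functions require SQLite >= 3.25 and NULLS LAST requires >= 3.30; the table and column names mirror the example, the sample rows are made up):

```python
import sqlite3

# Reproduce the NULLS LAST example with SQLite instead of Spark SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl_index_conditions (PersonId, RecordedDateTime, EncounterId)")
conn.executemany(
    "INSERT INTO tbl_index_conditions VALUES (?, ?, ?)",
    [
        (1, None, 10),          # NULL timestamp: must NOT be picked with NULLS LAST
        (1, "2021-01-05", 11),
        (2, "2020-06-30", 20),
    ],
)

rows = conn.execute("""
    SELECT PersonId, RecordedDateTime, EncounterId FROM (
        SELECT *, ROW_NUMBER() OVER (
            PARTITION BY PersonId
            ORDER BY RecordedDateTime NULLS LAST, EncounterId NULLS LAST
        ) AS row_number
        FROM tbl_index_conditions
    ) t
    WHERE row_number = 1
""").fetchall()
print(rows)  # for PersonId 1, the non-NULL timestamp row wins, not the NULL one
```

With `NULLS LAST`, the row with a NULL `RecordedDateTime` sorts after the dated row, so `row_number = 1` selects the earliest non-NULL record per person — the behavior dplyr users expect.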
PySpark already supports this via Column.asc_nulls_last: https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.Column.asc_nulls_last.html
Requesting support for ordering NULL values after non-NULL values in the arrange() method.
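The ordering being requested can be sketched in plain Python (a hypothetical emulation only — None stands in for NULL/NA, and the function and data names are illustrative, not part of any library):

```python
# Emulate dplyr::arrange semantics: missing values (None, standing in for NA)
# sort after all non-missing values.
def arrange_nulls_last(rows, key):
    """Sort rows by key(row), placing rows whose key is None at the end."""
    # The boolean first element pushes None keys past every non-None key.
    return sorted(rows, key=lambda r: (key(r) is None, key(r) if key(r) is not None else 0))

records = [
    {"PersonId": 1, "RecordedDateTime": "2021-03-01"},
    {"PersonId": 2, "RecordedDateTime": None},        # NULL timestamp
    {"PersonId": 3, "RecordedDateTime": "2020-12-15"},
]

ordered = arrange_nulls_last(records, key=lambda r: r["RecordedDateTime"])
print([r["PersonId"] for r in ordered])  # → [3, 1, 2]: the NULL row sorts last
```

This is exactly what emitting `NULLS LAST` in the generated `ORDER BY` would achieve on the Spark side.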