Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix descending sort for character data #92

Open
krlmlr opened this issue Feb 4, 2024 · 4 comments · May be fixed by #175
Open

Fix descending sort for character data #92

krlmlr opened this issue Feb 4, 2024 · 4 comments · May be fixed by #175
Labels
feature a feature request or enhancement

Comments

@krlmlr
Copy link
Collaborator

krlmlr commented Feb 4, 2024

options(conflicts.policy = list(warn = FALSE))
library(duckplyr)

Sys.setenv("DUCKPLYR_FORCE" = "TRUE")

data.frame(a = letters[1:3]) |>
  as_duckplyr_df() |>
  arrange(desc(a))
#> Error: Binder Error: No function matches the given name and argument types '-(VARCHAR)'. You might need to add explicit type casts.
#>  Candidate functions:
#>  -(TINYINT) -> TINYINT
#>  -(TINYINT, TINYINT) -> TINYINT
#>  -(SMALLINT) -> SMALLINT
#>  -(SMALLINT, SMALLINT) -> SMALLINT
#>  -(INTEGER) -> INTEGER
#>  -(INTEGER, INTEGER) -> INTEGER
#>  -(BIGINT) -> BIGINT
#>  -(BIGINT, BIGINT) -> BIGINT
#>  -(HUGEINT) -> HUGEINT
#>  -(HUGEINT, HUGEINT) -> HUGEINT
#>  -(FLOAT) -> FLOAT
#>  -(FLOAT, FLOAT) -> FLOAT
#>  -(DOUBLE) -> DOUBLE
#>  -(DOUBLE, DOUBLE) -> DOUBLE
#>  -(DECIMAL) -> DECIMAL
#>  -(DECIMAL, DECIMAL) -> DECIMAL
#>  -(UTINYINT) -> UTINYINT
#>  -(UTINYINT, UTINYINT) -> UTINYINT
#>  -(USMALLINT) -> USMALLINT
#>  -(USMALLINT, USMALLINT) -> USMALLINT
#>  -(UINTEGER) -> UINTEGER
#>  -(UINTEGER, UINTEGER) -> UINTEGER
#>  -(UBIGINT) -> UBIGINT
#>  -(UBIGINT, UBIGINT) -> UBIGINT
#>  -(DATE, DATE) -> BIGINT
#>  -(DATE, INTEGER) -> DATE
#>  -(TIMESTAMP, TIMESTAMP) -> INTERVAL
#>  -(INTERVAL, INTERVAL) -> INTERVAL
#>  -(DATE, INTERVAL) -> DATE
#>  -(TIME, INTERVAL) -> TIME
#>  -(TIMESTAMP, INTERVAL) -> TIMESTAMP
#>  -(INTERVAL) -> INTERVAL

Created on 2024-02-04 with reprex v2.1.0

@krlmlr krlmlr added the feature a feature request or enhancement label Feb 4, 2024
@toppyy
Copy link

toppyy commented Apr 30, 2024

I'm new to the project, but had a look at this and found the following:

  • Currently sorting in descending order relies on manipulating the values being sorted, not defining the sort order. Specifically desc is defined as a macro that negates the value of it's parameters before being sorted in an ascending order.
  • While the above is fine for numeric types, this issue arises because you cannot negate varchar-type
  • If I understood correctly, arrange ultimately calls rapi_rel_order in the duckdb-r -package, which creates OrderByNode:s for duckdb to process. Said nodes have an ascending sort order as a hardcoded value

Possible solutions:

  1. Use the current solution and define an macro that alters varchars so that they are sorted correctly. However, I cannot think of any way to do this?
  2. Extract and pass the sort order of each expression in sort to rapi_rel_order and use that information to create OrderByNodes with the correct sort order
    • Basically means that if the expression is not wrapped in a desc-function call, the sort order is ascending?
    • It is not clear how to pass the information? I don't know if we can pass it within the duckdb expressions or would this require changing the signature of rapi_rel_order.
  3. Something completely different?

@krlmlr
Copy link
Collaborator Author

krlmlr commented May 5, 2024

Thanks. Your second solution sounds like the way to go. Would you like to take a stab?

@toppyy
Copy link

toppyy commented May 7, 2024

Sure, I'd like to give it a go.

ATM I plan to:

  1. Create a PR for duckdb-r -package that changes the signature of rapi_rel_order so that it can accept a vector of boolean values describing the sort order (if true, sort ascending)
    • The vector defaults to true for all expressions
  2. After 1. is merged to duckdb-r -package, tackle this issue by altering rel_order so that it passes the sort order of each expression to rapi_rel_order

@krlmlr
Copy link
Collaborator Author

krlmlr commented May 7, 2024

Thanks, I forgot how much work this was. Happy to support you here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature a feature request or enhancement
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants