
sdf_pivot_longer breaks when using names_sep or names_pattern (multiple names_to) #3417

Open
ahcyip opened this issue Feb 1, 2024 · 0 comments

ahcyip commented Feb 1, 2024

I'm trying to use `sdf_pivot_longer` in Spark with `length(names_to) > 1`:

```r
pivot_longer(
  cols = c("wkday_home1", "wkday_home2", "wkday_work1", "wkday_work2",
           "wkday_public2", "wkday_publicdcfc", "wkday_enroutedcfc",
           "wknd_home1", "wknd_home2", "wknd_work1",
           "wknd_work2", "wknd_public2", "wknd_publicdcfc",
           "wknd_enroutedcfc"),
  names_to = c("day_type", "location_and_charger_type"),
  names_sep = "_")
```

but I encounter this error:

```
Error:
! java.lang.NullPointerException: Cannot invoke "String.endsWith(String)" because "name" is null
Run `sparklyr::spark_last_error()` to see the full Spark error (multiple lines)
To use the previous style of error message set options("sparklyr.simple.errors" = TRUE)
```

---

Backtrace:

```
 1. ├─... %>% ...
 2. ├─tidyr::pivot_longer(...)
 3. ├─sparklyr:::pivot_longer.tbl_spark(...)
 4. │ └─sparklyr:::sdf_pivot_longer(...)
 5. │   └─.postprocess_pivot_longer_output(data, group_vars, spec, values, ...
 6. │     ├─... %@% lapply(group_vars, as.symbol)
 7. │     │ └─sparklyr (local) fn(largs)
 8. │     │   ├─base::do.call(fn, append(list(x), as.list(largs)))
 9. │     │   └─base::append(list(x), as.list(largs))
10. │     ├─... %@% lapply(output_cols, as.symbol)
11. │     │ └─sparklyr (local) fn(largs)
12. │     │   ├─base::do.call(fn, append(list(x), as.list(largs)))
13. │     │   └─base::append(list(x), as.list(largs))
14. │     └─out %>% invoke("sort", id_col, as.list(key_cols)) %>% ...
15. ├─sparklyr::sdf_register(.)
16. ├─sparklyr::invoke(., "sort", id_col, as.list(key_cols))
17. └─sparklyr:::invoke.shell_jobj(., "sort", id_col, as.list(key_cols))
18.   ├─sparklyr::invoke_method(...)
19.   └─sparklyr:::invoke_method.spark_shell_connection(...)
20.     └─sparklyr:::core_invoke_method(...)
21.       └─sparklyr:::core_invoke_method_impl(...)
22.         └─sparklyr:::spark_error(msg)
23.           └─rlang::abort(message = msg, use_cli_format = TRUE, call = NULL)
```

Looking at the source code, it may have something to do with:

```r
key_cols <- colnames(spec[-(1:2)])
```

Perhaps something is wrong with `key_cols` or `id_col`, which seems to assume `length(names_to) == 1`? (Just a guess.)
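For reference, the shape of the spec behind that line can be reproduced locally with `tidyr::build_longer_spec()` (a small sketch with made-up data, not the sparklyr internals): with two entries in `names_to`, the spec carries two key columns after `.name` and `.value`, so `key_cols` ends up with length 2 rather than 1.

```r
library(tidyr)

# Hypothetical local data frame mimicking the column-name pattern above
df <- data.frame(id = 1,
                 wkday_home1 = 2,
                 wknd_home1  = 3)

spec <- build_longer_spec(df, cols = -id,
                          names_to  = c("day_type", "location_and_charger_type"),
                          names_sep = "_")

# Same expression as in the sparklyr source: everything after .name/.value
colnames(spec[-(1:2)])
# two key columns here, where the sort call may expect a single one
```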

Here is my workaround, which worked:

```r
pivot_longer(
  cols = wkday_home1:wkday_enroutedcfc,
  names_to = "day_type_and_location_and_charger_type") %>%
  separate(day_type_and_location_and_charger_type,
           into = c("day_type", "location_and_charger_type"))
```
