
spark_apply() error, "Can't query fields" #3416

Open
brianstamper opened this issue Jan 23, 2024 · 0 comments

I'm attempting to use spark_apply() but getting an error. This is on a Cloudera Machine Learning environment, and the same error appears on both Spark 2.4.7 and 3.2.1. On another machine with a local Spark install I do not see this issue, but the error I'm getting here doesn't give much indication of what the problem is.

For a small demo I'll use the first spark_apply() example from sparklyr - Distributing R Computations:

library(tidyverse)
library(sparklyr)

conf <- spark_config()
sc <- spark_connect(master = 'yarn-client', config = conf)

sdf_len(sc, 5, repartition = 1) %>%
  spark_apply(function(e) I(e))

The error I get looks like the following. It makes me wonder whether this is actually a dbplyr issue instead.

Error in `db_query_fields.DBIConnection()`:
! Can't query fields.
Caused by error in `value[[3L]]()`:
! Failed to fetch data: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 588.0 failed 4 times, most recent failure: Lost task 0.3 in stage 588.0 (TID 57382) (10.42.5.51 executor 112): org.apache.spark.SparkException: Process List(tar, -xf, packages.51962.tar) exited with code 2
Backtrace:
  1. sdf_len(sc, 5, repartition = 1) %>% ...
  4. sparklyr::sdf_len(sc, 5, repartition = 1)
  5. sparklyr::sdf_seq(sc, 1, length, repartition = repartition, type = type)
  7. sparklyr:::sdf_register.spark_jobj(sdf)
  9. sparklyr:::tbl.spark_connection(sc, name)
 10. sparklyr:::spark_tbl_sql(src = src, from)
 11. dbplyr::tbl_sql(...)
 13. dbplyr:::dbplyr_query_fields(src$con, from)
 14. dbplyr:::dbplyr_fallback(con, "db_query_fields", ...)
 16. dbplyr:::db_query_fields.DBIConnection(con, ...)
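The underlying failure appears to be the executor-side extraction of the R package bundle (`Process List(tar, -xf, packages.51962.tar) exited with code 2`), with the dbplyr "Can't query fields" error just surfacing it. One way I've tried to narrow this down (a diagnostic sketch, not a confirmed fix; the `packages` argument and the `sparklyr.apply.packages` config option are documented in sparklyr, but whether they apply to this environment is an assumption) is to disable package distribution so spark_apply() only ships the closure:

    library(sparklyr)

    conf <- spark_config()
    # Assumption: turning off package bundling cluster-wide via config
    conf$sparklyr.apply.packages <- FALSE
    sc <- spark_connect(master = 'yarn-client', config = conf)

    # Or per-call: skip building/extracting the packages tarball entirely
    sdf_len(sc, 5, repartition = 1) %>%
      spark_apply(function(e) I(e), packages = FALSE)

If this variant succeeds, the problem is isolated to creating or unpacking the packages tarball on the executors (e.g. the tar binary or disk space on the worker nodes) rather than spark_apply() itself.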

Cross posting from https://community.rstudio.com/t/using-spark-apply-throws-a-sparkexception-process-list-tar-xf-packages-75547-tar-exited-with-code-2/180555
