Posit Workbench keeps crashing when connecting to Spark in local mode #3374

Open
tweakyTweeter opened this issue Aug 30, 2023 · 5 comments

@tweakyTweeter

Posit Workbench keeps crashing when I try to connect to Spark in local mode via the sparklyr package. The expected behavior is a successful connection to a Spark instance. When I run spark_connect with the method = "test" option, I get an error from the as_tibble function, as shown below. I tried downgrading various packages (sparklyr, tibble, dplyr, etc.), but nothing seems to work. I would really appreciate any suggestions for diagnosing this issue, as I'm drawing a blank and couldn't find anything on Stack Overflow.

library(sparklyr)
#> 
#> Attaching package: 'sparklyr'
#> The following object is masked from 'package:stats':
#> 
#>     filter
devtools::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 3.6.3 (2020-02-29)
#>  os       Ubuntu 18.04.6 LTS
#>  system   x86_64, linux-gnu
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       Etc/GMT
#>  date     2023-08-30
#>  pandoc   2.19.2 @ /usr/lib/rstudio-server/bin/quarto/bin/tools/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version date (UTC) lib source
#>  assertthat    0.2.1   2019-03-21 [1] CRAN (R 3.6.3)
#>  base64enc     0.1-3   2015-07-28 [1] CRAN (R 3.6.3)
#>  cachem        1.0.8   2023-05-01 [1] CRAN (R 3.6.3)
#>  callr         3.7.3   2022-11-02 [1] CRAN (R 3.6.3)
#>  cli           3.6.1   2023-03-23 [1] CRAN (R 3.6.3)
#>  crayon        1.5.2   2022-09-29 [1] CRAN (R 3.6.3)
#>  DBI           1.1.3   2022-06-18 [1] CRAN (R 3.6.3)
#>  dbplyr        2.2.1   2022-06-27 [1] CRAN (R 3.6.3)
#>  devtools      2.4.5   2022-10-11 [1] CRAN (R 3.6.3)
#>  digest        0.6.33  2023-07-07 [1] CRAN (R 3.6.3)
#>  dplyr         1.1.2   2023-04-20 [1] CRAN (R 3.6.3)
#>  ellipsis      0.3.2   2021-04-29 [1] CRAN (R 3.6.3)
#>  evaluate      0.21    2023-05-05 [1] CRAN (R 3.6.3)
#>  fansi         1.0.4   2023-01-22 [1] CRAN (R 3.6.3)
#>  fastmap       1.1.1   2023-02-24 [1] CRAN (R 3.6.3)
#>  fs            1.6.3   2023-07-20 [1] CRAN (R 3.6.3)
#>  generics      0.1.3   2022-07-05 [1] CRAN (R 3.6.3)
#>  glue          1.6.2   2022-02-24 [1] CRAN (R 3.6.3)
#>  htmltools     0.5.6   2023-08-10 [1] CRAN (R 3.6.3)
#>  htmlwidgets   1.6.2   2023-03-17 [1] CRAN (R 3.6.3)
#>  httpuv        1.6.11  2023-05-11 [1] CRAN (R 3.6.3)
#>  httr          1.4.7   2023-08-15 [1] CRAN (R 3.6.3)
#>  jsonlite      1.8.4   2022-12-06 [1] CRAN (R 3.6.3)
#>  knitr         1.43    2023-05-25 [1] CRAN (R 3.6.3)
#>  later         1.3.1   2023-05-02 [1] CRAN (R 3.6.3)
#>  lifecycle     1.0.3   2022-10-07 [1] CRAN (R 3.6.3)
#>  magrittr      2.0.3   2022-03-30 [1] CRAN (R 3.6.3)
#>  memoise       2.0.1   2021-11-26 [1] CRAN (R 3.6.3)
#>  mime          0.12    2021-09-28 [1] CRAN (R 3.6.3)
#>  miniUI        0.1.1.1 2018-05-18 [1] CRAN (R 3.6.3)
#>  pillar        1.9.0   2023-03-22 [1] CRAN (R 3.6.3)
#>  pkgbuild      1.4.2   2023-06-26 [1] CRAN (R 3.6.3)
#>  pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 3.6.3)
#>  pkgload       1.3.2.1 2023-07-08 [1] CRAN (R 3.6.3)
#>  prettyunits   1.1.1   2020-01-24 [1] CRAN (R 3.6.3)
#>  processx      3.8.2   2023-06-30 [1] CRAN (R 3.6.3)
#>  profvis       0.3.8   2023-05-02 [1] CRAN (R 3.6.3)
#>  promises      1.2.1   2023-08-10 [1] CRAN (R 3.6.3)
#>  ps            1.7.5   2023-04-18 [1] CRAN (R 3.6.3)
#>  purrr         1.0.2   2023-08-10 [1] CRAN (R 3.6.3)
#>  R.methodsS3   1.8.2   2022-06-13 [1] CRAN (R 3.6.3)
#>  R.oo          1.25.0  2022-06-12 [1] CRAN (R 3.6.3)
#>  R.utils       2.12.2  2022-11-11 [1] CRAN (R 3.6.3)
#>  R6            2.5.1   2021-08-19 [1] CRAN (R 3.6.3)
#>  Rcpp          1.0.11  2023-07-06 [1] CRAN (R 3.6.3)
#>  remotes       2.4.2.1 2023-07-18 [1] CRAN (R 3.6.3)
#>  reprex        2.0.2   2022-08-17 [1] CRAN (R 3.6.3)
#>  rlang         1.1.1   2023-04-28 [1] CRAN (R 3.6.3)
#>  rmarkdown     2.24    2023-08-14 [1] CRAN (R 3.6.3)
#>  rstudioapi    0.15.0  2023-07-07 [1] CRAN (R 3.6.3)
#>  sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 3.6.3)
#>  shiny         1.7.5   2023-08-12 [1] CRAN (R 3.6.3)
#>  sparklyr    * 1.8.2   2023-07-01 [1] CRAN (R 3.6.3)
#>  stringi       1.7.12  2023-01-11 [1] CRAN (R 3.6.3)
#>  stringr       1.5.0   2022-12-02 [1] CRAN (R 3.6.3)
#>  tibble        3.2.1   2023-03-20 [1] CRAN (R 3.6.3)
#>  tidyr         1.2.1   2022-09-08 [1] CRAN (R 3.6.3)
#>  tidyselect    1.2.0   2022-10-10 [1] CRAN (R 3.6.3)
#>  urlchecker    1.0.1   2021-11-30 [1] CRAN (R 3.6.3)
#>  usethis       2.2.2   2023-07-06 [1] CRAN (R 3.6.3)
#>  utf8          1.2.3   2023-01-31 [1] CRAN (R 3.6.3)
#>  vctrs         0.6.3   2023-06-14 [1] CRAN (R 3.6.3)
#>  withr         2.5.0   2022-03-03 [1] CRAN (R 3.6.3)
#>  xfun          0.40    2023-08-09 [1] CRAN (R 3.6.3)
#>  xtable        1.8-4   2019-04-21 [1] CRAN (R 3.6.3)
#>  yaml          2.3.7   2023-01-23 [1] CRAN (R 3.6.3)
#> 
#>  [1] /usr/local/lib/remote_cran_repo/r_shared_libraries/R3.6
#>  [2] /usr/local/lib/h2o/h2o-3.14.0.6
#>  [3] /usr/local/lib/h2o/h2o-3.16.0.2
#>  [4] /usr/local/lib/h2o/h2o-3.20.0.2
#>  [5] /usr/local/lib/R/3.6.3/lib/R/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────
sc <- sparklyr::spark_connect(master = "local", method = "test")
rlang::last_trace(drop = FALSE) 
#> <error/tibble_error_column_scalar_type>
#> Error in `as_tibble()`:
#> ! All columns in a tibble must be vectors.
#> ✖ Column `list(master = "local[40]", config = list(spark.env.SPARK_LOCAL_IP.local = "127.0.0.1", sparklyr.connect.csv.embedded = "^1.*", spark.sql.legacy.utcTimestampFunc.enabled = TRUE,
#>   sparklyr.connect.cores.local = 40, spark.sql.shuffle.partitions.local = 40, sparklyr.shell.name = "sparklyr", \`sparklyr.shell.driver-memory\` = "2g"), state = <environment>)` is a
#>   `spark_connection/test_connection/DBIConnection` object.
#> ---
#> Backtrace:
#>      ▆
#>   1. └─.rs.connectionListObjects("Spark", "local - ")
#>   2.   └─connection$listObjects(...)
#>   3.     └─sparklyr:::connection_list_tables(scon, includeType = TRUE)
#>   4.       ├─base::sort(dbListTables(sc))
#>   5.       ├─DBI::dbListTables(sc)
#>   6.       └─sparklyr (local) dbListTables(sc)
#>   7.         └─sparklyr (local) .local(conn, ...)
#>   8.           └─sparklyr:::df_from_sql(conn, query)
#>   9.             └─sparklyr:::df_from_sdf(sc, sdf)
#>  10.               └─sparklyr::sdf_collect(sdf)
#>  11.                 └─sparklyr:::sdf_collect_static(object, impl, ...)
#>  12.                   └─sparklyr:::sdf_collect_data_frame(sdf, collected)
#>  13.                     ├─tibble::as_tibble(fixed, stringsAsFactors = FALSE, optional = TRUE)
#>  14.                     └─tibble:::as_tibble.list(fixed, stringsAsFactors = FALSE, optional = TRUE)
#>  15.                       └─tibble:::lst_to_tibble(x, .rows, .name_repair, col_lengths(x))
#>  16.                         └─tibble:::check_valid_cols(x, call = call)
#>  17.                           └─tibble:::abort_column_scalar_type(...)
#>  18.                             └─tibble:::tibble_abort(...)
#>  19.                               └─rlang::abort(x, class, ..., call = call, parent = parent, use_cli_format = TRUE)
#> 
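
A possible next check, offered as a sketch rather than a confirmed fix: the backtrace above starts at .rs.connectionListObjects, which is the IDE's Connections pane listing tables, not user code. The snippet below assumes the pane is hooked in through the connectionObserver option (an assumption about the IDE integration, not something confirmed in this thread) and clears that option before connecting, to see whether the as_tibble error still appears.

```r
# Assumption: the Connections pane is registered via the "connectionObserver"
# option. Setting it to NULL should stop the pane from calling listObjects().
old_opts <- options(connectionObserver = NULL)

# Same call as in the report above.
sc <- sparklyr::spark_connect(master = "local", method = "test")

# Restore the previous option value once the check is done.
options(old_opts)
```

If the error disappears with the observer cleared, that would point at the pane's table listing on a test connection rather than at the connection itself.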
@edgararuiz
Collaborator

Hi, what is the reason to use method = "test" in your use-case? Wouldn't simply using spark_connect("local") be sufficient?

@tweakyTweeter
Author

If I use spark_connect("local"), RStudio crashes instantly and no error logs are generated for me to debug the issue, so I was trying method = "test" instead.

@edgararuiz
Collaborator

Ok, what kind of error message is Workbench displaying?

@tweakyTweeter
Author

It just crashes the session without any error messages. Let me try the sparklyr.log.console option and check whether I get any error messages.

@tweakyTweeter
Author

Even with the options(sparklyr.log.console = TRUE) command, the R session instantly crashes.
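
One way to get more information might be to run the connection outside the IDE, so that any output survives even if the process dies. This is a minimal sketch, assuming the script is saved under a hypothetical name such as spark_test.R and run from a terminal with Rscript. The reduced core count (the config in the error above shows 40) and the driver memory value are illustrative choices, not settings suggested by the maintainers.

```r
# Hypothetical file: spark_test.R
# Run from a terminal, outside Workbench, for example:
#   Rscript spark_test.R > spark_test.log 2>&1

library(sparklyr)

# Echo Spark's log output to the console so it ends up in spark_test.log.
options(sparklyr.log.console = TRUE)

# Start from the default config and shrink the local footprint.
# Both keys appear in the config printed in the error message above.
config <- spark_config()
config$sparklyr.connect.cores.local <- 2
config[["sparklyr.shell.driver-memory"]] <- "2g"

sc <- spark_connect(master = "local", config = config)

# If the connection succeeds, print the Spark version and disconnect.
print(spark_version(sc))
spark_disconnect(sc)
```

If this works from the terminal but the IDE session still crashes, that would suggest the problem lies in the IDE integration rather than in Spark or sparklyr.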
