CANNOT_RESOLVE_STAR_EXPAND error sdf_sql() / DBR 15.2 #3439

Closed
romangehrn opened this issue May 6, 2024 · 5 comments
Labels: bug, databricks (Issues related to Databricks connection mode)

Comments

@romangehrn

With Databricks Runtime (DBR) 15.2, an error occurs when I use sdf_sql() to drop a table. DBR 15.2 normally ships with sparklyr 1.8.4; I upgraded it to 1.8.6, but the error is still there.

sc <- sparklyr::spark_connect(method="databricks")

# write table
sdf_test <- sparklyr::sdf_copy_to(sc, mtcars, "test_mtcars", overwrite = TRUE)
sparklyr::spark_write_table(sdf_test, "ch_lab.ran.test_mtcars")

# delete table (this call raises the error below)
sparklyr::sdf_sql(sc, "DROP TABLE ch_lab.ran.test_mtcars")
Error : org.apache.spark.sql.AnalysisException: [CANNOT_RESOLVE_STAR_EXPAND]
Cannot resolve `sparklyr_tmp_b788591d_ca36_490c_a969_264b49bdddde`.* given
input columns . Please check that the specified table or struct exists and is
accessible in the input columns. SQLSTATE: 42704; line 1 pos 7
Session info ───────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.3.2 (2023-10-31)
 os       Ubuntu 22.04.4 LTS
 system   x86_64, linux-gnu
 ui       X11
 language (EN)
 collate  C.UTF-8
 ctype    C.UTF-8
 tz       Etc/UTC
 date     2024-05-06
 pandoc   NA

Packages ───────────────────────────────────────────────────────────────────
 package     * version date (UTC) lib source
 askpass       1.2.0   2023-09-03 [4] RSPM (R 4.3.2)
 cachem        1.0.8   2023-05-01 [4] RSPM (R 4.3.2)
 cli           3.6.2   2023-12-11 [4] RSPM (R 4.3.2)
 config        0.3.2   2023-08-30 [4] RSPM (R 4.3.2)
 DBI           1.2.1   2024-01-12 [4] RSPM (R 4.3.2)
 dbplyr        2.5.0   2024-03-19 [1] RSPM (R 4.3.0)
 devtools      2.4.5   2022-10-11 [4] RSPM (R 4.3.2)
 digest        0.6.34  2024-01-11 [4] RSPM (R 4.3.2)
 dplyr         1.1.4   2023-11-17 [4] RSPM (R 4.3.2)
 ellipsis      0.3.2   2021-04-29 [4] RSPM (R 4.3.2)
 fansi         1.0.6   2023-12-08 [4] RSPM (R 4.3.2)
 fastmap       1.1.1   2023-02-24 [4] RSPM (R 4.3.2)
 fs            1.6.3   2023-07-20 [4] RSPM (R 4.3.2)
 generics      0.1.3   2022-07-05 [4] RSPM (R 4.3.2)
 glue          1.7.0   2024-01-09 [4] RSPM (R 4.3.2)
 htmltools     0.5.7   2023-11-03 [4] RSPM (R 4.3.2)
 htmlwidgets   1.6.4   2023-12-06 [4] RSPM (R 4.3.2)
 httpuv        1.6.14  2024-01-26 [4] RSPM (R 4.3.2)
 httr          1.4.7   2023-08-15 [4] RSPM (R 4.3.2)
 jsonlite      1.8.8   2023-12-04 [4] RSPM (R 4.3.2)
 later         1.3.2   2023-12-06 [4] RSPM (R 4.3.2)
 lifecycle     1.0.4   2023-11-07 [4] RSPM (R 4.3.2)
 magrittr      2.0.3   2022-03-30 [4] RSPM (R 4.3.2)
 memoise       2.0.1   2021-11-26 [4] RSPM (R 4.3.2)
 mime          0.12    2021-09-28 [4] RSPM (R 4.3.2)
 miniUI        0.1.1.1 2018-05-18 [4] RSPM (R 4.3.2)
 openssl       2.1.1   2023-09-25 [4] RSPM (R 4.3.2)
 pillar        1.9.0   2023-03-22 [4] RSPM (R 4.3.2)
 pkgbuild      1.4.3   2023-12-10 [4] RSPM (R 4.3.2)
 pkgconfig     2.0.3   2019-09-22 [4] RSPM (R 4.3.2)
 pkgload       1.3.4   2024-01-16 [4] RSPM (R 4.3.2)
 profvis       0.3.8   2023-05-02 [4] RSPM (R 4.3.2)
 promises      1.2.1   2023-08-10 [4] RSPM (R 4.3.2)
 purrr         1.0.2   2023-08-10 [4] RSPM (R 4.3.2)
 R6            2.5.1   2021-08-19 [4] RSPM (R 4.3.2)
 Rcpp          1.0.12  2024-01-09 [4] RSPM (R 4.3.2)
 remotes       2.4.2.1 2023-07-18 [4] RSPM (R 4.3.2)
 rlang         1.1.3   2024-01-10 [4] RSPM (R 4.3.2)
 Rserve        1.8-13  2023-11-28 [4] RSPM (R 4.3.2)
 rstudioapi    0.15.0  2023-07-07 [4] RSPM (R 4.3.2)
 sessioninfo   1.2.2   2021-12-06 [4] RSPM (R 4.3.2)
 shiny         1.8.0   2023-11-17 [4] RSPM (R 4.3.2)
 sparklyr    * 1.8.6   2024-04-29 [1] RSPM (R 4.3.0)
 SparkR        3.5.0   2024-05-06 [2] local
 stringi       1.8.3   2023-12-11 [4] RSPM (R 4.3.2)
 stringr       1.5.1   2023-11-14 [4] RSPM (R 4.3.2)
 tibble        3.2.1   2023-03-20 [4] RSPM (R 4.3.2)
 tidyr         1.3.1   2024-01-24 [4] RSPM (R 4.3.2)
 tidyselect    1.2.1   2024-03-11 [1] RSPM (R 4.3.0)
 urlchecker    1.0.1   2021-11-30 [4] RSPM (R 4.3.2)
 usethis       2.2.2   2023-07-06 [4] RSPM (R 4.3.2)
 utf8          1.2.4   2023-10-22 [4] RSPM (R 4.3.2)
 uuid          1.2-0   2024-01-14 [4] RSPM (R 4.3.2)
 vctrs         0.6.5   2023-12-01 [4] RSPM (R 4.3.2)
 withr         3.0.0   2024-01-16 [4] RSPM (R 4.3.2)
 xtable        1.8-4   2019-04-21 [4] RSPM (R 4.3.2)
 yaml          2.3.8   2023-12-11 [4] RSPM (R 4.3.2)
edgararuiz added the databricks label May 6, 2024
@edgararuiz
Collaborator

Hi @romangehrn, thank you for reporting this. This came up in DB v2, so I fixed it in pysparklyr, but I wasn't sure whether DB v1 was affected until now. I'll work on a fix. Bottom line: it seems that sdf_register() (which runs as part of sdf_sql()) no longer caches the empty table. This should only break when you run "operational queries", which don't return data.
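
A possible interim workaround, sketched under assumptions rather than taken from this thread: because the failure only affects statements that return no data, such statements could be sent through the DBI interface instead of sdf_sql(), so that no temporary view is registered. The helper names `returns_rows()` and `run_sql()` below are hypothetical and not part of sparklyr, and the keyword check is deliberately crude (it only treats SELECT and WITH as row-returning and does not handle leading SQL comments). Whether `DBI::dbExecute()` avoids the error on a given DBR version has not been verified here.

```r
# Hypothetical helper (not part of sparklyr): guess whether a SQL statement
# returns rows by looking only at its first keyword.
returns_rows <- function(sql) {
  first_word <- toupper(regmatches(sql, regexpr("[A-Za-z]+", sql)))
  length(first_word) == 1 && first_word %in% c("SELECT", "WITH")
}

# Hypothetical wrapper: row-returning queries go through sdf_sql(), while
# "operational" statements (DROP, CREATE, ...) go through DBI::dbExecute(),
# which does not try to register the (empty) result as a temporary view.
run_sql <- function(sc, sql) {
  if (returns_rows(sql)) {
    sparklyr::sdf_sql(sc, sql)
  } else {
    invisible(DBI::dbExecute(sc, sql))
  }
}

# Usage (requires a live connection, so it is left commented out):
# sc <- sparklyr::spark_connect(method = "databricks")
# run_sql(sc, "DROP TABLE ch_lab.ran.test_mtcars")
```

Dispatching on the first keyword keeps the sketch short; a more robust version would need proper SQL parsing.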

edgararuiz self-assigned this May 6, 2024
edgararuiz added the bug label May 6, 2024
@romangehrn
Author

Hi @edgararuiz, exactly: sdf_sql() with a SELECT statement works fine. Thanks for fixing it! It would be great to have the bugfix by DBR 15.3.

@edgararuiz
Collaborator

It looks like the issue goes beyond Databricks connections. I'm able to reproduce it on a local connection by dropping a table, which is an operation that returns no data.

A fairly minimal reprex:

library(sparklyr)
sc <- spark_connect("local", version = "3.5")
sdf_copy_to(sc, mtcars, "mtcars")
sdf_sql(sc, "drop table mtcars")
#> Error:
#> ! org.apache.spark.sql.AnalysisException: [CANNOT_RESOLVE_STAR_EXPAND]
#>   Cannot resolve `sparklyr_tmp_68a85dc3_1ce8_4921_84c8_9c8131649976`.* given
#>   input columns . Please check that the specified table or struct exists and is
#>   accessible in the input columns.; line 1 pos 7
#> 
#> Run `sparklyr::spark_last_error()` to see the full Spark error (multiple lines)
#> To use the previous style of error message set
#> `options("sparklyr.simple.errors" = TRUE)`
spark_disconnect(sc)

Created on 2024-05-07 with reprex v2.1.0

edgararuiz added a commit that referenced this issue May 8, 2024
@edgararuiz
Collaborator

Hi @romangehrn, I just merged a fix, which automatically closed the issue. If you are still having trouble, please feel free to reopen it.

@romangehrn
Author

@edgararuiz thanks, great!
