dplyr to Spark SQL Translation Error: Unresolved Column When Selecting All Columns #3431

Closed
milieere opened this issue Mar 26, 2024 · 4 comments
Labels
databricks Issues related to Databricks connection mode


@milieere

Hello,

I am trying to filter table data from a Databricks connection using dplyr. I run into a problem whenever I skip the 'select' step: without an explicit column selection, the statement is translated to SQL as SELECT <table_name>.* instead of SELECT *, and Spark SQL throws the following error:

Error in py_call_impl(callable, call_args$unnamed, call_args$named) : 
  pyspark.errors.exceptions.connect.AnalysisException: [UNRESOLVED_COLUMN.WITH_SUGGESTION] A column, variable, or function parameter with name `table`.`*` cannot be resolved. Did you mean one of the following? [...]. SQLSTATE: 42703; line 1 pos 7;
'Project ['table.*]

What works: specifying the column names to select

library(dplyr)  # attaches the %>% pipe used below

con <- sparklyr::spark_connect(
  cluster_id = "...",
  method = "databricks_connect",
  version = "14.3"
)

table <- dplyr::tbl(con, dbplyr::in_catalog("catalog", "db", "table"))

filtered <- table %>%
  dplyr::select(col1, col2) %>%
  dplyr::filter(col1 == 'some_val') %>%
  dplyr::collect()

print(filtered)

What doesn't work: selecting all columns without specifying column names

This code leads to the error shown above.

filtered <- table %>%
  dplyr::filter(col1 == 'some_val') %>%
  dplyr::collect()

print(filtered)
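
One way to look at the translation described above is to render the generated SQL instead of collecting. The sketch below is only an illustration, not the reporter's setup: it uses a local `dbplyr::lazy_frame()` with the generic `dbplyr::simulate_dbi()` backend as a hypothetical stand-in for the Databricks table, so the exact SQL may differ from what sparklyr produces and depends on the installed dbplyr version.

```r
library(dplyr)

# Hypothetical stand-in for the remote table: a lazy frame with no real
# connection, so this runs without a cluster. Column names mirror the
# placeholders used in the report.
tbl_sim <- dbplyr::lazy_frame(
  col1 = character(),
  col2 = character(),
  con = dbplyr::simulate_dbi()
)

# Same pipeline as the failing example, but rendering SQL instead of collecting.
sql_no_select <- tbl_sim %>%
  filter(col1 == "some_val") %>%
  dbplyr::sql_render()

print(sql_no_select)
```

Against the real connection, piping the same query into `dplyr::show_query()` instead of `dplyr::collect()` should print the SQL that is actually sent to Spark.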

Thanks in advance!
S.

edgararuiz added the databricks label (Issues related to Databricks connection mode) on Apr 5, 2024
@edgararuiz
Collaborator

Hi, this should have been resolved in the latest version of sparklyr, which was published 3/25 (#3430). Would you mind confirming that you have sparklyr 1.8.5? If you don't, can you confirm that the issue goes away after you upgrade to 1.8.5? Thank you.
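
A minimal sketch of that check, assuming the fix version is 1.8.5 as stated above; the variable names are made up for illustration:

```r
# Minimum version taken from the comment above.
min_version <- "1.8.5"

# TRUE only when sparklyr is installed and at least min_version.
sparklyr_ok <- requireNamespace("sparklyr", quietly = TRUE) &&
  utils::packageVersion("sparklyr") >= min_version

if (!sparklyr_ok) {
  message(
    "sparklyr is missing or older than ", min_version,
    "; consider running install.packages(\"sparklyr\") and restarting R."
  )
}
```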

@RPanczak

RPanczak commented Apr 8, 2024

Thank you @edgararuiz - we might have been caught with an outdated version - all works now with 1.8.5.

@milieere
Author

milieere commented Apr 9, 2024

Thank you @edgararuiz, we confirmed with @RPanczak that with the new version all works as expected!

@edgararuiz
Collaborator

Thank you @milieere, I'll go ahead and close this issue.
