dplyr to Spark SQL Translation Error: Unresolved Column When Selecting All Columns #3431

milieere · 2024-03-26T17:11:05Z

Hello,

I am trying to filter a table over a Databricks connection using dplyr. I run into a problem when I omit the 'select' step: when the pipeline is translated to SQL, it generates SELECT <table_name>.* instead of SELECT *, which makes Spark SQL throw this error:

Error in py_call_impl(callable, call_args$unnamed, call_args$named) : 
  pyspark.errors.exceptions.connect.AnalysisException: [UNRESOLVED_COLUMN.WITH_SUGGESTION] A column, variable, or function parameter with name `table`.`*` cannot be resolved. Did you mean one of the following? [...]. SQLSTATE: 42703; line 1 pos 7;
'Project ['table.*]

What works - when I specify column names to select

con <- sparklyr::spark_connect(
  cluster_id = "...",
  method = "databricks_connect",
  version = "14.3"
)

table <- dplyr::tbl(con, dbplyr::in_catalog("catalog", "db", "table"))

filtered <- table %>%
  dplyr::select(col1, col2) %>%
  dplyr::filter(col1 == 'some_val') %>%
  dplyr::collect()

print(filtered)

What doesn't work - when I want to select all and do not specify column names to select

This code leads to the error shared above.

filtered <- table %>%
  dplyr::filter(col1 == 'some_val') %>%
  dplyr::collect()

print(filtered)

Thanks in advance!
S.

edgararuiz · 2024-04-05T13:26:06Z

Hi, this should have been resolved in the latest version of sparklyr, which was published 3/25 (#3430). Would you mind confirming whether you have sparklyr 1.8.5? If you don't, can you confirm that the issue goes away after you upgrade to 1.8.5? Thank you
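
For anyone checking the same thing, here is a minimal sketch of a version check in base R. The helper name needs_upgrade is hypothetical (it is not part of sparklyr), and the 1.8.5 threshold is simply the version mentioned above.

```r
# Hypothetical helper (not part of sparklyr): TRUE when the installed
# version string is older than the required one.
needs_upgrade <- function(installed, required = "1.8.5") {
  package_version(installed) < package_version(required)
}

# Compare the locally installed sparklyr, if any, against 1.8.5.
if (requireNamespace("sparklyr", quietly = TRUE)) {
  installed <- as.character(utils::packageVersion("sparklyr"))
  if (needs_upgrade(installed)) {
    message("sparklyr ", installed, " is older than 1.8.5; ",
            "consider install.packages(\"sparklyr\")")
  } else {
    message("sparklyr ", installed, " is at or above 1.8.5")
  }
} else {
  message("sparklyr is not installed")
}
```

Restart the R session after upgrading so the new version is the one that gets loaded.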

RPanczak · 2024-04-08T13:41:07Z

Thank you @edgararuiz - we may have been caught out by an outdated version. Everything works now with 1.8.5.

milieere · 2024-04-09T09:03:01Z

Thank you @edgararuiz, we confirmed with @RPanczak that with the new version everything works as expected!

edgararuiz · 2024-05-06T21:08:57Z

Thank you @milieere, I'll go ahead and close this issue.

edgararuiz added the databricks label (Issues related to Databricks connection mode) Apr 5, 2024

edgararuiz added the awaiting response label Apr 5, 2024

github-actions bot removed the awaiting response label Apr 8, 2024

edgararuiz closed this as completed May 6, 2024
