Cannot write dataframe to Databricks Unity Catalog table #3397

Open · Zurina opened this issue Nov 1, 2023 · 10 comments

Labels
databricks: Issues related to Databricks connection mode

Comments

Zurina commented Nov 1, 2023

Hi,

I am unable to write to a table in Databricks Unity Catalog, although I can read data from catalogs/schemas without issue. I am using Databricks Connect for Python (via pysparklyr), and I get the same result whether I authenticate with an Azure token or a PAT token. For example, this code:

library(sparklyr)
library(pysparklyr)
library(dplyr)
library(dbplyr)

sc <- spark_connect(
  master = "<my_db_workspace_url", 
  cluster_id = "<cluster_id>",
  token = "Azure_Token/PAT_Token",
  method = "databricks_connect"
)

my_table <- tbl(sc, in_catalog("main", "default", "my_table"))

The above works well, but I seem unable to write data. I have tried the following:

sparklyr::copy_to(sc, my_table, in_catalog("main", "default", "my_table2"))

I receive:

> sparklyr::copy_to(sc, my_table, in_catalog("main", "default", "my_table2"))
Error in py_call_impl(callable, call_args$unnamed, call_args$named) : 
  TypeError: bad argument type for built-in operation

── Python Exception Message ────────────────────────────────────────────────────────────────────────
Traceback (most recent call last):
  File "/home/ubuntu/.virtualenvs/r-sparklyr-databricks-13.3/lib/python3.10/site-packages/pyspark/sql/connect/catalog.py", line 216, in tableExists
    pdf = self._execute_and_fetch(plan.TableExists(table_name=tableName, db_name=dbName))
  File "/home/ubuntu/.virtualenvs/r-sparklyr-databricks-13.3/lib/python3.10/site-packages/pyspark/sql/connect/catalog.py", line 49, in _execute_and_fetch
    pdf = DataFrame.withPlan(catalog, session=self._sparkSession).toPandas()
  File "/home/ubuntu/.virtualenvs/r-sparklyr-databricks-13.3/lib/python3.10/site-packages/pyspark/sql/connect/dataframe.py", line 1654, in toPandas
    query = self._plan.to_proto(self._session.client)
  File "/home/ubuntu/.virtualenvs/r-sparklyr-databricks-13.3/lib/python3.10/site-packages/pyspark/sql/connect/plan.py", line 118, in to_proto
    plan.root.CopyFrom(self.plan(session))
  File "/home/ubuntu/.virtualenvs/r-sparklyr-databricks-13.3/lib/python3.10/site-packages/pyspark/sql/connect/plan.py", line 1818, in plan
    plan.catalog.table_exists.table_name = self._table_name
TypeError: bad argument type for built-in operation

── R Traceback ─────────────────────────────────────────────────────────────────────────────────────
    ▆
 1. ├─sparklyr::copy_to(...)
 2. └─sparklyr:::copy_to.spark_connection(...)
 3.   ├─sparklyr::sdf_copy_to(...)
 4.   └─pysparklyr:::sdf_copy_to.pyspark_connection(...)
 5.     └─context$catalog$tableExists(name)
 6.       └─reticulate:::py_call_impl(callable, call_args$unnamed, call_args$named)

Using:

  • Python 3.10
  • sparklyr 1.8.4
  • Databricks runtime 13.3 LTS (includes Apache Spark 3.4.1, Scala 2.12)
  • Databricks Connect 14.1.0

Any ideas on how I can write to a specific table in Unity Catalog using the catalog.schema.table path format?

Zurina changed the title from "Cannot write table to Unity Catalog schema" to "Cannot write dataframe to Unity Catalog table" on Nov 1, 2023
Zurina changed the title from "Cannot write dataframe to Unity Catalog table" to "Cannot write dataframe to Databricks Unity Catalog table" on Nov 1, 2023
Zurina (Author) commented Nov 6, 2023

@edgararuiz, do you happen to know anything regarding this? :)

edgararuiz added the databricks label on Nov 6, 2023
edgararuiz (Collaborator) commented

Hi, copy_to() focuses on saving temporary tables. Their location is determined by Spark Connect, so all you should have to pass is copy_to(sc, my_table). Are you trying to create a permanent table?
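
For reference, a minimal sketch of that call (mtcars and the name "mtcars_tmp" are illustrative, not from this thread):

# copy_to() registers a temporary table whose location Spark Connect manages
tmp_tbl <- sparklyr::copy_to(sc, mtcars, name = "mtcars_tmp", overwrite = TRUE)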

Zurina (Author) commented Nov 7, 2023

@edgararuiz thanks for the response. Yes, I want to create a permanent table in Unity Catalog. Do you know which method I should use for that? I have not been able to find the right one myself. I need to be able to specify which catalog and schema the table should be created in, like the in_catalog('catalog', 'schema', 'table') helper from the dbplyr package allows.

Zurina (Author) commented Nov 28, 2023

@edgararuiz sorry for pinging you again, but do you have any updates or ideas?

cocinerox commented

Hi @Zurina & @edgararuiz
IMO the following should work, but it fails:

> spark_write_table(my_table, "main.default.my_table2")
Error in py_get_attr_impl(x, name, silent) : 
  AttributeError: 'DataFrameWriter' object has no attribute '%>%'
Run `reticulate::py_last_error()` for details.

A workaround might be:

# Pull the underlying PySpark DataFrame out of the dplyr tbl
my_table_pydf <- sparklyr:::spark_sqlresult_from_dplyr(my_table)$pyspark_obj
# Call the PySpark writer directly; `r.` exposes R objects to embedded Python
reticulate::py_run_string(
  "r.my_table_pydf.write.format('delta').mode('error').saveAsTable('main.default.my_table2')")

(Here mode can be: error, append, overwrite or ignore.)
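
To verify the write, the new table can be read back the same way the original was (a sketch reusing the in_catalog() helper from the first post):

tbl(sc, in_catalog("main", "default", "my_table2"))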

Zurina (Author) commented Dec 26, 2023

@cocinerox, thanks for your input. I agree that part should work. Your workaround definitely works, but I hope this becomes possible in native R eventually :)

cocinerox commented

@Zurina, a "native" R solution:

# Extract the underlying PySpark DataFrame, then chain the writer methods in R
my_table_pydf <- sparklyr:::spark_sqlresult_from_dplyr(my_table)$pyspark_obj
my_table_pydf |>
  sparklyr::invoke("write") |>
  invoke_obj("format", "delta") |>
  invoke_obj("mode", "error") |>
  sparklyr::invoke("saveAsTable", "main.default.my_table2")

where

# Invoke a method, then unwrap the returned PySpark object so the pipe can continue
invoke_obj <- function(...) {
  sparklyr::invoke(...)$pyspark_obj
}

edgararuiz (Collaborator) commented

Morning, the latest version of pysparklyr now supports spark_write_table(), which itself calls saveAsTable(). I think that encapsulates the solution above.
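
So the call that failed earlier should now work as-is; a sketch, assuming the updated pysparklyr (the mode argument is optional and shown only for illustration):

spark_write_table(my_table, "main.default.my_table2", mode = "overwrite")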

cocinerox commented

@edgararuiz It works for me. Thanks!
