This repository has been archived by the owner on May 17, 2024. It is now read-only.

Python API diff_tables() throws duckdb.BinderException #887

Closed
cmcnicoll opened this issue Apr 22, 2024 · 1 comment
Labels: bug (Something isn't working), triage

Comments

@cmcnicoll

Describe the bug
I get an error when I adapt the following example to DuckDB:

>>> table1 = connect_to_table('postgresql:///', 'Rating', 'id')
>>> list(diff_tables(table1, table1))
[]

Code:

import duckdb  # 0.10.2
from data_diff import connect_to_table, diff_tables  # 0.11.1

with duckdb.connect("test.duckdb") as con:
    con.sql("drop table if exists test_table")
    con.sql(
        "create table test_table as select * from read_csv('test/*.csv', header = true)"
    )
    con.sql("show all tables").show()
    con.table("test_table").show()

test_table = connect_to_table(
    "duckdb://test.duckdb", "test.main.test_table", "test_id"  # key column
)
print(test_table, "\n")

list(diff_tables(test_table, test_table))

Output:

$ py test_data_diff.py 
┌──────────┬─────────┬────────────┬───────────────────────┬───────────────────┬───────────┐
│ database │ schema  │    name    │     column_names      │   column_types    │ temporary │
│ varchar  │ varchar │  varchar   │       varchar[]       │     varchar[]     │  boolean  │
├──────────┼─────────┼────────────┼───────────────────────┼───────────────────┼───────────┤
│ test     │ main    │ test_table │ [test_id, test_value] │ [BIGINT, VARCHAR] │ false     │
└──────────┴─────────┴────────────┴───────────────────────┴───────────────────┴───────────┘

┌─────────┬────────────┐
│ test_id │ test_value │
│  int64  │  varchar   │
├─────────┼────────────┤
│       1 │ a          │
│       2 │ b          │
│       3 │ c          │
└─────────┴────────────┘

TableSegment(database=DuckDB(default_schema='main', _interactive=False, is_closed=False,
_dialect=Dialect(_prevent_overflow_when_concat=False), _args={'filepath': ''},
_conn=<duckdb.duckdb.DuckDBPyConnection object at 0x000001AD984E6C30>), table_path=('test', 'main', 'test_table'),
key_columns=('test_id',), update_column=None, extra_columns=(), ignored_columns=frozenset(), min_key=None, max_key=None,
min_update=None, max_update=None, where=None, case_sensitive=True, _schema=None)

Traceback (most recent call last):
  File "C:\code\duckdb\data-diff\test_data_diff.py", line 17, in <module>
    list(diff_tables(test_table, test_table))
  File "C:\code\duckdb\.venv\Lib\site-packages\data_diff\diff_tables.py", line 95, in __iter__
    for i in self.diff:
  File "C:\code\duckdb\.venv\Lib\site-packages\data_diff\diff_tables.py", line 266, in _diff_tables_wrapper
    raise error
  File "C:\code\duckdb\.venv\Lib\site-packages\data_diff\diff_tables.py", line 236, in _diff_tables_wrapper
    table1, table2 = self._threaded_call("with_schema", [table1, table2])
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\code\duckdb\.venv\Lib\site-packages\data_diff\diff_tables.py", line 51, in _threaded_call
    return list(self._thread_map(methodcaller(func), iterable))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\cmcni\AppData\Local\Programs\Python\Python312\Lib\concurrent\futures\_base.py", line 619, in result_iterator
    yield _result_or_cancel(fs.pop())
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\cmcni\AppData\Local\Programs\Python\Python312\Lib\concurrent\futures\_base.py", line 317, in _result_or_cancel
    return fut.result(timeout)
           ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\cmcni\AppData\Local\Programs\Python\Python312\Lib\concurrent\futures\_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\cmcni\AppData\Local\Programs\Python\Python312\Lib\concurrent\futures\_base.py", line 401, in __get_result
    raise self._exception
  File "C:\Users\cmcni\AppData\Local\Programs\Python\Python312\Lib\concurrent\futures\thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\code\duckdb\.venv\Lib\site-packages\data_diff\table_segment.py", line 153, in with_schema
    return self._with_raw_schema(self.database.query_table_schema(self.table_path))
                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\code\duckdb\.venv\Lib\site-packages\data_diff\databases\base.py", line 1048, in query_table_schema
    rows = self.query(self.select_table_schema(path), list, log_message=path)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\code\duckdb\.venv\Lib\site-packages\data_diff\databases\base.py", line 996, in query
    res = self._query(sql_code)
          ^^^^^^^^^^^^^^^^^^^^^
  File "C:\code\duckdb\.venv\Lib\site-packages\data_diff\databases\duckdb.py", line 141, in _query
    return self._query_conn(self._conn, sql_code)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\code\duckdb\.venv\Lib\site-packages\data_diff\databases\base.py", line 1188, in _query_conn
    return apply_query(callback, sql_code)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\code\duckdb\.venv\Lib\site-packages\data_diff\databases\base.py", line 211, in apply_query
    return callback(sql_code)
           ^^^^^^^^^^^^^^^^^^
  File "C:\code\duckdb\.venv\Lib\site-packages\data_diff\databases\base.py", line 1173, in _query_cursor
    c.execute(sql_code)
duckdb.duckdb.BinderException: Binder Error: Catalog "test" does not exist!

Describe the environment
Windows 11 Pro
Python 3.12.2
data_diff 0.11.1
duckdb 0.10.2
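The printed `TableSegment` above shows `_args={'filepath': ''}`, which suggests data_diff never received a file path. One plausible reason, offered here as an unverified assumption about data_diff's URI handling: in `duckdb://test.duckdb` everything after `//` is the network-location part of a URI, so the path component is empty. The sketch uses only the standard library's `urllib.parse.urlsplit()` to show that split; the helper `split_duckdb_uri` is hypothetical, and data_diff's own parser may behave differently.

```python
from urllib.parse import urlsplit


def split_duckdb_uri(uri: str) -> tuple[str, str]:
    """Hypothetical helper: return (netloc, path) as a generic URI parser sees them.

    Assumption: data_diff fills `filepath` from the path component, the way
    urlsplit() exposes it. This is not confirmed by the issue.
    """
    parts = urlsplit(uri)
    return parts.netloc, parts.path


for uri in ("duckdb://test.duckdb", "duckdb:///test.duckdb"):
    netloc, path = split_duckdb_uri(uri)
    print(f"{uri}: netloc={netloc!r}, path={path!r}")
```

For `duckdb://test.duckdb` the file name lands in `netloc` and `path` is empty. The `duckdb:///test.duckdb` form is included only for comparison; this sketch does not confirm which URI form data_diff actually expects.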

cmcnicoll added the bug (Something isn't working) label on Apr 22, 2024
@glebmezh (Contributor)

Hi @cmcnicoll,

Thank you for trying out data-diff and for taking the time to open this issue. We made the hard decision to sunset the data-diff package and will not provide further development or support. Diffing functionality will remain available in Datafold Cloud; however, the DuckDB connector is not yet supported there (it is on the roadmap).

Feel free to contact us at support@datafold.com if you have any questions.

-Gleb
