BUG insert dataset depends on INTERNAL ds order. #18

Open
awb99 opened this issue Feb 17, 2024 · 4 comments

@awb99

awb99 commented Feb 17, 2024

I found a weird bug when inserting a dataset via (duckdb/insert-dataset! (:conn session) ds).
Sometimes the insert fails, even though the dataset has the same columns and the same data types
as the table defined via (duckdb/create-table! (:conn session) ds).

By printing the initial dataset that was used to create the table, I deduced the order in
which tml-ds arranged the columns internally. I replicate this order via (order-columns-strange ds).
When I call insert-dataset! with a differently arranged dataset, for example (order-columns ds),
the insert throws an exception.

This is weird behavior that I did not expect.

This is the code I used to get the inserts to work / not-work:

(defn order-columns [ds]
  (tc/dataset [(:date ds)
               (:asset ds)
               (:open ds)
               (:high ds)
               (:low ds)
               (:close ds)
               (:volume ds)
               (:epoch ds)
               (:ticks ds)]))

(defn order-columns-strange [ds]
  (tc/dataset [(:open ds)
               (:epoch ds)
               (:date ds)
               (:close ds)
               (:volume ds)
               (:high ds)
               (:low ds)
               (:ticks ds)
               (:asset ds)]))
@cnuernber
Collaborator

cnuernber commented Feb 17, 2024

Yes, I haven't verified this, but the duckdb insert operation should be independent of the dataset column order. Seems odd.

What exactly is the error?

@harold
Contributor

harold commented Feb 17, 2024

Confirmed. Maybe this can lead to a test case:

user> (require '[tech.v3.dataset :as ds])
nil
user> (require '[tmducken.duckdb :as duckdb])
nil
user> (duckdb/initialize! {:duckdb-home "binaries"})
Feb 17, 2024 12:01:37 PM clojure.tools.logging$eval13663$fn__13666 invoke
INFO: Attempting to load duckdb from "binaries/libduckdb.so"
true
user> (def db (duckdb/open-db))
#'user/db
user> (def conn (duckdb/connect db))
#'user/conn
user> (def ds (ds/select-columns (ds/->dataset {:a [1 2 3] :b ["x" "y" "z"]}) [:a :b]))
#'user/ds
user> ds
_unnamed [3 2]:

| :a | :b |
|---:|----|
|  1 |  x |
|  2 |  y |
|  3 |  z |
user> (duckdb/create-table! conn ds)
"_unnamed"
user> (duckdb/insert-dataset! conn ds)
3
user> (duckdb/sql->dataset conn "select * from _unnamed")
Feb 17, 2024 12:03:50 PM clojure.tools.logging$eval13663$fn__13666 invoke
INFO: Reference thread starting
:_unnamed [3 2]:

| a | b |
|--:|---|
| 1 | x |
| 2 | y |
| 3 | z |
user> (def db (duckdb/open-db)) ;; FRESH NEW DB
#'user/db
user> (def conn (duckdb/connect db))
#'user/conn
user> (duckdb/create-table! conn ds)
"_unnamed"
user> (def swizzled (ds/select-columns ds [:b :a]))
#'user/swizzled
user> swizzled
_unnamed [3 2]:

| :b | :a |
|----|---:|
|  x |  1 |
|  y |  2 |
|  z |  3 |
user> (duckdb/insert-dataset! conn swizzled)
Execution error at tmducken.duckdb/insert-dataset!$fn$check-error (duckdb.clj:321).
{:address 0x00007FF29101D730 }

@cnuernber
Collaborator

cnuernber commented Feb 18, 2024

The workaround for this is to use a prepared statement. This may take a moment to fix, as it doesn't appear that duckdb exposes table info from the connection through the C API at this time.
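
Until a fix lands, another stopgap (separate from the prepared-statement route, and only a sketch) might be to reorder the dataset's columns to match the order used at create-table! time before inserting. This rests on the report above that the insert succeeds when the column order matches. `align-columns` is a hypothetical helper name and not part of tmducken; it only uses `ds/select-columns` and `ds/column-names` from tech.v3.dataset.

```clojure
(require '[tech.v3.dataset :as ds])

(defn align-columns
  "Hypothetical helper (not part of tmducken): returns `dataset` with its
  columns reordered to match `table-columns`, i.e. the column order of the
  dataset that was passed to create-table!."
  [dataset table-columns]
  (ds/select-columns dataset table-columns))

;; The dataset that would have been used to create the table.
(def original
  (ds/select-columns (ds/->dataset {:a [1 2 3] :b ["x" "y" "z"]}) [:a :b]))

;; Same data, different column order (this is the shape that fails to insert).
(def swizzled (ds/select-columns original [:b :a]))

;; Reorder back to the table's column order before inserting.
(def aligned (align-columns swizzled (ds/column-names original)))

;; Assumption: `aligned` can then be inserted as usual, e.g.
;; (duckdb/insert-dataset! conn aligned)
```

This assumes the caller still has the original dataset (or at least its column names) at hand, since the table's column order cannot currently be read back through the connection.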

@awb99
Author

awb99 commented Feb 22, 2024

This is a namespace that can be used to reproduce the bug I posted:

(ns notebook.playground.bardb.bug-duckdb
  (:require
   [tick.core :as t]
   [tablecloth.api :as tc]
   [tmducken.duckdb :as duckdb]))

(def db (duckdb/open-db "/tmp/duckdb-test"))

(def conn (duckdb/connect db))

(defn make-ds [v]
  (let [table-name "test"]
    (-> (tc/dataset v)
        (tc/set-dataset-name table-name))))

(def empty-ds (make-ds [{:date (t/now)
                         :open 0.0 :high 0.0 :low 0.0 :close 0.0
                         :volume 0.0 ; crypto volume is double.
                         :ticks 0
                         :asset "000"}]))

empty-ds

(duckdb/create-table! conn empty-ds)

(def data-ds (make-ds [{:date (t/now)
                        :open 1.0 :high 2.0 :low 3.0 :close 4.0
                        :volume 0.0 ; crypto volume is double.
                        :ticks 0
                        :asset "1"}]))

(duckdb/insert-dataset! conn data-ds)
;; => 1

(def data2-ds (make-ds [{:asset "2" ; NOTE asset is the first key
                         :date (t/now)
                         :open 1.0 :high 2.0 :low 3.0 :close 4.0
                         :volume 0.0 ; crypto volume is double.
                         :ticks 0}]))

(duckdb/insert-dataset! conn data2-ds)
;; => Execution error at tmducken.duckdb/insert-dataset!$fn$check-error (duckdb.clj:320).
;;    {:address 0x00007F53A4467DB0 }
