Binary formats like msgpack (MessagePack objects can be concatenated in one file, much like lines in a .jsonl file) could be read by table scans driven by Python generators. However, this currently requires pyarrow to be installed.
import duckdb
import pyarrow as pa
# open an in-memory DuckDB connection (pass a path such as 'result-snappy1.db' to persist)
con = duckdb.connect()
# create a RecordBatchReader with 2 columns, key and content, that generates 1000 batches of 10 rows each on the fly
schema = pa.schema([('key', pa.int64()), ('content', pa.binary())])
def iter_record_batches():
    for i in range(1000):
        yield pa.RecordBatch.from_arrays([pa.array([i]*10), pa.array([b'hello']*10)], schema=schema)
generator1 = pa.RecordBatchReader.from_batches(schema, iter_record_batches())
# DuckDB picks up generator1 from the surrounding Python scope by name
con.execute("SELECT * FROM generator1 WHERE key = 1;")
# show results
print(con.fetchall())
It would be nice to have something like RecordBatchReader integrated into DuckDB, so that people don't need to install pyarrow and use pyarrow's format.
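To make the generator side of this concrete, here is a minimal sketch of how a stream of concatenated records could be grouped into column-oriented batches using only the standard library. It is an illustration under assumptions, not an existing DuckDB or pyarrow API: `iter_records` and `iter_column_batches` are hypothetical helper names, and JSON lines stand in for msgpack so that the example runs without extra packages.

```python
import io
import json
from itertools import islice


def iter_records(stream):
    """Yield one dict per non-empty line of a line-delimited stream.

    JSON lines are a stand-in here; a msgpack stream would need the
    msgpack package to decode the concatenated objects instead.
    """
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


def iter_column_batches(records, columns, batch_size):
    """Group row dicts into column-oriented batches of at most batch_size rows."""
    records = iter(records)
    while True:
        rows = list(islice(records, batch_size))
        if not rows:
            return
        yield {name: [row[name] for row in rows] for name in columns}


# In-memory stand-in for a file of 25 concatenated records.
stream = io.StringIO(
    "".join(json.dumps({"key": i, "content": "hello"}) + "\n" for i in range(25))
)

batches = list(iter_column_batches(iter_records(stream), ["key", "content"], batch_size=10))
print([len(batch["key"]) for batch in batches])  # [10, 10, 5]
```

With pyarrow installed, each yielded dict could presumably be turned into a batch with `pa.RecordBatch.from_pydict(batch)` and fed to `RecordBatchReader.from_batches` as in the example above. The request here is for DuckDB to accept a generator of this shape directly.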