We have various Adapters for tabular data formats. The existing Adapters for Parquet-formatted and CSV-formatted data have different advantages. Parquet is a binary format with a rich data type system and built-in compression. CSV is a human-readable format that supports appending rows (which Parquet does not).
We propose to add support for Arrow-formatted data (sometimes also called "Feather") which, like CSV, is append-able, but, like Parquet, is binary and has a rich data type system. Like the CSV Adapter, this should implement `write_partition` and `append_partition`, with the same signatures as in `tiled/adapters/csv.py`, lines 153 to 185 at commit 307524d.
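
Since those lines are not reproduced here, the following is only a rough, hypothetical sketch of what the Arrow Adapter's methods could look like, assuming (as with the CSV Adapter) each method receives the new data and a partition index; the class and attribute names are made up:

```python
import pyarrow as pa
import pyarrow.feather


class ArrowAdapter:
    "Hypothetical sketch; assumes one Feather/Arrow file per partition."

    def __init__(self, partition_paths):
        self._partition_paths = list(partition_paths)

    def write_partition(self, data, partition):
        # Replace one partition wholesale: encode the DataFrame and
        # overwrite that partition's file.
        table = pa.Table.from_pandas(data)
        pyarrow.feather.write_feather(table, self._partition_paths[partition])

    def append_partition(self, data, partition):
        # Appending rows to an existing file is the interesting part;
        # it needs the record-batch-level IPC API discussed below.
        raise NotImplementedError("See the Arrow IPC discussion below.")
```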
We have functions in the codebase for reading and writing Arrow-formatted data because the server and the client transmit data as Arrow by default. These are the functions in `tiled/serialization/table.py`, lines 9 to 24 at commit 307524d.
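
Those lines are also not shown above; for context, whole-table Arrow (de)serialization with pyarrow typically looks something like the sketch below (one call in, one call out, no chunking), though the actual functions at the referenced lines may differ:

```python
import pyarrow as pa


def serialize_arrow(df):
    # Encode an entire DataFrame as one in-memory Arrow IPC stream.
    table = pa.Table.from_pandas(df)
    sink = pa.BufferOutputStream()
    with pa.ipc.new_stream(sink, table.schema) as writer:
        writer.write_table(table)
    return memoryview(sink.getvalue())


def deserialize_arrow(buffer):
    # Decode the whole stream back into a DataFrame in one call.
    return pa.ipc.open_stream(buffer).read_all().to_pandas()
```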
What these functions don't need to deal with is partitioning (a.k.a. chunking) of rows. To implement `read_partition`, `write_partition`, and `append_partition`, we'll need to dig deeper into the Arrow IPC (inter-process communication) API to write "record batches" to an existing table, rather than write/read an entire table in a single call. See https://arrow.apache.org/docs/python/ipc.html.