Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Lake][OHCLV] Update OHCLV data_factory to use csv_data_writer #769

Open
6 tasks done
idiom-bytes opened this issue Mar 11, 2024 · 3 comments
Open
6 tasks done

[Lake][OHCLV] Update OHCLV data_factory to use csv_data_writer #769

idiom-bytes opened this issue Mar 11, 2024 · 3 comments
Assignees
Labels
Type: Enhancement New feature or request

Comments

@idiom-bytes
Copy link
Member

idiom-bytes commented Mar 11, 2024

Motivation

OHLCV is still writing/reading from parquet.

We may instead want to use the CSVDataStore + PersistentDataStore to save/read from. But doing this right now is hard, OHLCV code is tightly-bound w/ saving/loading/etc.

DoD

  • OHCLV Data Factory CSV/Saving/Querying code has been abstracted away and is using CSVDataStore + PersistentDataStore
  • OHCLV Data Factory and DataStore logic have been separated
  • OHCLV Data Factory is querying from PersistentDataStore
  • OHCLV Data Factory tests are up-to-date and every test is passing

### Update According to Trent's comment

  • Change the IO methods to a function, in a new module cvsutil.py
  • use them into the CSVDataStore class and make it cleaner and thinner
@idiom-bytes
Copy link
Member Author

CSV Data Store had already been integrated into duckdb-integration PR although @trentmc has provided feedback on the PR.

Trent: I'm not sure if we're able to currently merge w/ these updates before implementing feedback or not. Please let us know so we can organize this update.

@trentmc
Copy link
Member

trentmc commented May 13, 2024

Trent: I'm not sure if we're able to currently merge w/ these updates before implementing feedback or not. Please let us know so we can organize this update.

For reference, I had two comments in the PR. I just checked. On the second comment, Mustafa gave a good response and addressed my concerns. So I "resolved" that comment in the PR.

On the first comment, it's a small thing. Easier to simply follow the suggestion now, than try to track later.

For convenience, here it is👇


Both csv and duckDB are persistent data stores.

This file is csv_data_store.py, good.

The module for duckDB should be renamed from persistent_data_store.py to duckdb_data_store.py. (Otherwise its scope overlaps with that of csv_data_store.py)

@idiom-bytes
Copy link
Member Author

Thank you for the response @trentmc, i'm starting to look at these peripheral duckdb items to sign off.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Type: Enhancement New feature or request
Projects
None yet
3 participants