
[Freeze][Resume After Silver + Streamlit] Fix #769 - OHLCV with CSV Data Store #858

Closed

Conversation


@kdetry kdetry commented Apr 4, 2024

Fixes #769

Changes proposed in this PR:

  • OHLCV data-writing logic is removed
  • OHLCV is integrated with CSV Data Store
  • OHLCV tests are updated to use CSV Data Store
  • all functions (psutil.py is an example) that are no longer used anywhere except in tests have been deprecated

@kdetry kdetry changed the base branch from main to issue685-duckdb-integration April 4, 2024 13:43
@idiom-bytes (Member)

There are a bunch of functions from plutil.py that are not being used anymore. These should be re-wired to use CDS, to prove that CSV-writing works in exactly the same way.

These commands are only being used in tests

initialize_rawohlcv_df,
load_rawohlcv_file,
save_rawohlcv_file,

The following are still being used elsewhere (in common code)

concat_next_df,
has_data,
newest_ut,
oldest_ut,

@idiom-bytes idiom-bytes marked this pull request as draft April 12, 2024 14:11
@idiom-bytes idiom-bytes changed the title Fix #769 - OHLCV with CSV Data Store [Freeze][Do Not Merge] Fix #769 - OHLCV with CSV Data Store Apr 12, 2024
@idiom-bytes idiom-bytes changed the title [Freeze][Do Not Merge] Fix #769 - OHLCV with CSV Data Store [Freeze][Resume After DuckDB] Fix #769 - OHLCV with CSV Data Store Apr 12, 2024
@idiom-bytes idiom-bytes changed the title [Freeze][Resume After DuckDB] Fix #769 - OHLCV with CSV Data Store [Freeze][Resume After Silver + Streamlit] Fix #769 - OHLCV with CSV Data Store Apr 12, 2024
Review thread on pdr_backend/lake/csv_data_store.py (resolved):

  # read the last record from the file
  last_file = pl.read_csv(file_path)
- return int(last_file["timestamp"][-1])
+ return UnixTimeMs(int(last_file["timestamp"][-1]))
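The diff above wraps the plain int in a UnixTimeMs value. A minimal stdlib sketch of the same pattern, with UnixTimeMs as a stand-in for pdr_backend's time type (assumed here to be an int subclass) and the csv module in place of polars:

```python
import csv


class UnixTimeMs(int):
    """Stand-in for pdr_backend's UnixTimeMs (assumed to be an int subclass)."""


def last_timestamp(file_path: str) -> UnixTimeMs:
    # read all rows and take the final "timestamp" value
    with open(file_path, newline="") as f:
        rows = list(csv.DictReader(f))
    return UnixTimeMs(int(rows[-1]["timestamp"]))
```

Because UnixTimeMs subclasses int, callers that expect a plain int keep working, while type checks and helper methods on the wrapper become available.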
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I'm putting a comment here, for lack of a better place.)

The implementation of _get_from_value() and _get_to_value() currently depends on the name of the file.

This is too brittle, and will be hard to maintain.

Better: auto-detect the values by reading the file itself. It's not that expensive, and will be far better to work with. This is what we've done in the past; and we can re-use that past code. Details:

  • A baseline version: simply load the whole file, and read the first entry and last entry.
  • Even faster: open the file but don't load it all; read the second line (for the "from" value), or iterate quickly to the last line and read it (for the "to" value).
  • The latter has already been built for you! It's how we used to do things, before porting to polars.
    • It's in the file pdutil.py, circa Nov 23, 2023
    • Function oldest_ut() finds the "from" value
    • Function newest_ut() finds the "to" value
    • They both rely on lower-level helper functions. Nice and clean:)
    • And it's all unit-tested too. See test_pdutil.py from that time
    • The function load_csv() may be useful too
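The fast approach described above can be sketched as follows. This is a hypothetical reimplementation in the spirit of the old oldest_ut()/newest_ut() helpers (the names are borrowed from the comment; the real pre-polars code lived in pdutil.py). It assumes the timestamp is the first CSV column:

```python
import os


def oldest_ut(file_path: str) -> int:
    """Return the timestamp of the first data row (line 2, after the header)."""
    with open(file_path, "r") as f:
        f.readline()              # skip the CSV header
        first_row = f.readline()  # first data row
    return int(first_row.split(",")[0])


def newest_ut(file_path: str) -> int:
    """Return the timestamp of the last data row, without loading the file."""
    with open(file_path, "rb") as f:
        f.seek(0, os.SEEK_END)
        pos = f.tell() - 2  # skip a possible trailing newline
        # walk backwards until we hit the newline that precedes the last row
        while pos > 0:
            f.seek(pos)
            if f.read(1) == b"\n":
                break
            pos -= 1
        last_row = f.read().decode()
    return int(last_row.split(",")[0])
```

Neither function reads more than a handful of bytes beyond the rows it needs, so the cost stays constant even as files grow.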

@kdetry (Contributor, Author) commented May 7, 2024

We create a folder for each dataset and populate it with files named according to the timestamp range of their data. Each file contains 1000 rows of data.

test_data_folder/x_from_0000000001_to_00000212112.csv
test_data_folder/x_from_00000212112_to_00004912112.csv
test_data_folder/x_from_00004912112_to_00408912312.csv

A baseline version: simply load the whole file, and read the first entry and last entry.

So actually the _get_from_value and _get_to_value methods do not read a single shared file; they detect the last file and return its "from" or "to" value, parsed from the filename.

Review thread on pdr_backend/lake/csv_data_store.py (resolved):
@idiom-bytes (Member) commented May 6, 2024

@kdetry have you
(1) read Trent's feedback?
(2) written a ticket, or updated this in the duckdb-integration code?

Please complete (1) and (2), then close this PR.

@kdetry (Contributor, Author) commented May 7, 2024

I added the following "DoD"s to the main task. I'm closing this PR for now; we can handle it after shipping.

  • Change the IO methods to functions, in a new module cvsutil.py
  • Use them in the CSVDataStore class to make it cleaner and thinner

Development

Successfully merging this pull request may close these issues.

[Lake][OHCLV] Update OHCLV data_factory to use csv_data_writer
3 participants