[Freeze][Resume After Silver + Streamlit] Fix #769 - OHLCV with CSV Data Store #858
A number of functions in plutil.py are no longer used outside of tests. These should be re-wired to use CDS, proving that CSV writing works in exactly the same way.
The following are still being used elsewhere (in common code)
…oad_append, and other tests are being validated with the new CSVDataStore
…o CSVS when appropriate
  # read the last record from the file
  last_file = pl.read_csv(file_path)
- return int(last_file["timestamp"][-1])
+ return UnixTimeMs(int(last_file["timestamp"][-1]))
(I'm putting a comment here, for lack of a better place.)
The implementation of _get_from_value() and _get_to_value() currently depends on the name of the file.
This is too brittle, and will be hard to maintain.
Better: auto-detect the values by reading the file itself. It's not that expensive, and will be far better to work with. This is what we've done in the past; and we can re-use that past code. Details:
- A baseline version: simply load the whole file, and read the first entry and last entry.
- Even faster: open the file but don't load it; then read either the second line (for "from" value); or rapidly iterate to the last line and read it (for "to" value).
- The latter has already been built for you! It's how we used to do things, before porting to polars.
- It's in the file pdutil.py, circa Nov 23, 2023
- Function oldest_ut() finds the "from" value
- Function newest_ut() finds the "to" value
- They both rely on lower-level helper functions. Nice and clean :)
- And it's all unit-tested too. See test_pdutil.py from that time
- The function load_csv() may be useful too
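For illustration, here is a minimal sketch of the "open the file but don't load it" idea. It is not the code from pdutil.py: the function names first_timestamp() and last_timestamp() are hypothetical, and it assumes a simple comma-separated file with a header row containing a "timestamp" column and no quoted commas.

```python
def first_timestamp(path: str) -> int:
    """Return the timestamp on the first data row (the line after the header)."""
    with open(path, "r", encoding="utf-8") as f:
        header = f.readline().rstrip("\r\n").split(",")
        col = header.index("timestamp")  # assumed column name
        row = f.readline().rstrip("\r\n")
    if not row:
        raise ValueError(f"no data rows in {path}")
    return int(row.split(",")[col])


def last_timestamp(path: str) -> int:
    """Return the timestamp on the last non-empty row, streaming line by line."""
    with open(path, "r", encoding="utf-8") as f:
        header = f.readline().rstrip("\r\n").split(",")
        col = header.index("timestamp")  # assumed column name
        last = None
        for line in f:  # iterate without holding the whole file in memory
            if line.strip():
                last = line
    if last is None:
        raise ValueError(f"no data rows in {path}")
    return int(last.rstrip("\r\n").split(",")[col])
```

Both helpers read the data itself, so they keep working if a file is renamed.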
We create a folder for each dataset and populate them with files named according to the timestamp values of their data. Each file contains 1000 rows of data.
test_data_folder/x_from_0000000001_to_00000212112.csv
test_data_folder/x_from_00000212112_to_00004912112.csv
test_data_folder/x_from_00004912112_to_00408912312.csv
A baseline version: simply load the whole file, and read the first entry and last entry.
So the _get_from_value and _get_to_value methods do not actually read the same file; they detect the last file and return the "from" or "to" value.
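To make the naming scheme concrete, here is a rough sketch of how a "from"/"to" range could be parsed from such file names and how the last file could be picked. This is not the CSVDataStore code: parse_range() and last_file() are hypothetical names, and the pattern assumes names of the form <dataset>_from_<digits>_to_<digits>.csv as in the examples above.

```python
import re
from pathlib import Path
from typing import Optional, Tuple

# Assumed pattern, based on the example file names in this thread.
_NAME_RE = re.compile(r"^(?P<name>.+)_from_(?P<start>\d+)_to_(?P<end>\d+)\.csv$")


def parse_range(filename: str) -> Tuple[int, int]:
    """Return (from, to) timestamps encoded in a file name."""
    m = _NAME_RE.match(filename)
    if m is None:
        raise ValueError(f"unexpected file name: {filename}")
    return int(m["start"]), int(m["end"])


def last_file(folder: str) -> Optional[Path]:
    """Return the file with the largest "from" value, or None if there is none."""
    candidates = [p for p in Path(folder).glob("*.csv") if _NAME_RE.match(p.name)]
    if not candidates:
        return None
    return max(candidates, key=lambda p: parse_range(p.name)[0])
```

Comparing the parsed integers rather than the raw strings avoids ordering problems when the zero padding differs between files.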
@kdetry Can you please complete (1) and (2), and then make sure to close this PR?
I added the following "DoD"s to the main task. I'm closing this PR for now; we can handle it after shipping.
Fixes #769
Changes proposed in this PR: