I'm new to Qlib. I did look up my questions online and asked an LLM, but have found no solution so far.
Here is what I am facing:
I have 1-minute trading data in more than 10 CSV files, each over 500 MB. All the CSV files follow the same format:
[instrument, time, open, high, low, close, volume, turnover, is_paused].
The 'instrument' column holds the asset code, so a single file contains many different stock codes.
The 'time' column holds the trading timestamp, e.g. '1/2/2019 9:53:00 AM'.
Problems:
1. All the CSV files are in one folder. I tried running 'python dump_bin.py dump_all --csv_path <csv file folder path> --qlib_dir <target file path> --symbol_field_name instrument --date_field_name time --include_fields open,high,low,close,volume,turnover,is_paused'.
The script then failed with 'concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.'
Is this caused by running out of memory (are the files too large)?
I ask because when I put only one CSV file in the folder, 'python dump_bin.py' worked, at least partially.
After that "successful" run, I checked the Qlib data directory. It contains three folders: calendar, features, and instruments.
However, the instruments folder holds only an 'all.txt' file, and it has a single row: the CSV file name, a start date, and an end date.
The calendar folder has a 'day.txt', but it only stores dates, e.g. '2019-01-02', with no minute-level timestamps.
I'd appreciate any advice!
I think your CSV files need some preprocessing before they can be converted to bin files. Keep the following points in mind.
First, split the data by stock code so that each file holds a single stock and is named after that code, e.g. SH600000.csv.
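A minimal sketch of that split with pandas, assuming the column layout from the question and that the 'instrument' values are already Qlib-style codes such as SH600000 (if they are not, map them first). Reading with chunksize means only one chunk of a 500 MB file is held in memory at a time, which also sidesteps the memory pressure behind the crash. The function name split_by_instrument and the default chunk size are illustrative choices, not part of Qlib.

```python
from pathlib import Path

import pandas as pd


def split_by_instrument(src_dir, out_dir, chunksize=1_000_000):
    """Stream every CSV in src_dir and append its rows to one CSV per instrument.

    Hypothetical helper; assumes an 'instrument' column holding codes like SH600000.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for csv_file in sorted(Path(src_dir).glob("*.csv")):
        # Read in chunks so a large file never has to fit in memory at once.
        for chunk in pd.read_csv(csv_file, chunksize=chunksize):
            for code, rows in chunk.groupby("instrument"):
                target = out_dir / f"{code}.csv"
                # Write the header only when the per-stock file is first created.
                rows.to_csv(target, mode="a", header=not target.exists(), index=False)
```

Because rows are appended, point it at an empty output folder; running it twice into the same folder would duplicate rows.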
The time column needs to be converted from the 12-hour to the 24-hour clock, e.g. 2010-12-01 14:34:00.
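One possible conversion with pandas, assuming the source is month/day/year with an AM/PM clock (the '2019-01-02' entry in day.txt suggests '1/2/2019' means January 2). to_24h and normalize_file are hypothetical helper names. normalize_file also sorts each per-stock file by time, since the split step keeps rows in source order.

```python
from pathlib import Path

import pandas as pd

# Matches timestamps such as '1/2/2019 9:53:00 AM' (month/day/year, 12-hour clock).
SOURCE_FORMAT = "%m/%d/%Y %I:%M:%S %p"


def to_24h(series):
    """Parse 12-hour timestamps and render them as 'YYYY-MM-DD HH:MM:SS'."""
    parsed = pd.to_datetime(series, format=SOURCE_FORMAT)
    return parsed.dt.strftime("%Y-%m-%d %H:%M:%S")


def normalize_file(path):
    """Rewrite one per-stock CSV in place with 24-hour, chronologically sorted times."""
    df = pd.read_csv(path)
    df["time"] = to_24h(df["time"])
    # Zero-padded 'YYYY-MM-DD HH:MM:SS' strings sort correctly as plain text.
    df = df.sort_values("time")
    df.to_csv(path, index=False)
```

For example, to_24h turns '1/2/2019 2:34:00 PM' into '2019-01-02 14:34:00'.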
When running dump_bin, use --date_field_name to specify the time column and --symbol_field_name to specify the stock code column, and use --exclude_fields to exclude both of those columns, because Qlib stores the symbol and time in its own way.
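A sketch of the resulting dump_bin call, built as an argument list from Python and using the same --csv_path flag that worked in your one-file run. As far as I know, dump_bin.py treats data as daily unless --freq is given, which would explain why only day.txt appeared; Qlib's own 1-minute collector examples pass --freq 1min. Lowering --max_workers is an assumption worth trying for the BrokenProcessPool error, since fewer worker processes should lower peak memory. build_dump_cmd, its defaults, and the directory names are hypothetical.

```python
import shlex
import sys


def build_dump_cmd(csv_dir, qlib_dir, freq="1min", max_workers=4):
    """Assemble a dump_bin.py invocation for per-stock, 24-hour CSV files (hypothetical helper)."""
    return [
        sys.executable, "dump_bin.py", "dump_all",
        "--csv_path", str(csv_dir),
        "--qlib_dir", str(qlib_dir),
        # Assumption: without --freq the script writes a daily calendar (day.txt).
        "--freq", freq,
        "--symbol_field_name", "instrument",
        "--date_field_name", "time",
        # Qlib stores the symbol and timestamp itself, so keep them out of the features.
        "--exclude_fields", "instrument,time",
        # Assumption: fewer worker processes lower peak memory use.
        "--max_workers", str(max_workers),
    ]


if __name__ == "__main__":
    # Print the command for inspection before running it.
    print(shlex.join(build_dump_cmd("per_stock_csv", "qlib_1min")))
```

Once the printed command looks right, the same list can be passed to subprocess.run(cmd, check=True).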