Using Qlib to load CSV minute-level trading data #1775

Open
dReamix opened this issue Apr 2, 2024 · 1 comment
Labels
question Further information is requested

Comments

@dReamix

dReamix commented Apr 2, 2024

Hi there,

I'm new to Qlib. I searched online and asked an LLM, but haven't found a solution so far.

Here is what I am facing:

I have 1-minute trading data in more than 10 CSV files, each over 500 MB. All the CSV files share the same format:
[instrument, time, open, high, low, close, volume, turnover, is_paused].
The 'instrument' column holds the asset code, so a single file contains many stock codes.
The 'time' column holds the trading timestamp, e.g. '1/2/2019 9:53:00 AM'.
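For reference, the layout above can be loaded with explicit dtypes and a parsed 'time' column, which also makes it cheap to inspect a 500 MB file through `nrows`. This is a minimal sketch assuming pandas; `read_minute_csv` is a hypothetical helper, the dtypes (float32 prices, 0/1 integers in is_paused) are assumptions, and the month-first reading of '1/2/2019' is inferred from the '2019-01-02' calendar entry reported later in this issue.

```python
import io

import pandas as pd

# Assumed dtypes for the columns described above (float32 halves the memory
# of price columns compared with pandas' default float64).
DTYPES = {
    "instrument": "category",
    "open": "float32",
    "high": "float32",
    "low": "float32",
    "close": "float32",
    "volume": "float64",
    "turnover": "float64",
    "is_paused": "int8",  # assumption: stored as 0/1 with no missing values
}
# 12-hour clock with AM/PM, month first (assumption), e.g. '1/2/2019 9:53:00 AM'.
TIME_FORMAT = "%m/%d/%Y %I:%M:%S %p"


def read_minute_csv(source, nrows=None):
    """Read a minute-bar CSV with explicit dtypes and a parsed 'time' column."""
    df = pd.read_csv(source, dtype=DTYPES, nrows=nrows)
    df["time"] = pd.to_datetime(df["time"], format=TIME_FORMAT)
    return df


sample = io.StringIO(
    "instrument,time,open,high,low,close,volume,turnover,is_paused\n"
    "SH600000,1/2/2019 9:53:00 AM,9.7,9.72,9.69,9.71,12000,116520,0\n"
    "SH600000,1/2/2019 1:05:00 PM,9.74,9.75,9.73,9.74,8000,77920,0\n"
)
frame = read_minute_csv(sample)
print(frame["time"].dt.strftime("%Y-%m-%d %H:%M:%S").tolist())
# ['2019-01-02 09:53:00', '2019-01-02 13:05:00']
```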

Problems:

1. All the CSV files are in one folder. I tried running 'python dump_bin.py dump_all --csv_path "csv file folder path" --qlib_dir "target file path" --symbol_field_name instrument --date_field_name time --include_fields open,high,low,close,volume,turnover,is_paused'.

The command then failed with: 'concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.'

Is this caused by running out of memory (are the files too large)?

I ask because when I put only one CSV file in the folder, 'python dump_bin.py' worked, but only partially.
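Memory is a plausible cause: BrokenProcessPool only says that a worker process died abruptly, and on Linux that commonly happens when the OS kills a process that runs out of memory, though the message itself does not confirm it. A rough check is to estimate how much memory one file needs once loaded into pandas. This is a sketch assuming pandas; `estimate_dataframe_bytes` is a hypothetical helper, and the estimate ignores any extra copies dump_bin.py makes while processing. If several worker processes each hold a file, multiply accordingly; dump_bin.py's `--max_workers` option, assuming your version exposes it, controls how many workers run at once.

```python
import itertools
import os
import tempfile
from io import BytesIO

import pandas as pd


def estimate_dataframe_bytes(csv_path, sample_lines=200_000):
    """Extrapolate the in-memory size of a whole CSV from its first lines.

    Loads the header plus up to `sample_lines` data lines, measures the
    resulting DataFrame with memory_usage(deep=True), then scales by the
    ratio of the full file size to the number of bytes sampled.
    """
    file_size = os.path.getsize(csv_path)
    with open(csv_path, "rb") as fh:
        header = fh.readline()
        body = b"".join(itertools.islice(fh, sample_lines))
    sampled = header + body
    sample = pd.read_csv(BytesIO(sampled))
    sample_mem = int(sample.memory_usage(deep=True).sum())
    return round(sample_mem * file_size / len(sampled))


# Usage on a small generated file (the sample covers the whole file here,
# so the estimate equals the measured size).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.csv")
    pd.DataFrame(
        {"instrument": ["SH600000"] * 1000, "close": [9.7] * 1000}
    ).to_csv(path, index=False)
    est = estimate_dataframe_bytes(path)
    actual = int(pd.read_csv(path).memory_usage(deep=True).sum())
    print(est == actual)  # True
```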

2. After I "successfully" ran 'python dump_bin.py', I checked the Qlib data directory. It contains 3 folders: calendar, features, and instruments.
    However, the instruments folder contains only an 'all.txt' file, and it has a single row: the CSV file name, a start date, and an end date.
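As far as I can tell, dump_bin.py treats each CSV file as one instrument and takes the symbol from the file name, which would explain the single row named after the CSV file. With one file per instrument, all.txt should instead hold one row per instrument with its first and last timestamp. The sketch below builds those expected rows from the long-format data, for comparison after re-dumping. It assumes pandas; `expected_instrument_rows` is a hypothetical helper, and the tab separator and timestamp layout are assumptions about Qlib's file format rather than its documented output.

```python
import pandas as pd


def expected_instrument_rows(df, symbol_col="instrument", time_col="time"):
    """Return 'SYMBOL<TAB>first<TAB>last' lines, one per instrument, sorted by symbol."""
    spans = df.groupby(symbol_col)[time_col].agg(["min", "max"]).sort_index()
    fmt = "%Y-%m-%d %H:%M:%S"
    return [
        f"{symbol}\t{row['min'].strftime(fmt)}\t{row['max'].strftime(fmt)}"
        for symbol, row in spans.iterrows()
    ]


data = pd.DataFrame(
    {
        "instrument": ["SH600000", "SZ000001", "SH600000"],
        "time": pd.to_datetime(
            ["2019-01-02 09:31:00", "2019-01-02 09:31:00", "2019-01-03 15:00:00"]
        ),
    }
)
for line in expected_instrument_rows(data):
    print(line)
```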

There is a 'day.txt' file in the calendar folder, but it only stores dates, e.g. '2019-01-02', with no minutes.
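A calendar named 'day.txt' suggests the dump ran at daily frequency. As far as I know, dump_bin.py has a `--freq` option that defaults to `day`; at that setting calendar entries are written as dates only, so the minute component is dropped, while a value such as `1min` keeps the full timestamp (and should produce a `1min.txt` calendar instead). Please check this against your Qlib version. The sketch below only illustrates the effect of the two timestamp formats; `build_calendar` is a hypothetical helper, not part of Qlib, and it also deduplicates entries, which is a choice of this sketch.

```python
import pandas as pd


def build_calendar(timestamps, freq="day"):
    """Sorted unique calendar entries at daily or intraday granularity."""
    # Daily: date only. Anything else: full 24-hour timestamp.
    fmt = "%Y-%m-%d" if freq == "day" else "%Y-%m-%d %H:%M:%S"
    return sorted(set(pd.DatetimeIndex(timestamps).strftime(fmt)))


times = pd.to_datetime(
    ["2019-01-02 09:31:00", "2019-01-02 09:32:00", "2019-01-03 09:31:00"]
)
print(build_calendar(times, "day"))   # ['2019-01-02', '2019-01-03']
print(build_calendar(times, "1min"))  # ['2019-01-02 09:31:00', '2019-01-02 09:32:00', '2019-01-03 09:31:00']
```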

I'd appreciate any advice!

dReamix added the question label on Apr 2, 2024
@SunsetWolf
Collaborator

I think your CSV files need some preprocessing before they can be converted to bin files. Keep the following in mind:
1. Split the data by stock code and name each file after its stock code, e.g. SH600000.csv.
2. Convert the time column from the 12-hour to the 24-hour format, e.g. 2010-12-01 14:34:00.
3. When running dump_bin, use --date_field_name to specify the time column and --symbol_field_name to specify the stock code column, and use --exclude_fields to exclude both the stock code and time columns, because Qlib stores them in its own way.
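The first two steps can be done in a single streaming pass, which also keeps memory bounded because only one chunk of each large file is loaded at a time. This is a minimal sketch assuming pandas, the column names from the question, and month-first source timestamps; `split_by_instrument` and `dump_bin_args` are hypothetical helpers. `--freq 1min` is not part of the advice above; as far as I know it is the dump_bin.py option that makes the calendar minute-level rather than daily, so verify it against your Qlib version.

```python
import tempfile
from pathlib import Path

import pandas as pd

SOURCE_TIME_FORMAT = "%m/%d/%Y %I:%M:%S %p"  # e.g. '1/2/2019 9:53:00 AM' (month first: assumption)
QLIB_TIME_FORMAT = "%Y-%m-%d %H:%M:%S"       # 24-hour, e.g. '2010-12-01 14:34:00'


def split_by_instrument(csv_paths, out_dir, chunksize=1_000_000):
    """Stream multi-instrument CSVs into one '<CODE>.csv' file per instrument.

    Only `chunksize` rows are in memory at a time. Rows are appended to the
    per-instrument files; the header is written only when a file is created.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for csv_path in csv_paths:
        for chunk in pd.read_csv(csv_path, chunksize=chunksize):
            # Convert the 12-hour timestamps to 24-hour strings.
            chunk["time"] = pd.to_datetime(
                chunk["time"], format=SOURCE_TIME_FORMAT
            ).dt.strftime(QLIB_TIME_FORMAT)
            for code, rows in chunk.groupby("instrument", sort=False):
                target = out_dir / f"{code}.csv"
                rows.to_csv(target, mode="a", header=not target.exists(), index=False)


def dump_bin_args(csv_dir, qlib_dir):
    """Argument list for dump_bin.py dump_all (flags as I understand the script)."""
    return [
        "python", "dump_bin.py", "dump_all",
        "--csv_path", str(csv_dir),
        "--qlib_dir", str(qlib_dir),
        "--freq", "1min",
        "--date_field_name", "time",
        "--symbol_field_name", "instrument",
        "--exclude_fields", "instrument,time",
    ]


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "part1.csv"
    src.write_text(
        "instrument,time,open,high,low,close,volume,turnover,is_paused\n"
        "SH600000,1/2/2019 9:53:00 AM,9.7,9.72,9.69,9.71,12000,116520,0\n"
        "SZ000001,1/2/2019 1:05:00 PM,10.1,10.2,10.0,10.1,5000,50500,0\n"
        "SH600000,1/2/2019 1:05:00 PM,9.74,9.75,9.73,9.74,8000,77920,0\n"
    )
    split_by_instrument([src], Path(tmp) / "per_code", chunksize=2)
    files = sorted(p.name for p in (Path(tmp) / "per_code").iterdir())
    out = pd.read_csv(Path(tmp) / "per_code" / "SH600000.csv")
    print(files)               # ['SH600000.csv', 'SZ000001.csv']
    print(out["time"].tolist())  # ['2019-01-02 09:53:00', '2019-01-02 13:05:00']

print(" ".join(dump_bin_args("per_code", "qlib_1min")))
```

Rows are appended in the order they appear across source files, so if the files are not chronological, sorting each output file by time before dumping is a cautious extra step.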
