Data Collector

Introduction

Scripts for data collection

Custom Data Collection

For a concrete implementation, see: https://github.com/microsoft/qlib/tree/main/scripts/data_collector/yahoo

  1. Create a directory for the dataset's code inside the current directory
  2. Add collector.py
    • Add a collector class:
      import sys
      from pathlib import Path

      # Make the scripts/ directory importable so that data_collector.base resolves
      CUR_DIR = Path(__file__).resolve().parent
      sys.path.append(str(CUR_DIR.parent.parent))
      from data_collector.base import BaseCollector, BaseNormalize, BaseRun

      class UserCollector(BaseCollector):
          ...
    • Add a normalize class:
      class UserNormalize(BaseNormalize):
          ...
    • Add a CLI class:
      class Run(BaseRun):
          ...
  3. Add README.md
  4. Add requirements.txt
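Steps 1 to 4 can be scripted. The sketch below is a hypothetical helper (`scaffold_dataset` and the dataset name are illustrative, not part of qlib) that creates the dataset directory and its three files, seeding collector.py with the class skeleton from step 2. The class bodies still have to be written against data_collector/base.py.

```python
from pathlib import Path

# Skeleton for collector.py, mirroring the three classes listed in step 2.
COLLECTOR_TEMPLATE = '''\
import sys
from pathlib import Path

CUR_DIR = Path(__file__).resolve().parent
sys.path.append(str(CUR_DIR.parent.parent))
from data_collector.base import BaseCollector, BaseNormalize, BaseRun


class UserCollector(BaseCollector):
    ...


class UserNormalize(BaseNormalize):
    ...


class Run(BaseRun):
    ...
'''


def scaffold_dataset(collector_root: Path, dataset_name: str) -> Path:
    """Hypothetical helper: create <collector_root>/<dataset_name>/ with the files from steps 1-4."""
    dataset_dir = collector_root / dataset_name
    dataset_dir.mkdir(parents=True, exist_ok=False)  # step 1; refuse to overwrite an existing dataset
    (dataset_dir / "collector.py").write_text(COLLECTOR_TEMPLATE)  # step 2
    (dataset_dir / "README.md").write_text(f"# {dataset_name}\n")  # step 3
    (dataset_dir / "requirements.txt").write_text("")  # step 4; list the dataset's dependencies here
    return dataset_dir
```

Call it from scripts/data_collector, e.g. `scaffold_dataset(Path.cwd(), "my_dataset")`, so that `CUR_DIR.parent.parent` in the generated file points at scripts/. `exist_ok=False` is a deliberate choice so an existing collector is never silently overwritten.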

Description of dataset

Basic data
  • Features: Price/Volume
      - $close/$open/$low/$high/$volume/$change/$factor
  • Calendar: <freq>.txt
      - day.txt
      - 1min.txt
  • Instruments: <market>.txt
      - required: all.txt
      - csi300.txt/csi500.txt/sp500.txt
  • Features: numeric data
    • If the prices are not adjusted, factor=1
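A minimal sketch of how collected daily records could be turned into the pieces above, assuming each record is a dict with a `date` in YYYY-MM-DD form and raw Price/Volume values. The helper names are hypothetical, and the tab-separated `symbol, start date, end date` layout of all.txt is an assumption; verify it against the files produced by qlib's dump_bin.py before relying on it.

```python
from typing import Dict, List

# Price/Volume fields that must be numeric ($factor is handled separately).
PRICE_VOLUME_FIELDS = ("open", "close", "high", "low", "volume", "change")


def normalize_records(records: List[dict], adjusted: bool) -> List[dict]:
    """Hypothetical helper: cast Price/Volume fields to float and fill in factor."""
    out = []
    for rec in records:
        row = dict(rec)  # copy so the caller's records are not mutated
        for field in PRICE_VOLUME_FIELDS:
            row[field] = float(row[field])  # features must be numeric
        # Unadjusted prices use factor=1; adjusted data must supply its own factor.
        row["factor"] = float(row["factor"]) if adjusted else 1.0
        out.append(row)
    return out


def build_calendar(data: Dict[str, List[dict]]) -> List[str]:
    """Lines for a day.txt calendar: every date seen across symbols, sorted."""
    # ISO dates sort correctly as plain strings.
    return sorted({rec["date"] for recs in data.values() for rec in recs})


def build_instruments(data: Dict[str, List[dict]]) -> List[str]:
    """Lines for all.txt, ASSUMING the layout: symbol<TAB>first date<TAB>last date."""
    lines = []
    for symbol in sorted(data):
        dates = sorted(rec["date"] for rec in data[symbol])
        lines.append(f"{symbol}\t{dates[0]}\t{dates[-1]}")
    return lines
```

Keeping the three steps separate mirrors the table: features, calendar, and instruments are stored independently, so each can be regenerated without touching the others.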

Data-dependent component

For a component to run correctly, the data it depends on must be available:

Component        Required data
Data retrieval   Features, Calendar, Instruments
Backtest         Features [Price/Volume], Calendar, Instruments
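To check prepared data before running a component, the table can be encoded as a small lookup. This is a hypothetical sketch: the component keys `data_retrieval` and `backtest` and the function name are illustrative, and it checks in-memory field names rather than qlib's on-disk layout.

```python
from typing import List, Set

# The Price/Volume features listed under "Basic data".
PRICE_VOLUME = frozenset({"close", "open", "low", "high", "volume", "change", "factor"})


def missing_data(component: str, features: Set[str], has_calendar: bool, has_instruments: bool) -> List[str]:
    """Return what a component still lacks according to the table (empty list means ready)."""
    if component not in ("data_retrieval", "backtest"):
        raise ValueError(f"unknown component: {component}")
    missing = []
    if component == "data_retrieval" and not features:
        missing.append("features")  # data retrieval needs some features
    if component == "backtest":
        absent = sorted(PRICE_VOLUME - features)  # backtest needs the Price/Volume features
        if absent:
            missing.append("features: " + ", ".join(absent))
    if not has_calendar:
        missing.append("calendar")
    if not has_instruments:
        missing.append("instruments")
    return missing
```

For example, a dataset with only `$close` is enough for data retrieval but not for a backtest.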