Fix the bug of reading string NA as NaN in the function exists_qlib_data. #1736

Merged
merged 5 commits into from May 10, 2024


OzzyXu
Contributor

@OzzyXu OzzyXu commented Jan 20, 2024

Fix a bug in exists_qlib_data (in /qlib/utils/__init__.py) where the string "NA" is read as NaN.

Description

Nano Labs Ltd is a newly Nasdaq-listed company that has traded under the ticker NA since August 1, 2022. The default na_values list of pd.read_csv includes "NA", so this ticker is parsed as NaN instead of a string. This PR changes the pd.read_csv call in exists_qlib_data by adding keep_default_na=False and supplying a copy of the default NA list with two values ("NA" and "NULL") removed. The reduced list applies only to the first column of "all.txt", which normally contains nothing but strings.
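
The parsing behavior described above can be reproduced without qlib. The following is a minimal sketch, not the code from this PR: the sample rows and the helper name read_tickers are made up for illustration, and only pd.read_csv with its documented keep_default_na parameter is used.

```python
import io

import pandas as pd

# Hypothetical tab-separated rows in the style of an instruments file:
# ticker, start date, end date.
SAMPLE = "AAPL\t1999-12-31\t2024-01-19\nNA\t2022-08-01\t2024-01-19\n"


def read_tickers(text, **read_csv_kwargs):
    """Return the first column of the sample as a plain Python list."""
    df = pd.read_csv(io.StringIO(text), sep="\t", header=None, **read_csv_kwargs)
    return df[0].tolist()


# With the defaults, "NA" is in pandas' built-in NA list and becomes NaN.
default_tickers = read_tickers(SAMPLE)

# With keep_default_na=False and no na_values, no string is treated as missing.
fixed_tickers = read_tickers(SAMPLE, keep_default_na=False)

print(default_tickers)
print(fixed_tickers)
```

In the default read the second ticker is a float NaN rather than a string, so any string method applied to the column afterwards will not see "NA".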

Motivation and Context

To fix the bug in #1720.

How Has This Been Tested?

  • Pass the test by running pytest qlib/tests/test_all_pipeline.py from the parent directory of qlib.
  • If you are adding a new feature, test on your own test scripts.

Screenshots of Test Results (if appropriate):

  1. Pipeline test:
    image

  2. Your own tests:
    image (1)
    Place the attached file all.txt under \qlib_data\us_data_made\instruments and test with the following code:

import sys
from pathlib import Path

import qlib
from qlib.constant import REG_US
from qlib.utils import exists_qlib_data

# Directory holding get_data.py, relative to the current working directory
scripts_dir = Path.cwd().parent.joinpath("scripts")
provider_uri = "./qlib_data/us_data_made"  # target_dir
# exists_qlib_data reads instruments/all.txt, the file that contains the "NA" ticker
if not exists_qlib_data(provider_uri):
    print(f"Qlib data is not found in {provider_uri}")
    sys.path.append(str(scripts_dir))
    from get_data import GetData

    GetData().qlib_data(target_dir=provider_uri, region=REG_US)
qlib.init(provider_uri=provider_uri, region=REG_US)
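
For a quicker check that does not need a qlib data directory, the read of all.txt can be sketched in isolation. This is an illustration under stated assumptions, not the merged implementation: read_instrument_codes is a hypothetical helper, NA_MARKERS is only a subset of pandas' default NA strings with "NA" and "NULL" left out, and the file contents are invented.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Assumption: an illustrative subset of pandas' default NA strings,
# deliberately omitting "NA" and "NULL" so they survive as ticker names.
NA_MARKERS = ["", "#N/A", "N/A", "NaN", "n/a", "nan"]


def read_instrument_codes(path):
    """Read the first column of an all.txt-style file as lowercase codes."""
    df = pd.read_csv(
        path,
        sep="\t",
        header=None,
        keep_default_na=False,       # switch off the built-in NA list
        na_values={0: NA_MARKERS},   # apply the reduced list to column 0 only
    )
    # Rows whose first column matched a marker are NaN and are dropped here.
    return df[0].dropna().str.lower().tolist()


with tempfile.TemporaryDirectory() as tmp:
    all_txt = Path(tmp) / "all.txt"
    all_txt.write_text(
        "AAPL\t1999-12-31\t2024-01-19\n"
        "NA\t2022-08-01\t2024-01-19\n"
        "NULL\t2020-01-01\t2024-01-19\n"
        "#N/A\t2020-01-01\t2024-01-19\n"
    )
    codes = read_instrument_codes(all_txt)

print(codes)
```

Keying na_values by column keeps the reduced list away from the date columns, which are read as plain strings here.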

Types of changes

  • Fix bugs
  • Add new feature
  • Update documentation

@github-actions github-actions bot added the waiting for triage Cannot auto-triage, wait for triage. label Jan 20, 2024
@OzzyXu
Contributor Author

OzzyXu commented Jan 20, 2024

@microsoft-github-policy-service agree

@OzzyXu OzzyXu marked this pull request as ready for review January 20, 2024 08:33
@OzzyXu OzzyXu changed the title from "Fix the bug of reading NA string as NaN in exists_qlib_data." to "Fix the bug of reading string NA as NaN in exists_qlib_data." Jan 20, 2024
@OzzyXu OzzyXu changed the title from "Fix the bug of reading string NA as NaN in exists_qlib_data." to "Fix the bug of reading string NA as NaN in the function exists_qlib_data." Jan 20, 2024
@OzzyXu
Contributor Author

OzzyXu commented Feb 12, 2024

@SunsetWolf Hey, can I ask why all tests from sources other than slow failed? Do I need to take care of this? Thank you.

@SunsetWolf SunsetWolf merged commit b1e0e77 into microsoft:main May 10, 2024
32 checks passed