Small fix for local storage #1556

Open
wants to merge 38 commits into
base: main
Changes from 8 commits
Commits
223252c
check dir exist for storage
Jun 16, 2023
60edbc4
index should be int
Jun 16, 2023
d607d81
remove to_csv twice
Jun 18, 2023
5579b8a
index should be int
Jun 18, 2023
c0b60e9
should not get_recent_freq if folder is empty
Jun 18, 2023
07083ae
fix index missing bug
Jun 19, 2023
655e666
improve logging
Jun 27, 2023
0a0d7dc
allow None model and dataset in SoftTopkStrategy
Jul 7, 2023
a0d1450
use line width 120
PaleNeutron Aug 17, 2023
70b5c9f
change get_data url (#1558)
SunsetWolf Jun 25, 2023
58f73de
Update release-drafter.yml (#1569)
you-n-g Jun 25, 2023
ba2df87
Update __init__.py
you-n-g Jun 25, 2023
1e9140d
Update __init__.py
you-n-g Jun 27, 2023
194ac59
Update README.md for RL (#1573)
you-n-g Jun 28, 2023
a656648
fix_pip_ci (#1584)
SunsetWolf Jul 5, 2023
706138c
fix download token (#1577)
m3ngyang Jul 6, 2023
fab4e0a
Update qlibrl docs. (#1588)
lwwang1995 Jul 7, 2023
9a0291f
Postpone PR stale. (#1591)
you-n-g Jul 12, 2023
d9936c4
Adjust rolling api (#1594)
you-n-g Jul 14, 2023
6cefe4a
Fixed pyqlib version issue on macos (#1605)
SunsetWolf Jul 18, 2023
a65fca8
Update __init__.py
you-n-g Jul 18, 2023
9e990e5
Bump Version & Fix CI (#1606)
you-n-g Jul 18, 2023
e5df276
fix_ci (#1608)
SunsetWolf Jul 19, 2023
ee50f7c
Update introduction.rst (#1579)
computerscienceiscool Jul 26, 2023
9864038
Update README.md (#1553)
GeneLiuXe Jul 26, 2023
2d0162d
Update introduction.rst (#1578)
computerscienceiscool Jul 26, 2023
e2019f8
depress warning with pandas option_context (#1524)
Fivele-Li Aug 1, 2023
42ba746
fix docs (#1618)
SunsetWolf Aug 2, 2023
b624ddf
Add multi pass portfolio analysis record (#1546)
chenditc Aug 4, 2023
e9fbb4f
Add exploration noise to rl training collector (#1481)
chenditc Aug 18, 2023
10e27d5
Troubleshooting pip version issues in CI (#1504)
Fivele-Li Aug 24, 2023
b300af7
suppress the SettingWithCopyWarning of pandas (#1513)
Fivele-Li Sep 1, 2023
8e446aa
Update requirements.txt (#1521)
kimzhuan Sep 15, 2023
8bcf09e
pred current is confusing
Jul 7, 2023
97c6799
add build system requirements
Jul 19, 2023
265fdc9
add pos and neg operator
Jul 20, 2023
f3ce11a
fix stock is delisted
Jul 20, 2023
68e2640
Merge branch 'main' into main
PaleNeutron Oct 11, 2023
18 changes: 8 additions & 10 deletions qlib/contrib/strategy/cost_control.py
@@ -13,16 +13,11 @@
 class SoftTopkStrategy(WeightStrategyBase):
     def __init__(
         self,
-        model,
-        dataset,
         topk,
         order_generator_cls_or_obj=OrderGenWInteract,
         max_sold_weight=1.0,
         risk_degree=0.95,
         buy_method="first_fill",
-        trade_exchange=None,
-        level_infra=None,
-        common_infra=None,
         **kwargs,
     ):
         """
@@ -37,7 +32,8 @@ def __init__(
             average_fill: assign the weight to the stocks rank high averagely.
         """
         super(SoftTopkStrategy, self).__init__(
-            model, dataset, order_generator_cls_or_obj, trade_exchange, level_infra, common_infra, **kwargs
+            order_generator_cls_or_obj=order_generator_cls_or_obj,
+            **kwargs,
         )
         self.topk = topk
         self.max_sold_weight = max_sold_weight
@@ -89,13 +85,15 @@ def generate_target_weight_position(self, score, current, trade_start_time, trad
                     max(1 / self.topk - final_stock_weight.get(stock_id, 0), 0.0),
                     sold_stock_weight,
                 )
-                final_stock_weight[stock_id] = final_stock_weight.get(stock_id, 0.0) + add_weight
+                final_stock_weight[stock_id] = (
Contributor

Why are parentheses added here?

Author

My lint tool added them because it was set to a line width of 80; they should be removed.

+                    final_stock_weight.get(stock_id, 0.0) + add_weight
+                )
                 sold_stock_weight -= add_weight
         elif self.buy_method == "average_fill":
             for stock_id in buy_signal_stocks:
-                final_stock_weight[stock_id] = final_stock_weight.get(stock_id, 0.0) + sold_stock_weight / len(
-                    buy_signal_stocks
-                )
+                final_stock_weight[stock_id] = final_stock_weight.get(
+                    stock_id, 0.0
+                ) + sold_stock_weight / len(buy_signal_stocks)
         else:
Contributor

The changes here do not modify the calculation logic, and the new formatting did not pass the CI check. Can we revert to the previous format?

Author

Yes, but linting is a separate problem here. I am confused by the conflict between CI and pre-commit: CI does not run flake8, and the existing code does not actually follow it, so I cannot enable pre-commit in my development environment.

black . -l 120 --check --diff

- id: flake8
  args: ["--ignore=E501,F541,E266,E402,W503,E731,E203"]

pre-commit run --all-files
black....................................................................Passed
flake8...................................................................Failed
- hook id: flake8
- exit code: 1

examples/data_demo/data_mem_resuse_demo.py:10:1: F401 'pickle' imported but unused
examples/data_demo/data_mem_resuse_demo.py:12:1: F401 'subprocess' imported but unused
examples/online_srv/online_management_simulate.py:11:1: F401 'qlib.model.trainer.DelayTrainerR' imported but unused
examples/online_srv/online_management_simulate.py:11:1: F401 'qlib.model.trainer.DelayTrainerRM' imported but unused
examples/online_srv/online_management_simulate.py:103:9: F841 local variable 'CSI300_BENCH' is assigned to but never used
tests/rl/test_saoe_simple.py:20:1: F403 'from qlib.rl.order_execution import *' used; unable to detect undefined names
tests/rl/test_saoe_simple.py:51:17: F405 'SingleAssetOrderExecutionSimple' may be undefined, or defined from star imports: qlib.rl.order_execution
tests/rl/test_saoe_simple.py:85:17: F405 'SingleAssetOrderExecutionSimple' may be undefined, or defined from star imports: qlib.rl.order_execution
tests/rl/test_saoe_simple.py:108:21: F405 'SingleAssetOrderExecutionSimple' may be undefined, or defined from star imports: qlib.rl.order_execution
tests/rl/test_saoe_simple.py:111:17: F405 'SingleAssetOrderExecutionSimple' may be undefined, or defined from star imports: qlib.rl.order_execution
tests/rl/test_saoe_simple.py:121:17: F405 'SingleAssetOrderExecutionSimple' may be undefined, or defined from star imports: qlib.rl.order_execution
tests/rl/test_saoe_simple.py:140:17: F405 'SingleAssetOrderExecutionSimple' may be undefined, or defined from star imports: qlib.rl.order_execution
tests/rl/test_saoe_simple.py:148:19: F405 'FullHistoryStateInterpreter' may be undefined, or defined from star imports: qlib.rl.order_execution
tests/rl/test_saoe_simple.py:149:24: F405 'CurrentStepStateInterpreter' may be undefined, or defined from star imports: qlib.rl.order_execution
tests/rl/test_saoe_simple.py:150:26: F405 'CategoricalActionInterpreter' may be undefined, or defined from star imports: qlib.rl.order_execution
tests/rl/test_saoe_simple.py:151:31: F405 'TwapRelativeActionInterpreter' may be undefined, or defined from star imports: qlib.rl.order_execution
tests/rl/test_saoe_simple.py:221:17: F405 'SingleAssetOrderExecutionSimple' may be undefined, or defined from star imports: qlib.rl.order_execution
tests/rl/test_saoe_simple.py:227:19: F405 'FullHistoryStateInterpreter' may be undefined, or defined from star imports: qlib.rl.order_execution
tests/rl/test_saoe_simple.py:228:21: F405 'CategoricalActionInterpreter' may be undefined, or defined from star imports: qlib.rl.order_execution
tests/rl/test_saoe_simple.py:232:15: F405 'Recurrent' may be undefined, or defined from star imports: qlib.rl.order_execution
tests/rl/test_saoe_simple.py:233:14: F405 'PPO' may be undefined, or defined from star imports: qlib.rl.order_execution
tests/rl/test_saoe_simple.py:255:20: F405 'FullHistoryStateInterpreter' may be undefined, or defined from star imports: qlib.rl.order_execution
tests/rl/test_saoe_simple.py:256:21: F405 'TwapRelativeActionInterpreter' may be undefined, or defined from star imports: qlib.rl.order_execution
tests/rl/test_saoe_simple.py:257:14: F405 'AllOne' may be undefined, or defined from star imports: qlib.rl.order_execution
tests/rl/test_saoe_simple.py:261:17: F405 'SingleAssetOrderExecutionSimple' may be undefined, or defined from star imports: qlib.rl.order_execution
tests/rl/test_saoe_simple.py:284:20: F405 'FullHistoryStateInterpreter' may be undefined, or defined from star imports: qlib.rl.order_execution
tests/rl/test_saoe_simple.py:285:21: F405 'CategoricalActionInterpreter' may be undefined, or defined from star imports: qlib.rl.order_execution
tests/rl/test_saoe_simple.py:286:15: F405 'Recurrent' may be undefined, or defined from star imports: qlib.rl.order_execution
tests/rl/test_saoe_simple.py:287:14: F405 'PPO' may be undefined, or defined from star imports: qlib.rl.order_execution
tests/rl/test_saoe_simple.py:292:17: F405 'SingleAssetOrderExecutionSimple' may be undefined, or defined from star imports: qlib.rl.order_execution
tests/rl/test_saoe_simple.py:315:20: F405 'FullHistoryStateInterpreter' may be undefined, or defined from star imports: qlib.rl.order_execution
tests/rl/test_saoe_simple.py:316:21: F405 'CategoricalActionInterpreter' may be undefined, or defined from star imports: qlib.rl.order_execution
tests/rl/test_saoe_simple.py:317:15: F405 'Recurrent' may be undefined, or defined from star imports: qlib.rl.order_execution
tests/rl/test_saoe_simple.py:318:14: F405 'PPO' may be undefined, or defined from star imports: qlib.rl.order_execution
tests/rl/test_saoe_simple.py:321:17: F405 'SingleAssetOrderExecutionSimple' may be undefined, or defined from star imports: qlib.rl.order_execution
tests/rl/test_saoe_simple.py:326:9: F405 'PAPenaltyReward' may be undefined, or defined from star imports: qlib.rl.order_execution
tests/test_contrib_model.py:14:17: F841 local variable 'model' is assigned to but never used
examples/benchmarks/TFT/tft.py:219:9: F841 local variable 'best_loss' is assigned to but never used
examples/benchmarks/TFT/tft.py:258:9: F841 local variable 'use_gpu' is assigned to but never used
examples/benchmarks/TFT/tft.py:273:13: F841 local variable 'targets' is assigned to but never used
tests/rolling_tests/test_update_pred.py:5:1: F401 'fire' imported but unused
tests/rolling_tests/test_update_pred.py:8:1: F401 'qlib' imported but unused
tests/rolling_tests/test_update_pred.py:50:9: F841 local variable 'pred' is assigned to but never used
tests/rolling_tests/test_update_pred.py:113:9: F841 local variable 'pred' is assigned to but never used
scripts/data_collector/cn_index/collector.py:7:1: F401 'datetime' imported but unused
tests/data_mid_layer_tests/test_handler_storage.py:2:1: F401 'time' imported but unused
examples/benchmarks/TFT/libs/utils.py:36:5: E741 ambiguous variable name 'l'
scripts/data_collector/yahoo/collector.py:5:1: F401 're.I' imported but unused
scripts/data_collector/yahoo/collector.py:114:9: F841 local variable 'e' is assigned to but never used
scripts/data_collector/yahoo/collector.py:144:9: F841 local variable 'e' is assigned to but never used
scripts/data_collector/yahoo/collector.py:173:13: F841 local variable 'e' is assigned to but never used
scripts/data_collector/yahoo/collector.py:183:17: F841 local variable 'e' is assigned to but never used
scripts/data_collector/fund/collector.py:92:9: F841 local variable 'e' is assigned to but never used
scripts/data_collector/us_index/collector.py:7:1: F401 'importlib' imported but unused
examples/benchmarks_dynamic/DDG-DA/vis_data.py:2:1: F401 'numpy as np' imported but unused
examples/benchmarks_dynamic/DDG-DA/vis_data.py:10:1: F401 'tqdm.auto.tqdm' imported but unused
examples/benchmarks/TRA/src/model.py:19:1: F401 'qlib.utils.get_or_create_path' imported but unused
examples/orderbook_data/create_dataset.py:9:1: F401 'datetime.date' imported but unused
examples/orderbook_data/create_dataset.py:9:1: F401 'datetime.datetime as dt' imported but unused
examples/orderbook_data/create_dataset.py:12:1: F401 'random' imported but unused
examples/orderbook_data/create_dataset.py:17:1: F401 'arctic.chunkstore' imported but unused
examples/orderbook_data/create_dataset.py:18:1: F401 'arctic' imported but unused
examples/orderbook_data/create_dataset.py:19:1: F811 redefinition of unused 'Arctic' from line 17
examples/orderbook_data/create_dataset.py:20:1: F401 'arctic.chunkstore.chunkstore.CHUNK_SIZE' imported but unused
examples/orderbook_data/create_dataset.py:22:1: F401 'joblib.parallel' imported but unused
examples/orderbook_data/create_dataset.py:23:1: F401 'numpy as np' imported but unused
examples/orderbook_data/create_dataset.py:25:1: F401 'pandas.DataFrame' imported but unused
examples/orderbook_data/create_dataset.py:26:1: F401 'pandas.core.indexes.datetimes.date_range' imported but unused
examples/orderbook_data/create_dataset.py:59:67: F811 redefinition of unused 'date' from line 9
examples/orderbook_data/create_dataset.py:94:45: E262 inline comment should start with '# '
examples/orderbook_data/create_dataset.py:102:64: E262 inline comment should start with '# '
examples/orderbook_data/create_dataset.py:127:21: E712 comparison to True should be 'if cond is True:' or 'if cond:'
examples/orderbook_data/create_dataset.py:136:77: F811 redefinition of unused 'date' from line 9
examples/orderbook_data/create_dataset.py:278:40: F811 redefinition of unused 'date' from line 9
examples/orderbook_data/create_dataset.py:310:72: F811 redefinition of unused 'date' from line 9
scripts/data_collector/br_index/collector.py:6:1: F401 'importlib' imported but unused
examples/highfreq/highfreq_ops.py:1:1: F401 'numpy as np' imported but unused
examples/highfreq/highfreq_ops.py:3:1: F401 'importlib' imported but unused
examples/highfreq/highfreq_ops.py:5:1: F401 'qlib.config.C' imported but unused
examples/highfreq/highfreq_ops.py:6:1: F401 'qlib.data.cache.H' imported but unused
examples/highfreq/highfreq_ops.py:7:1: F401 'qlib.data.data.Cal' imported but unused
examples/highfreq/highfreq_ops.py:149:33: E741 ambiguous variable name 'l'
examples/highfreq/highfreq_ops.py:150:14: E741 ambiguous variable name 'l'
examples/online_srv/rolling_online_management.py:16:1: F401 'qlib.model.trainer.DelayTrainerR' imported but unused
examples/online_srv/rolling_online_management.py:16:1: F401 'qlib.model.trainer.TrainerR' imported but unused
examples/online_srv/rolling_online_management.py:16:1: F401 'qlib.model.trainer.end_task_train' imported but unused
examples/online_srv/rolling_online_management.py:16:1: F401 'qlib.model.trainer.task_train' imported but unused
tests/data_mid_layer_tests/test_dataset.py:6:1: F401 'sys' imported but unused
tests/test_contrib_workflow.py:70:9: F841 local variable 'uri_path' is assigned to but never used
tests/test_contrib_workflow.py:74:9: F841 local variable 'uri_path' is assigned to but never used
tests/test_pit.py:11:1: F401 'baostock as bs' imported but unused
scripts/dump_pit.py:9:1: F401 'abc' imported but unused
scripts/dump_pit.py:12:1: F401 'traceback' imported but unused
scripts/dump_pit.py:14:1: F401 'typing.List' imported but unused
scripts/dump_pit.py:14:1: F401 'typing.Union' imported but unused
scripts/dump_pit.py:16:1: F401 'concurrent.futures.ThreadPoolExecutor' imported but unused
scripts/dump_pit.py:19:1: F401 'numpy as np' imported but unused
scripts/dump_pit.py:23:1: F401 'qlib.utils.code_to_fname' imported but unused
scripts/data_collector/utils.py:85:17: F811 redefinition of unused '_get_calendar' from line 70
scripts/data_collector/utils.py:206:9: F841 local variable '_retry' is assigned to but never used
tests/rl/test_data_queue.py:64:60: F841 local variable 'data_queue' is assigned to but never used
tests/rl/test_data_queue.py:78:66: F841 local variable 'data_queue' is assigned to but never used
tests/data_mid_layer_tests/test_handler.py:3:1: F401 'shutil' imported but unused
examples/workflow_by_code.py:4:39: W291 trailing whitespace
examples/highfreq/workflow.py:13:1: F401 'qlib.data.ops.Operators' imported but unused
examples/highfreq/workflow.py:122:9: E265 block comment should start with '# '
examples/highfreq/workflow.py:127:9: E265 block comment should start with '# '
examples/highfreq/workflow.py:135:9: E265 block comment should start with '# '
examples/highfreq/workflow.py:167:9: E265 block comment should start with '# '
scripts/data_collector/crypto/collector.py:8:1: F401 'requests' imported but unused
scripts/data_collector/crypto/collector.py:41:9: E722 do not use bare 'except'
scripts/data_collector/crypto/collector.py:124:9: F841 local variable 'e' is assigned to but never used
scripts/data_collector/base.py:11:1: F401 'concurrent.futures.ThreadPoolExecutor' imported but unused
scripts/data_collector/base.py:86:13: F841 local variable 'e' is assigned to but never used
tests/backtest/test_high_freq_trading.py:1:1: F401 'typing.List' imported but unused
tests/backtest/test_high_freq_trading.py:1:1: F401 'typing.Tuple' imported but unused
tests/backtest/test_high_freq_trading.py:1:1: F401 'typing.Union' imported but unused
tests/backtest/test_high_freq_trading.py:2:1: F401 'qlib.backtest.position.Position' imported but unused
tests/backtest/test_high_freq_trading.py:4:1: F401 'qlib.backtest.decision.BaseTradeDecision' imported but unused
tests/backtest/test_high_freq_trading.py:5:1: F401 'qlib' imported but unused
tests/backtest/test_high_freq_trading.py:124:9: F841 local variable 'report' is assigned to but never used
tests/backtest/test_high_freq_trading.py:127:9: F841 local variable 'f_dec' is assigned to but never used
tests/test_all_pipeline.py:4:1: F401 'sys' imported but unused
tests/test_all_pipeline.py:10:1: F401 'qlib' imported but unused
tests/test_all_pipeline.py:11:1: F401 'qlib.config.C' imported but unused
examples/benchmarks_dynamic/baseline/rolling_benchmark.py:11:1: F401 'tqdm.auto.tqdm' imported but unused
docs/conf.py:21:1: F401 'os' imported but unused
docs/conf.py:22:1: F401 'sys' imported but unused
examples/nested_decision_execution/workflow.py:105:1: F401 'qlib.data.D' imported but unused
examples/nested_decision_execution/workflow.py:106:1: F401 'qlib.utils.exists_qlib_data' imported but unused
examples/data_demo/data_cache_demo.py:10:1: F401 'pickle' imported but unused
examples/orderbook_data/example.py:4:1: F401 'arctic.arctic.Arctic' imported but unused
examples/rolling_process_data/rolling_handler.py:2:1: F401 'qlib.data.dataset.loader.DataLoaderDH' imported but unused

Collaborator

I wrote a Makefile target to support the pre-commit check: I run 'make precommit' before I commit the code.

Makefile:
.PHONY: precommit
precommit: ## Lint and static-check
	black . -l 120
	pylint --disable=C0104,C0114,C0115,C0116,C0301,C0302,C0411,C0413,C1802,R0401,R0801,R0902,R0903,R0911,R0912,R0913,R0914,R0915,R1720,W0105,W0123,W0201,W0511,W0613,W1113,W1514,E0401,E1121,C0103,C0209,R0402,R1705,R1710,R1725,R1735,W0102,W0212,W0221,W0223,W0231,W0237,W0612,W0621,W0622,W0703,W1309,E1102,E1136 --const-rgx='[a-z_][a-z0-9_]{2,30}$' qlib --init-hook "import astroid; astroid.context.InferenceContext.max_inferred = 500; import sys; sys.setrecursionlimit(2000)"
	flake8 --ignore=E501,F541,E266,E402,W503,E731,E203 --per-file-ignores="__init__.py:F401,F403" qlib


             raise ValueError("Buy method not found")
         return final_stock_weight
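
The two buy methods touched by this hunk can be sketched in isolation. This is a minimal standalone sketch, not the qlib implementation: the function name `allocate_buy_weight` and its argument list are hypothetical, and only the arithmetic is taken from the diff above.

```python
from typing import Dict, List


def allocate_buy_weight(
    final_stock_weight: Dict[str, float],
    buy_signal_stocks: List[str],
    sold_stock_weight: float,
    topk: int,
    buy_method: str = "first_fill",
) -> Dict[str, float]:
    """Distribute the weight freed by selling across the stocks to buy (hypothetical helper)."""
    weights = dict(final_stock_weight)  # work on a copy so the caller's dict is untouched
    if buy_method == "first_fill":
        # Fill each stock up to 1 / topk, in order, until the freed weight runs out.
        for stock_id in buy_signal_stocks:
            add_weight = min(max(1 / topk - weights.get(stock_id, 0.0), 0.0), sold_stock_weight)
            weights[stock_id] = weights.get(stock_id, 0.0) + add_weight
            sold_stock_weight -= add_weight
    elif buy_method == "average_fill":
        # Split the freed weight evenly across all stocks to buy.
        for stock_id in buy_signal_stocks:
            weights[stock_id] = weights.get(stock_id, 0.0) + sold_stock_weight / len(buy_signal_stocks)
    else:
        raise ValueError("Buy method not found")
    return weights


held = {"A": 0.25}
first = allocate_buy_weight(held, ["B", "C"], sold_stock_weight=0.375, topk=4)
average = allocate_buy_weight(held, ["B", "C"], sold_stock_weight=0.375, topk=4, buy_method="average_fill")
```

With these inputs, first_fill gives "B" the full 1/topk share and leaves the remainder for "C", while average_fill splits the freed weight equally. Neither reformatting variant in the diff changes this arithmetic.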
2 changes: 1 addition & 1 deletion qlib/contrib/strategy/signal_strategy.py
@@ -333,7 +333,7 @@ def generate_target_weight_position(self, score, current, trade_start_time, trad

         Parameters
         -----------
-        score : pd.Series
Contributor

The data structure of the score returned by get_signal is defined as Union[pd.Series, pd.DataFrame, None]:

pred_score = self.signal.get_signal(start_time=pred_start_time, end_time=pred_end_time)

Is it appropriate to restrict the score to be only a DataFrame?
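
One way to keep accepting every type that get_signal may return is to normalize the score before it is used. The sketch below assumes a hypothetical helper named `to_score_series`; choosing the 'score' column, or otherwise the first column, of a DataFrame is an assumption made for illustration and is not defined by this PR.

```python
from typing import Optional, Union

import pandas as pd


def to_score_series(score: Union[pd.Series, pd.DataFrame, None]) -> Optional[pd.Series]:
    """Reduce any supported signal type to a Series, or None (hypothetical helper)."""
    if score is None:
        return None
    if isinstance(score, pd.DataFrame):
        # Assumption: prefer an explicit "score" column, else fall back to the first column.
        return score["score"] if "score" in score.columns else score.iloc[:, 0]
    return score


series = pd.Series([0.3, 0.1], index=["SH600000", "SH600001"], name="score")
frame = series.to_frame()

from_frame = to_score_series(frame)
from_series = to_score_series(series)
from_none = to_score_series(None)
```

With a helper like this, the docstring could continue to describe the broader accepted type while the method body works on a single representation.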

+        score : pd.DataFrame
             pred score for this trade date, index is stock_id, contain 'score' column.
         current : Position()
             current position.
2 changes: 1 addition & 1 deletion qlib/data/pit.py
@@ -40,7 +40,7 @@ def _load_internal(self, instrument, start_index, end_index, freq):
                 s = self._load_feature(instrument, -start_ws, 0, cur_time)
                 resample_data[cur_index - start_index] = s.iloc[-1] if len(s) > 0 else np.nan
             except FileNotFoundError:
-                get_module_logger("base").warning(f"WARN: period data not found for {str(self)}")
+                get_module_logger("base").warning(f"WARN: period data not found for {instrument} {str(self)} ({freq})")
                 return pd.Series(dtype="float32", name=str(self))

         resample_series = pd.Series(
10 changes: 6 additions & 4 deletions qlib/data/storage/file_storage.py
@@ -80,6 +80,7 @@ def __init__(self, freq: str, future: bool, provider_uri: dict = None, **kwargs)
         self._provider_uri = None if provider_uri is None else C.DataPathManager.format_provider_uri(provider_uri)
         self.enable_read_cache = True  # TODO: make it configurable
         self.region = C["region"]
+        self.uri.parent.mkdir(parents=True, exist_ok=True)

     @property
     def file_name(self) -> str:
@@ -90,7 +91,7 @@ def _freq_file(self) -> str:
         """the freq to read from file"""
         if not hasattr(self, "_freq_file_cache"):
             freq = Freq(self.freq)
-            if freq not in self.support_freq:
+            if self.support_freq and freq not in self.support_freq:
                 # NOTE: uri
                 # 1. If `uri` does not exist
                 # - Get the `min_uri` of the closest `freq` under the same "directory" as the `uri`
@@ -200,6 +201,7 @@ def __init__(self, market: str, freq: str, provider_uri: dict = None, **kwargs):
         super(FileInstrumentStorage, self).__init__(market, freq, **kwargs)
         self._provider_uri = None if provider_uri is None else C.DataPathManager.format_provider_uri(provider_uri)
         self.file_name = f"{market.lower()}.txt"
+        self.uri.parent.mkdir(parents=True, exist_ok=True)

     def _read_instrument(self) -> Dict[InstKT, InstVT]:
         if not self.uri.exists():
@@ -234,7 +236,6 @@ def _write_instrument(self, data: Dict[InstKT, InstVT] = None) -> None:
         df.loc[:, [self.SYMBOL_FIELD_NAME, self.INSTRUMENT_START_FIELD, self.INSTRUMENT_END_FIELD]].to_csv(
             self.uri, header=False, sep=self.INSTRUMENT_SEP, index=False
         )
-        df.to_csv(self.uri, sep="\t", encoding="utf-8", header=False, index=False)
Contributor

Why was this to_csv call removed?

Author

The previous statement already ends with a to_csv call, so this second call is redundant.
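
The effect of the duplicated call can be reproduced outside qlib. The sketch below is illustrative only: the column names, the `extra` column, and the tab value for `INSTRUMENT_SEP` are assumptions, not values confirmed by the diff. It shows that a second to_csv to the same path replaces whatever the first call wrote.

```python
import tempfile
from pathlib import Path

import pandas as pd

INSTRUMENT_SEP = "\t"  # assumed separator, for illustration only

# Hypothetical instrument table with one column beyond the three that should be persisted.
df = pd.DataFrame(
    {"symbol": ["SH600000"], "start": ["2020-01-01"], "end": ["2020-12-31"], "extra": [1]}
)

with tempfile.TemporaryDirectory() as tmp:
    uri = Path(tmp) / "all.txt"

    # First call: writes only the three selected columns.
    df.loc[:, ["symbol", "start", "end"]].to_csv(uri, header=False, sep=INSTRUMENT_SEP, index=False)
    after_first = uri.read_text().strip().split("\t")

    # Second call: overwrites the same file with every column of df.
    df.to_csv(uri, sep="\t", encoding="utf-8", header=False, index=False)
    after_second = uri.read_text().strip().split("\t")
```

Under these assumptions the file ends up with four fields per row instead of three, so the column selection in the first call has no lasting effect.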


     def clear(self) -> None:
         self._write_instrument(data={})
@@ -289,6 +290,7 @@ def __init__(self, instrument: str, field: str, freq: str, provider_uri: dict =
         super(FileFeatureStorage, self).__init__(instrument, field, freq, **kwargs)
         self._provider_uri = None if provider_uri is None else C.DataPathManager.format_provider_uri(provider_uri)
         self.file_name = f"{instrument.lower()}/{field.lower()}.{freq.lower()}.bin"
+        self.uri.parent.mkdir(parents=True, exist_ok=True)

     def clear(self):
         with self.uri.open("wb") as _:
@@ -320,15 +322,15 @@ def write(self, data_array: Union[List, np.ndarray], index: int = None) -> None:
                 # rewrite
                 with self.uri.open("rb+") as fp:
                     _old_data = np.fromfile(fp, dtype="<f")
-                    _old_index = _old_data[0]
+                    _old_index = int(_old_data[0])
                     _old_df = pd.DataFrame(
                         _old_data[1:], index=range(_old_index, _old_index + len(_old_data) - 1), columns=["old"]
                     )
                     fp.seek(0)
                     _new_df = pd.DataFrame(data_array, index=range(index, index + len(data_array)), columns=["new"])
                     _df = pd.concat([_old_df, _new_df], sort=False, axis=1)
                     _df = _df.reindex(range(_df.index.min(), _df.index.max() + 1))
-                    _df["new"].fillna(_df["old"]).values.astype("<f").tofile(fp)
+                    np.hstack([_old_index, _df["new"].fillna(_df["old"]).values]).astype("<f").tofile(fp)
Contributor

Why fill the missing values inside np.hstack([_old_index, _df["new"]]) instead of in _df["new"]?

Author

The first item, _old_index, is the start index, not a value. The data structure is [first_index, v0, v1, v2].

However, I think the current version still has a bug when the new index is smaller than _old_index.
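
The layout described above, and the remaining concern, can be sketched on in-memory arrays. This is a simplified stand-in for the rewrite branch, not the qlib code: the function `rewrite` and its `keep_old_start` flag are hypothetical. With `keep_old_start=True` it mirrors the header handling in this PR; with `False` it writes the smallest index instead, which is one possible fix and not something the PR implements.

```python
import numpy as np
import pandas as pd


def rewrite(old: np.ndarray, data_array, index: int, keep_old_start: bool = True) -> np.ndarray:
    """Merge new values into a [start_index, v0, v1, ...] float32 array (hypothetical helper)."""
    old_index = int(old[0])  # the header is the start index, stored as a float32
    old_df = pd.DataFrame(old[1:], index=range(old_index, old_index + len(old) - 1), columns=["old"])
    new_df = pd.DataFrame(data_array, index=range(index, index + len(data_array)), columns=["new"])
    df = pd.concat([old_df, new_df], sort=False, axis=1)
    df = df.reindex(range(df.index.min(), df.index.max() + 1))
    # New values win; gaps fall back to the old values (or stay NaN where neither exists).
    merged = df["new"].fillna(df["old"]).values
    start = old_index if keep_old_start else int(df.index.min())
    return np.hstack([start, merged]).astype("<f")


stored = np.array([3, 1.0, 2.0, 3.0], dtype="<f")  # start index 3, values for indices 3, 4, 5

inside = rewrite(stored, [9.0], index=4)  # overwrite within the existing range
earlier_pr = rewrite(stored, [7.0], index=1)  # header stays 3 although the data now starts at 1
earlier_min = rewrite(stored, [7.0], index=1, keep_old_start=False)  # header follows the data
```

In the `earlier_pr` case the header still says 3 while the first stored value belongs to index 1, so a reader would attribute every value to the wrong index. That is the misalignment the author suspects for writes that start before _old_index.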


     @property
     def start_index(self) -> Union[int, None]: