Fetched data does not match what was requested (TWSE issue) #106

Open
TienShangHsiu opened this issue Jul 20, 2023 · 2 comments

Comments

@TienShangHsiu

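Thanks to the author for sharing this project. This report is not about a bug in the project; it describes a potential problem on the TWSE side.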

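In short, with low probability TWSE returns data whose dates do not match the query. For example, I requested the November data for stock code 2330
and got another month's data back; I am not sure whether it was even for the same stock code.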

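I first assumed this was a twstock problem, but after reading through the source code that seemed unlikely. I needed finer-grained exception handling anyway, so I wrote my own fetcher and added my own logging, which is how I confirmed the behavior.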

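If you are scraping this data, I suggest comparing the dates in the received data against the stock code and query date you sent, checking that the HTTP status code is 200, and generally adding more checks and exception handling. When an exception does occur, leave a delay between retries so that TWSE does not block you for several hours.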
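
The date comparison suggested above could look roughly like the following sketch. It is not code from twstock or from this thread: the helper names `roc_to_year_month` and `rows_match_query` are hypothetical, and it assumes each returned row starts with an ROC-calendar date string such as `112/07/03` (ROC year = Gregorian year - 1911). Adjust the column index and date format to whatever the endpoint actually returns.

```python
from typing import Sequence, Tuple


def roc_to_year_month(roc_date: str) -> Tuple[int, int]:
    """Convert an ROC-calendar date such as '112/07/03' to (2023, 7).

    Assumption: the date is formatted as 'YYY/MM/DD' in the ROC calendar.
    """
    parts = roc_date.strip().split("/")
    return int(parts[0]) + 1911, int(parts[1])


def rows_match_query(rows: Sequence[Sequence[str]], year: int, month: int) -> bool:
    """Return True only if at least one row exists and every row's date
    (assumed to be the first column) falls in the requested year and month."""
    if not rows:
        # An empty payload is reported as a mismatch; the caller decides what to do.
        return False
    return all(roc_to_year_month(row[0]) == (year, month) for row in rows)
```

A caller would run this check on the parsed rows before storing them and treat a `False` result like any other failed attempt, retrying after a delay.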

@mlouielu
Owner

Any chance that you can share your code or with a pull request?

@TienShangHsiu
Author

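I don't actually use this project; I only looked up some of the URLs in it and then wrote my own private project. In return, here is part of my code. Its design and logic differ from this project's, but it may still be useful to some people :-) ...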

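from typing import List

import requests
# Assumption: the requests exception was intended; the built-in ConnectionError
# does not match errors raised by requests.
from requests.exceptions import ConnectionError

def fetch(self, code: str, year: int, month: int, retry: int = 5) -> List[Schema_Mongo_Stock_Document]:
    # The query date uses the ROC calendar (Gregorian year - 1911).
    params = {'d': '%d/%d' % (year - 1911, month), 'stkno': code}
    for idx in range(retry):
        try:
            self.acquire()
            r = requests.get(self.TARGET_URL, params=params)
            if r.status_code != 200:
                self.logger.info(f"({idx}) Got code '{r.status_code}': params: {params}")
                continue

            # r.json() raises a ValueError subclass if the body is not valid JSON.
            raw_data = r.json()
            # purify() is where the payload is checked against code, year and month.
            return self.purify(raw_data, code, year, month)

        except ValueError as err:
            self.logger.info(f"({code}), err: {err}")
            continue

        except ConnectionError as err:
            self.logger.info(f"({code}), err: {err}")
            continue

        except Exception as err:
            self.logger.warning(f"({code}), err: {err}")
            continue

        finally:
            # Runs after every attempt, including a successful return,
            # so consecutive requests are always spaced out.
            self.sleep(self.retrieve_interval)
            self.release()

    self.logger.warning(f"Too many retries, sleep '{self.failure_interval}' seconds and raise exception.")
    self.sleep(self.failure_interval)

    raise Exception(f"Failed to fetch after {retry} retries.")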
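
The loop above waits a fixed interval between attempts. One possible variation, which is not part of the original code, is to let the delay grow with each failed attempt so that repeated failures put less load on the server. The sketch below only computes the delay schedule; the function name `backoff_delays` and its default values are hypothetical.

```python
from typing import List


def backoff_delays(retry: int, base: float = 3.0, cap: float = 60.0) -> List[float]:
    """Return one delay in seconds per attempt: base, 2*base, 4*base, ...

    Each delay is limited to `cap` so the wait never grows without bound.
    """
    return [min(cap, base * (2 ** attempt)) for attempt in range(retry)]
```

A retry loop could iterate over `backoff_delays(retry)` and sleep for the corresponding value after each failed attempt instead of sleeping for a constant interval.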
