Cached Historical Data Fetcher

Python utility for fetching any kind of historical data with caching. Suitable for acquiring data that is added frequently and incrementally, e.g. news, posts or weather.

Installation

Install this via pip (or your favourite package manager):

pip install cached-historical-data-fetcher

Features

Uses a cache built on top of joblib, lz4 and aiofiles.
Ready to use with asyncio, aiohttp and aiohttp-client-cache. Uses asyncio.gather to fetch chunks in parallel. (For performance reasons, relying on aiohttp-client-cache alone is probably not a good idea when fetching a large number of chunks (web requests).)
Based on pandas and supports MultiIndex.
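The parallel-fetching feature can be pictured with a small stdlib-only sketch. It is not the library's implementation: `fetch_chunk` is a hypothetical stand-in for `get_one()`, and staggering each task's start by `index * delay_seconds` is only an assumption about how a per-chunk delay might be combined with `asyncio.gather`.

```python
import asyncio
from typing import Any


async def fetch_chunk(start: int) -> dict[str, Any]:
    """Hypothetical stand-in for get_one(): pretend to fetch one chunk."""
    await asyncio.sleep(0)  # a real web request (e.g. via aiohttp) would go here
    return {"start": start, "value": start * 10}


async def fetch_all(starts: list[int], delay_seconds: float = 0.0) -> list[dict[str, Any]]:
    """Fetch every chunk concurrently; results come back in input order."""

    async def delayed(i: int, start: int) -> dict[str, Any]:
        # Assumption: stagger request start times so a remote API is not hit all at once.
        await asyncio.sleep(i * delay_seconds)
        return await fetch_chunk(start)

    return await asyncio.gather(*(delayed(i, s) for i, s in enumerate(starts)))


if __name__ == "__main__":
    print(asyncio.run(fetch_all([0, 1, 2], delay_seconds=0.01)))
```

asyncio.gather returns results in the order the awaitables were passed, not in completion order, so concatenating them keeps the chunks sorted by their start index.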

Usage

`HistoricalDataCache`, `HistoricalDataCacheWithChunk` and `HistoricalDataCacheWithFixedChunk`

Override the get_one() method to fetch data for one chunk. The update() method calls get_one() for each unfetched chunk, concatenates the results and saves them to the cache.

from cached_historical_data_fetcher import HistoricalDataCacheWithFixedChunk
from pandas import DataFrame, Timedelta, Timestamp
from typing import Any

# define cache class
class MyCacheWithFixedChunk(HistoricalDataCacheWithFixedChunk[Timestamp, Timedelta, Any]):
    delay_seconds = 0.0 # delay between chunks (requests) in seconds
    interval = Timedelta(days=1) # interval between chunks, can be any type
    start_index = Timestamp.utcnow().floor("10D") # start index, can be any type

    async def get_one(self, start: Timestamp, *args: Any, **kwargs: Any) -> DataFrame:
        """Fetch data for one chunk."""
        return DataFrame({"day": [start.day]}, index=[start])

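# get complete data
# (top-level `await` works in Jupyter/IPython; in a plain script use
#  `print(asyncio.run(MyCacheWithFixedChunk().update()))` instead)
print(await MyCacheWithFixedChunk().update())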

                           day
2023-09-30 00:00:00+00:00   30
2023-10-01 00:00:00+00:00    1
2023-10-02 00:00:00+00:00    2

See example.ipynb for a real-world example.
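To make the update() behaviour described above more concrete, here is a simplified, synchronous sketch in plain pandas. It is a model under stated assumptions, not the library's code: `update_fixed_chunks`, the in-memory `cached` DataFrame and the `get_one` callable are hypothetical, and "unfetched" is assumed to mean every chunk start from `start_index` up to `end` that is not yet in the cached index.

```python
from typing import Callable

import pandas as pd


def update_fixed_chunks(
    cached: pd.DataFrame,
    start_index: pd.Timestamp,
    interval: pd.Timedelta,
    end: pd.Timestamp,
    get_one: Callable[[pd.Timestamp], pd.DataFrame],
) -> pd.DataFrame:
    """Hypothetical helper: fetch only chunks missing from `cached`, then concatenate."""
    # Every chunk start from start_index up to end (inclusive), spaced by interval.
    starts = pd.date_range(start=start_index, end=end, freq=interval)
    # Skip chunks whose start is already present in the cache.
    missing = [s for s in starts if s not in cached.index]
    if not missing:
        return cached  # nothing new to fetch
    new = pd.concat([get_one(s) for s in missing])
    # Combine old and new rows, keeping chronological order.
    return pd.concat([cached, new]).sort_index()


def day_of_month(start: pd.Timestamp) -> pd.DataFrame:
    """Same toy payload as get_one() in the example above."""
    return pd.DataFrame({"day": [start.day]}, index=[start])


if __name__ == "__main__":
    first = pd.Timestamp("2023-09-30", tz="UTC")
    cache = pd.DataFrame({"day": [30]}, index=[first])  # pretend one chunk is cached
    end = pd.Timestamp("2023-10-02", tz="UTC")
    print(update_fixed_chunks(cache, first, pd.Timedelta(days=1), end, day_of_month))
```

Calling the helper a second time with the same range finds no missing chunks and returns the cached frame unchanged, which is the incremental behaviour that makes caching worthwhile for frequently appended data.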

`IdCacheWithFixedChunk`

Override the get_one() method to fetch data for one chunk, in the same way as for HistoricalDataCacheWithFixedChunk. After registering ids with set_ids(), the update() method calls get_one() for every unfetched id, concatenates the results and saves them to the cache.

from cached_historical_data_fetcher import IdCacheWithFixedChunk
from pandas import DataFrame
from typing import Any

class MyIdCache(IdCacheWithFixedChunk[str, Any]):
    delay_seconds = 0.0 # delay between chunks (requests) in seconds

    async def get_one(self, start: str, *args: Any, **kwargs: Any) -> DataFrame:
        """Fetch data for one chunk."""
        return DataFrame({"id+hello": [start + "+hello"]}, index=[start])

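# top-level `await` works in Jupyter/IPython; in a plain script, move the lines
# below into an `async def main()` and run it with `asyncio.run(main())`
cache = MyIdCache() # create cache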
cache.set_ids(["a"]) # set ids
cache.set_ids(["b"]) # set ids again, now `cache.ids` is ["a", "b"]
print(await cache.update(reload=True)) # discard previous cache and fetch again
cache.set_ids(["b", "c"]) # set ids again, now `cache.ids` is ["a", "b", "c"]
print(await cache.update()) # fetch only new data

       id+hello
    a   a+hello
    b   b+hello
       id+hello
    a   a+hello
    b   b+hello
    c   c+hello
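The id-based workflow can likewise be modelled without the library. The sketch below is hypothetical (`ToyIdCache` and its internals are not part of cached_historical_data_fetcher): `set_ids()` appends ids it has not seen before, preserving order, and `update()` calls `get_one()` only for ids with no cached result, unless `reload=True` clears the cache first, mirroring the behaviour shown in the example above.

```python
import asyncio


class ToyIdCache:
    """Hypothetical in-memory model of an id-keyed cache (not the library's class)."""

    def __init__(self) -> None:
        self.ids: list[str] = []
        self._cache: dict[str, str] = {}  # id -> fetched value
        self.fetch_count = 0  # how many times get_one() actually ran

    def set_ids(self, ids: list[str]) -> None:
        # Append only unseen ids so repeated calls accumulate without duplicates.
        for id_ in ids:
            if id_ not in self.ids:
                self.ids.append(id_)

    async def get_one(self, start: str) -> str:
        self.fetch_count += 1
        return start + "+hello"

    async def update(self, reload: bool = False) -> dict[str, str]:
        if reload:
            self._cache.clear()  # discard previous results
        # Fetch only ids without a cached result, concurrently.
        missing = [i for i in self.ids if i not in self._cache]
        results = await asyncio.gather(*(self.get_one(i) for i in missing))
        self._cache.update(zip(missing, results))
        return {i: self._cache[i] for i in self.ids}


async def main() -> None:
    cache = ToyIdCache()
    cache.set_ids(["a"])
    cache.set_ids(["b"])
    print(await cache.update(reload=True))  # fetches a and b
    cache.set_ids(["b", "c"])
    print(await cache.update())  # fetches only c


if __name__ == "__main__":
    asyncio.run(main())
```

Keeping `ids` as an ordered, duplicate-free list is what lets repeated `set_ids()` calls grow the id set without refetching anything already cached.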

Contributors ✨

Thanks goes to these wonderful people (emoji key):

This project follows the all-contributors specification. Contributions of any kind welcome!

34j/cached-historical-data-fetcher