thombashi/df-diskcache

df-diskcache

Summary

df-diskcache is a Python library for caching pandas.DataFrame objects to local disk.

Installation

pip install df-diskcache

Features

The DataFrameDiskCache class supports the following methods:

  • get: Get a cache entry (pandas.DataFrame) for the key. Returns None if the key is not found.
  • set: Create a cache entry with an optional time-to-live (TTL) for the key-value pair.
  • update
  • touch: Update the last accessed time of a cache entry to extend the TTL.
  • delete
  • prune: Delete expired cache entries.
  • Dictionary-like operations:
    • __getitem__
    • __setitem__
    • __contains__
    • __delitem__
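
To make the TTL-related methods above more concrete, here is a small in-memory model of how get, set, touch, and prune could interact. It is only an illustrative sketch and not df-diskcache's implementation: the class name ToyTTLCache, the injectable clock, and the choices that an expired entry reads as None and that get does not refresh the last accessed time are all assumptions made for this example.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class ToyTTLCache:
    """Hypothetical in-memory stand-in for the get/set/touch/prune semantics."""

    def __init__(
        self,
        default_ttl: float = 3600,
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._default_ttl = default_ttl
        # The clock is injectable so expiry can be exercised without sleeping.
        self._clock = clock
        # key -> (value, last accessed time, ttl in seconds)
        self._entries: Dict[str, Tuple[Any, float, float]] = {}

    def _is_expired(self, key: str) -> bool:
        _, accessed_at, ttl = self._entries[key]
        return self._clock() - accessed_at > ttl

    def set(self, key: str, value: Any, ttl: Optional[float] = None) -> None:
        """Create an entry; fall back to the default TTL when none is given."""
        effective_ttl = self._default_ttl if ttl is None else ttl
        self._entries[key] = (value, self._clock(), effective_ttl)

    def get(self, key: str) -> Optional[Any]:
        """Return the value, or None if the key is missing or expired (assumption)."""
        if key not in self._entries or self._is_expired(key):
            return None
        return self._entries[key][0]

    def touch(self, key: str) -> None:
        """Reset the last accessed time, which extends the entry's lifetime."""
        if key in self._entries:
            value, _, ttl = self._entries[key]
            self._entries[key] = (value, self._clock(), ttl)

    def prune(self) -> int:
        """Delete expired entries and return how many were removed."""
        expired = [key for key in self._entries if self._is_expired(key)]
        for key in expired:
            del self._entries[key]
        return len(expired)
```

In this model, calling touch shortly before an entry would expire restarts its TTL window, and prune is what actually frees the storage held by expired entries. The real library persists DataFrames to disk and may differ in these details.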

Usage

Sample Code
import pandas as pd
from dfdiskcache import DataFrameDiskCache

cache = DataFrameDiskCache()
url = "https://raw.githubusercontent.com/pandas-dev/pandas/v2.1.3/pandas/tests/io/data/csv/iris.csv"

df = cache.get(url)
if df is None:
    print("cache miss")
    df = pd.read_csv(url)
    cache.set(url, df)
else:
    print("cache hit")

print(df)

You can also access the cache with dictionary-style operations:

Sample Code
import pandas as pd
from dfdiskcache import DataFrameDiskCache

cache = DataFrameDiskCache()
url = "https://raw.githubusercontent.com/pandas-dev/pandas/v2.1.3/pandas/tests/io/data/csv/iris.csv"

df = cache[url]
if df is None:
    print("cache miss")
    df = pd.read_csv(url)
    cache[url] = df
else:
    print("cache hit")

print(df)

Set TTL for cache entries

Sample Code
import pandas as pd
from dfdiskcache import DataFrameDiskCache

DataFrameDiskCache.DEFAULT_TTL = 10  # override the class-wide default TTL, in seconds (default: 3600)

cache = DataFrameDiskCache()
url = "https://raw.githubusercontent.com/pandas-dev/pandas/v2.1.3/pandas/tests/io/data/csv/iris.csv"

df = cache.get(url)
if df is None:
    df = pd.read_csv(url)
    cache.set(url, df, ttl=60)  # set a TTL (in seconds) for this key-value pair only

print(df)

Dependencies
