df-diskcache
df-diskcache is a Python library for caching pandas.DataFrame objects to local disk.
pip install df-diskcache
Supports the following methods:
- get: Get a cache entry (pandas.DataFrame) for the key. Returns None if the key is not found.
- set: Create a cache entry with an optional time-to-live (TTL) for the key-value pair.
- update
- touch: Update the last accessed time of a cache entry to extend the TTL.
- delete
- prune: Delete expired cache entries.
- Dictionary-like operations:
  - __getitem__
  - __setitem__
  - __contains__
  - __delitem__
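To make the TTL semantics of set, touch, and prune more concrete, here is a minimal in-memory sketch. It is not df-diskcache's implementation: the class `ToyTTLCache`, its injectable `clock` parameter, and the return values of `touch` and `prune` are all hypothetical and chosen for illustration. It also assumes that only `set` and `touch` refresh the last accessed time, which the real library may handle differently.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class ToyTTLCache:
    """Hypothetical in-memory stand-in that illustrates TTL semantics only."""

    DEFAULT_TTL = 3600  # seconds; mirrors the default stated in this README

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._clock = clock
        # key -> (value, last accessed timestamp, TTL in seconds)
        self._entries: Dict[str, Tuple[Any, float, float]] = {}

    def _expired(self, key: str) -> bool:
        _, accessed, ttl = self._entries[key]
        return self._clock() - accessed > ttl

    def set(self, key: str, value: Any, ttl: Optional[float] = None) -> None:
        # A per-entry TTL overrides the class-level default.
        ttl = self.DEFAULT_TTL if ttl is None else ttl
        self._entries[key] = (value, self._clock(), ttl)

    def get(self, key: str) -> Optional[Any]:
        # A missing or expired entry is reported as None.
        if key not in self._entries or self._expired(key):
            return None
        return self._entries[key][0]

    def touch(self, key: str) -> bool:
        # Reset the last accessed time so the TTL window starts over.
        if key not in self._entries or self._expired(key):
            return False
        value, _, ttl = self._entries[key]
        self._entries[key] = (value, self._clock(), ttl)
        return True

    def delete(self, key: str) -> None:
        self._entries.pop(key, None)

    def prune(self) -> int:
        # Remove every expired entry and report how many were dropped.
        expired = [k for k in self._entries if self._expired(k)]
        for k in expired:
            del self._entries[k]
        return len(expired)


# A fake clock makes the timeline deterministic.
now = [0.0]
cache = ToyTTLCache(clock=lambda: now[0])
cache.set("a", "frame-a", ttl=10)
cache.set("b", "frame-b", ttl=10)

now[0] = 8.0
cache.touch("a")        # "a" was refreshed at t=8, so it stays valid until t=18

now[0] = 12.0
hit = cache.get("a")    # still valid because of the touch
miss = cache.get("b")   # expired: last accessed at t=0 with a 10 second TTL
removed = cache.prune() # drops the expired "b" entry
print(hit, miss, removed)
```

In this sketch an expired entry stays in storage until prune runs, which is why a separate cleanup method is useful even though get already hides expired entries.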
- Sample Code
    import pandas as pd
    from dfdiskcache import DataFrameDiskCache

    cache = DataFrameDiskCache()
    url = "https://raw.githubusercontent.com/pandas-dev/pandas/v2.1.3/pandas/tests/io/data/csv/iris.csv"

    # get returns None if the key is not found
    df = cache.get(url)
    if df is None:
        print("cache miss")
        df = pd.read_csv(url)
        cache.set(url, df)  # store the DataFrame for later lookups
    else:
        print("cache hit")

    print(df)
You can also use dictionary-like operations:
- Sample Code
    import pandas as pd
    from dfdiskcache import DataFrameDiskCache

    cache = DataFrameDiskCache()
    url = "https://raw.githubusercontent.com/pandas-dev/pandas/v2.1.3/pandas/tests/io/data/csv/iris.csv"

    df = cache[url]
    if df is None:
        print("cache miss")
        df = pd.read_csv(url)
        cache[url] = df
    else:
        print("cache hit")

    print(df)
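For readers unfamiliar with how the bracket syntax relates to the named methods, the following sketch shows one way the four special methods could delegate to get, set, and delete. The class `DictLikeCache` is hypothetical and backed by a plain dict, not by df-diskcache. It assumes, as the sample above implies with its `if df is None` check, that a missing key yields None instead of raising KeyError.

```python
from typing import Any, Dict, Optional


class DictLikeCache:
    """Hypothetical wrapper that maps dict syntax onto named cache methods."""

    def __init__(self) -> None:
        self._store: Dict[str, Any] = {}

    def get(self, key: str) -> Optional[Any]:
        return self._store.get(key)

    def set(self, key: str, value: Any) -> None:
        self._store[key] = value

    def delete(self, key: str) -> None:
        self._store.pop(key, None)

    def __getitem__(self, key: str) -> Optional[Any]:
        # cache[key]: returns None on a miss (assumption, see lead-in)
        return self.get(key)

    def __setitem__(self, key: str, value: Any) -> None:
        # cache[key] = value
        self.set(key, value)

    def __contains__(self, key: str) -> bool:
        # key in cache
        return self.get(key) is not None

    def __delitem__(self, key: str) -> None:
        # del cache[key]
        self.delete(key)


cache = DictLikeCache()
first = cache["k"]          # nothing stored yet
cache["k"] = [1, 2, 3]
present = "k" in cache
del cache["k"]
absent = "k" not in cache
print(first, present, absent)
```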
- Sample Code
    import pandas as pd
    from dfdiskcache import DataFrameDiskCache

    DataFrameDiskCache.DEFAULT_TTL = 10  # you can override the default TTL (default: 3600 seconds)

    cache = DataFrameDiskCache()
    url = "https://raw.githubusercontent.com/pandas-dev/pandas/v2.1.3/pandas/tests/io/data/csv/iris.csv"

    df = cache.get(url)
    if df is None:
        df = pd.read_csv(url)
        cache.set(url, df, ttl=60)  # you can set a TTL for the key-value pair

    print(df)