cvxgrp/cvxbson

PyPI version Apache 2.0 License Downloads Coverage Status

IPC

IPC stands for interprocess communication, a mechanism that allows processes to share data. A traditional way to do so is to use JSON files. JSON files are flexible and can be used to share data between different programming languages. However, they are not very efficient.

Here we use their binary counterpart, BSON files. BSON files are much more efficient but lack some of the flexibility of JSON files. We rely on the bson package to read and write BSON files. Our goal is to read and write dictionaries of NumPy arrays and of pandas and polars DataFrames as fast as possible.

There might be faster ways to achieve this goal and we are open to suggestions and pull requests.

We recommend using JSON files to transfer configurations and small amounts of data. BSON files can then be used to transfer large matrices. The two formats can coexist, and we encourage using both.
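The efficiency gap between a text encoding and a binary one can be illustrated with the standard library alone. The sketch below is only an illustration under our own assumptions: it does not use the bson package or cvxbson, and it compares JSON text against raw 8-byte doubles rather than against an actual BSON document.

```python
import json
import random
from array import array

random.seed(0)

# 2500 floats, the same number of entries as a 50 x 50 matrix
values = [random.random() for _ in range(2500)]

# Text encoding: every float is written out as a decimal string
text = json.dumps(values).encode("utf-8")

# Binary encoding: every float occupies exactly 8 bytes (C double)
raw = array("d", values).tobytes()

# Decode the binary payload again
restored = array("d")
restored.frombytes(raw)

print(f"JSON text: {len(text)} bytes, raw doubles: {len(raw)} bytes")
```

The binary payload has a fixed size per value and needs no number parsing when it is read back, which is why a binary format is the better fit for large matrices.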

Demo

import numpy as np

from cvx.bson import read_bson, write_bson

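# A dictionary of NumPy arrays: a 50 x 50 matrix and a vector of length 50
data = {"A": np.random.rand(50, 50), "B": np.random.rand(50)}

# Write the dictionary to disk and read it back
write_bson("test.bson", data)
recovered = read_bson("test.bson")

# The recovered arrays match the originals
assert np.allclose(data["A"], recovered["A"])
assert np.allclose(data["B"], recovered["B"])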

We have also implemented the same functionality for JSON files but advise against using it, as it is much slower and less efficient.

You may want to avoid creating files explicitly. It is also possible to work directly with BSON strings, and we provide methods for that, too.
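The idea of exchanging a byte string instead of a file can be sketched with the standard library. The helpers `pack_vectors` and `unpack_vectors` below are hypothetical names invented for this illustration. They are not part of cvxbson, and the byte layout they use is a made-up one, not BSON.

```python
import struct
from array import array


def pack_vectors(data):
    """Pack a dict mapping names to lists of floats into one bytes object."""
    chunks = []
    for name, values in data.items():
        key = name.encode("utf-8")
        # Floats are stored as 8-byte doubles in the machine's native byte order
        payload = array("d", values).tobytes()
        # Header: key length and payload length as unsigned 32-bit integers
        chunks.append(struct.pack("<II", len(key), len(payload)))
        chunks.append(key)
        chunks.append(payload)
    return b"".join(chunks)


def unpack_vectors(blob):
    """Inverse of pack_vectors: rebuild the dict from the bytes object."""
    result = {}
    offset = 0
    while offset < len(blob):
        key_len, payload_len = struct.unpack_from("<II", blob, offset)
        offset += 8
        key = blob[offset:offset + key_len].decode("utf-8")
        offset += key_len
        values = array("d")
        values.frombytes(blob[offset:offset + payload_len])
        offset += payload_len
        result[key] = values.tolist()
    return result


data = {"A": [1.0, 2.5, -3.25], "B": [0.1, 0.2]}
blob = pack_vectors(data)          # stays in memory, no file is created
recovered = unpack_vectors(blob)
```

A byte string like `blob` can be handed to another process through a pipe, a socket, or a queue, which is the use case that in-memory serialization is meant to cover.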

Poetry

We assume you already share our love for Poetry. Once you have installed Poetry, you can run

make install

to replicate the virtual environment we have defined in pyproject.toml and locked in poetry.lock.

Jupyter

We install JupyterLab on the fly within the aforementioned virtual environment. Executing

make jupyter

will install and start JupyterLab.