kootenpv/shrynk: Using Machine Learning to learn how to compress

You can read the introductory blog post or try it live at https://shrynk.ai

Features

✓ Compress your data smartly based on Machine Learning
✓ Takes user requirements in the form of weights for size, write_time and read_time
✓ Trains & caches a model based on compression methods available on the system, using packaged data
✓ CLI for compressing and decompressing
✓ Works with CSV, JSON and bytes in general
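The weights are easiest to understand with a small example. The sketch below is an assumption about how weights can turn benchmark results into a single choice, not shrynk's actual scoring. Each metric (size, write_time, read_time) is min-max normalized across methods and multiplied by its weight, and the method with the lowest weighted sum wins, so a weight of 0 ignores that metric. The method names and numbers are invented for illustration.

```python
from typing import Dict

# Hypothetical benchmark results: method -> measured metrics.
Bench = Dict[str, Dict[str, float]]

RESULTS: Bench = {
    "csv-gzip": {"size": 120.0, "write_time": 0.8, "read_time": 0.3},
    "csv-bz2": {"size": 100.0, "write_time": 2.0, "read_time": 0.9},
    "csv-none": {"size": 400.0, "write_time": 0.1, "read_time": 0.1},
}


def normalize(values):
    """Scale values to [0, 1]; if all values are equal, every entry becomes 0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def best_method(results: Bench, size: float = 1.0, write: float = 1.0, read: float = 1.0) -> str:
    """Return the method with the lowest weighted sum of normalized metrics (sketch only)."""
    methods = list(results)
    weights = {"size": size, "write_time": write, "read_time": read}
    scores = {m: 0.0 for m in methods}
    for metric, weight in weights.items():
        normalized = normalize([results[m][metric] for m in methods])
        for method, value in zip(methods, normalized):
            scores[method] += weight * value
    return min(scores, key=scores.get)


print(best_method(RESULTS, size=0, write=1, read=0))  # prints csv-none
```

With only size weighted (size=1, write=0, read=0) the same data selects csv-bz2, and with equal weights it selects csv-gzip.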

CLI

shrynk compress myfile.json       # will yield e.g. myfile.json.gz or myfile.json.bz2
shrynk decompress myfile.json.gz  # will yield myfile.json

shrynk compress myfile.csv --size 0 --write 1 --read 0  # weights: only write time matters

shrynk benchmark myfile.csv                  # shows benchmark results
shrynk benchmark --predict myfile.csv        # will also show the current prediction
shrynk benchmark --save --predict myfile.csv # will add the result to the training data too
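To get a feel for what a benchmark involves, here is a rough sketch using only Python's standard-library codecs (gzip, bz2, lzma). It assumes a benchmark records the compressed size plus compression and decompression time per method; shrynk's own benchmark also covers storage engines (such as csv) and whichever compressors are available on the system, so its output will look different. CODECS and benchmark are hypothetical names.

```python
import bz2
import gzip
import lzma
import time
from typing import Callable, Dict, Tuple

# Standard-library codecs as (compress, decompress) pairs.
CODECS: Dict[str, Tuple[Callable[[bytes], bytes], Callable[[bytes], bytes]]] = {
    "gzip": (gzip.compress, gzip.decompress),
    "bz2": (bz2.compress, bz2.decompress),
    "xz": (lzma.compress, lzma.decompress),
}


def benchmark(data: bytes) -> Dict[str, Dict[str, float]]:
    """Return compressed size and (de)compression time for each codec."""
    results = {}
    for name, (compress, decompress) in CODECS.items():
        start = time.perf_counter()
        packed = compress(data)
        mid = time.perf_counter()
        unpacked = decompress(packed)
        end = time.perf_counter()
        if unpacked != data:  # a usable method must round-trip losslessly
            raise ValueError(f"{name} did not round-trip")
        results[name] = {
            "size": len(packed),
            "write_time": mid - start,
            "read_time": end - mid,
        }
    return results


sample = b"a,b\n" + b"1,2\n" * 1000
for name, stats in benchmark(sample).items():
    print(name, stats["size"], f"{stats['write_time']:.5f}s", f"{stats['read_time']:.5f}s")
```

Timings vary from run to run, so a real benchmark would typically repeat each measurement and average.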

Usage in Docker

To test shrynk out quickly yourself, you can use the official Docker image kootenpv/shrynk from Docker Hub. This avoids interfering with an existing Python installation.

You can also build the image yourself: go to the docker folder in this repository and run

docker build -t shrynk .

Then use shrynk instead of kootenpv/shrynk in the commands below.

In the following commands, replace ~/Downloads with the folder you want to share with the container (where the file you want to compress is).

# To see help
docker run --rm -v ~/.shrynk:/root/.shrynk -v ~/Downloads:/data kootenpv/shrynk shrynk --help

# To compress a file called train.csv in your ~/Downloads folder
docker run --rm -v ~/.shrynk:/root/.shrynk -v ~/Downloads:/data kootenpv/shrynk \
   shrynk compress /data/train.csv

# To benchmark and predict the train.csv file in your ~/Downloads folder
docker run --rm -v ~/.shrynk:/root/.shrynk -v ~/Downloads:/data kootenpv/shrynk \
   shrynk benchmark --predict /data/train.csv

Usage in Python

Installation:

pip install shrynk

Then in Python:

import pandas as pd
from shrynk import save, load

# save dataframe compressed
my_df = pd.DataFrame({"a": [1]})
file_path = save(my_df, "mypath.csv")
# e.g. mypath.csv.bz2

# load compressed file
loaded_df = load(file_path)
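load only receives a path, so the compression has to be recoverable from the file itself, presumably from a name such as mypath.csv.bz2. The following is a minimal standard-library sketch of that idea for raw bytes, assuming suffix-based dispatch; save_bytes, load_bytes and OPENERS are hypothetical names, not part of shrynk, and shrynk's real save/load also handle the storage engine (e.g. csv).

```python
import bz2
import gzip
import lzma
import os
import tempfile
from pathlib import Path

# Hypothetical mapping: compression name -> (file suffix, opener function).
OPENERS = {
    "gzip": (".gz", gzip.open),
    "bz2": (".bz2", bz2.open),
    "xz": (".xz", lzma.open),
}


def save_bytes(data: bytes, path: str, compression: str) -> str:
    """Write data compressed and return the new path, e.g. mypath.csv.bz2."""
    suffix, opener = OPENERS[compression]
    out_path = path + suffix
    with opener(out_path, "wb") as f:
        f.write(data)
    return out_path


def load_bytes(path: str) -> bytes:
    """Pick the decompressor from the file suffix; unknown suffixes are read as-is."""
    suffix = Path(path).suffix
    for ext, opener in OPENERS.values():
        if ext == suffix:
            with opener(path, "rb") as f:
                return f.read()
    return Path(path).read_bytes()


with tempfile.TemporaryDirectory() as tmp:
    saved = save_bytes(b"a\n1\n", os.path.join(tmp, "mypath.csv"), "bz2")
    restored = load_bytes(saved)

print(os.path.basename(saved), restored)  # prints mypath.csv.bz2 b'a\n1\n'
```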

If you only want the prediction, use infer:

import pandas as pd
from shrynk import infer

infer(pd.DataFrame({"a": [1]}))
# {"engine": "csv", "compression": "bz2"}
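infer predicts from the data alone, so the data first has to be summarized into numbers a model can use. This README does not say which features shrynk computes; the sketch below assumes four illustrative ones (row count, column count, fraction of numeric cells, average ratio of unique values per column) on a table given as a list of rows. table_features is a hypothetical helper.

```python
from statistics import mean
from typing import List, Sequence


def table_features(rows: Sequence[Sequence[object]]) -> List[float]:
    """Hypothetical features: [n_rows, n_cols, numeric fraction, mean unique ratio]."""
    n_rows = len(rows)
    n_cols = len(rows[0]) if n_rows else 0
    if n_rows == 0 or n_cols == 0:
        return [float(n_rows), float(n_cols), 0.0, 0.0]
    columns = list(zip(*rows))  # transpose rows into columns
    numeric_cells = sum(
        isinstance(value, (int, float)) for column in columns for value in column
    )
    unique_ratio = mean(len(set(column)) / n_rows for column in columns)
    return [float(n_rows), float(n_cols), numeric_cells / (n_rows * n_cols), unique_ratio]


rows = [(1, "x"), (2, "x"), (3, "y")]
print(table_features(rows))
```

Low unique ratios usually mean repetitive columns, which tend to compress well, so a feature like this plausibly helps a model separate methods.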

Add your own data

If you want more control, for example to benchmark your own data and retrain the model, you can do the following:

import pandas as pd
from shrynk import PandasCompressor

df = pd.DataFrame({"a": [1, 2, 3]})

pdc = PandasCompressor("default")
pdc.run_benchmarks(df)  # adds the benchmark results to the "default" training data

pdc.train_model(size=3, write=1, read=1)  # retrain with weights for size, write time and read time

pdc.predict(df)
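The three steps above (benchmark, train with weights, predict) can be mimicked end to end in a few lines. TinyCompressorModel is a hypothetical stand-in, not PandasCompressor: it labels each benchmarked dataset with its best method under the given weights (min-max normalized weighted sum, an assumption) and predicts with a 1-nearest-neighbour lookup on feature vectors. The feature vectors and measurements are invented.

```python
import math
from typing import Dict, List, Sequence, Tuple

Bench = Dict[str, Dict[str, float]]


class TinyCompressorModel:
    """Hypothetical benchmark -> train -> predict workflow (not shrynk's implementation)."""

    def __init__(self) -> None:
        self.examples: List[Tuple[List[float], Bench]] = []
        self.labelled: List[Tuple[List[float], str]] = []

    def add_benchmark(self, features: Sequence[float], results: Bench) -> None:
        """Store one dataset's features together with its benchmark results."""
        self.examples.append((list(features), results))

    def train(self, size: float = 1.0, write: float = 1.0, read: float = 1.0) -> None:
        """Label every stored dataset with its best method under these weights."""
        weights = {"size": size, "write_time": write, "read_time": read}
        self.labelled = []
        for features, results in self.examples:
            methods = list(results)
            scores = dict.fromkeys(methods, 0.0)
            for metric, weight in weights.items():
                values = [results[m][metric] for m in methods]
                lo, hi = min(values), max(values)
                for method, value in zip(methods, values):
                    scores[method] += weight * ((value - lo) / (hi - lo) if hi > lo else 0.0)
            self.labelled.append((features, min(scores, key=scores.get)))

    def predict(self, features: Sequence[float]) -> str:
        """Return the label of the nearest stored dataset (Euclidean distance)."""
        if not self.labelled:
            raise RuntimeError("call train() first")
        _, label = min(self.labelled, key=lambda ex: math.dist(ex[0], features))
        return label


model = TinyCompressorModel()
model.add_benchmark([10.0, 1.0], {
    "csv-gzip": {"size": 50.0, "write_time": 0.2, "read_time": 0.1},
    "csv-bz2": {"size": 40.0, "write_time": 0.9, "read_time": 0.5},
})
model.add_benchmark([1000.0, 20.0], {
    "csv-gzip": {"size": 8000.0, "write_time": 1.0, "read_time": 0.4},
    "csv-bz2": {"size": 8100.0, "write_time": 3.0, "read_time": 1.5},
})
model.train(size=3, write=1, read=1)  # size weighted most, as in the call above
print(model.predict([12.0, 1.0]))  # prints csv-bz2
```

In this sketch, retraining with different weights changes the labels without rerunning any benchmarks: after model.train(size=1, write=1, read=1) the first dataset is labelled csv-gzip. Real features would also need scaling before a distance-based model is reasonable.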
