Using Redis as In-memory database for ML dataset in PyTorch

jinserk/pytorch-redis

Basic MNIST Example with RedisClient

Tested with Python 3.7.7 and 3.8.2, using PyTorch 1.4.0

Redis server installation

install package

$ sudo apt update
$ sudo apt install redis-server

overcommit memory

edit /etc/sysctl.conf to add:

vm.overcommit_memory = 1

and reboot or run the command

$ sudo sysctl vm.overcommit_memory=1

for this to take effect.

transparent huge pages (THP)

If THP support is enabled in your kernel, it causes latency and memory usage issues with Redis. To disable it, run the following command as root

# echo never > /sys/kernel/mm/transparent_hugepage/enabled

or add it to your /etc/rc.local in order to retain the settings after a reboot.

disable save to disk

disable the aof

# redis-cli config set appendonly no

disable the rdb

# redis-cli config set save ""
(default was "900 1 300 10 60 10000")

To keep these changes in effect after Redis restarts, write them to the configuration file with

# redis-cli config rewrite

Run MNIST example

Preparing dataset

$ pip install -r requirements.txt
$ python dataset.py

Training

$ python main.py
# CUDA_VISIBLE_DEVICES=2 python main.py  # to run on a specific GPU, e.g. GPU id 2

Comments

  1. Redis Labs has an official Redis module for PyTorch, but it can only store tensors. With this project, you can store any structured data under a key, such as a list of tensors or a list of tuples that mix tensors with strings.
  2. In my experiment, tensor.numpy() has a smaller memory footprint than a NumPy ndarray.
  3. With num_workers=0 in DataLoader, loading from Redis is inevitably much slower than direct access to in-memory data. Set num_workers to a larger value for better performance.
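The first and third comments can be illustrated with a minimal sketch. It is not this project's RedisClient, whose API is not shown here; it only assumes a client object with `get` and `set` methods, which `redis.Redis` from the redis-py package provides. The names `InMemoryClient`, `put_sample`, `get_sample`, and `KeyValueDataset` are hypothetical, and the in-memory stand-in is there only so the example runs without a server. Samples are serialized with `pickle`, which is one possible choice for storing mixed structures under a key.

```python
import pickle


class InMemoryClient:
    """Hypothetical stand-in exposing the get/set subset of redis.Redis."""

    def __init__(self):
        self._store = {}

    def set(self, key, value):
        self._store[key] = value
        return True

    def get(self, key):
        return self._store.get(key)


def put_sample(client, key, sample):
    """Serialize any picklable object and store it under `key`."""
    client.set(key, pickle.dumps(sample, protocol=pickle.HIGHEST_PROTOCOL))


def get_sample(client, key):
    """Fetch and deserialize the object stored under `key`."""
    raw = client.get(key)
    if raw is None:
        raise KeyError(key)
    return pickle.loads(raw)


class KeyValueDataset:
    """Map-style dataset: it only implements __len__ and __getitem__."""

    def __init__(self, client_factory, prefix, length):
        self._client_factory = client_factory
        # Created on first access, so a worker process that receives a copy
        # of this dataset before any access opens its own connection.
        self._client = None
        self._prefix = prefix
        self._length = length

    def __len__(self):
        return self._length

    def __getitem__(self, index):
        if not 0 <= index < self._length:
            raise IndexError(index)
        if self._client is None:
            self._client = self._client_factory()
        return get_sample(self._client, f"{self._prefix}:{index}")


# Demo with the stand-in. With a running server, a factory such as
# `lambda: redis.Redis()` could be passed instead (assumption: redis-py installed).
shared = InMemoryClient()
for i in range(3):
    # Each sample mixes a list of floats, an integer label, and a string.
    put_sample(shared, f"mnist:{i}", ([float(i)] * 4, i, f"sample-{i}"))

dataset = KeyValueDataset(lambda: shared, prefix="mnist", length=3)
pixels, label, name = dataset[1]
print(pixels, label, name)
```

Because the class implements `__len__` and `__getitem__`, an instance should be usable as a map-style dataset with `torch.utils.data.DataLoader(dataset, num_workers=...)`. Passing a factory rather than a live client is a design choice that keeps the connection out of the object until a worker first reads from it.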

Benchmarks

| Env | num_workers | elapsed time (15 epochs) |
|---|---|---|
| torchvision MNIST dataset | 1 | 99.9 secs |
| RedisClient | 4 | 350.4 secs |
| RedisClient | 8 | 184.0 secs |
| RedisClient | 16 | 116.6 secs |
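To make the comparison easier to read, the short sketch below divides each RedisClient time by the torchvision baseline. The numbers are copied from the benchmark above; the helper name `slowdown` is hypothetical.

```python
# Elapsed times for 15 epochs, copied from the benchmark figures.
baseline = 99.9  # torchvision MNIST dataset, num_workers=1
redis_runs = {4: 350.4, 8: 184.0, 16: 116.6}  # num_workers -> seconds


def slowdown(elapsed, reference=baseline):
    """Return how many times longer a run took than the reference run."""
    return elapsed / reference


for workers, elapsed in redis_runs.items():
    print(f"num_workers={workers}: {slowdown(elapsed):.2f}x the baseline time")
```

With these figures, the RedisClient runs take roughly 3.51x, 1.84x, and 1.17x the baseline time at 4, 8, and 16 workers respectively, so the gap narrows as workers are added.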
