(SODDY, TEDDY) & DREAM

This repository implements the training data generation algorithms (SODDY & TEDDY) and the deep cardinality estimators (DREAM) proposed in our paper "Cardinality Estimation of Approximate Substring Queries using Deep Learning". It was created by Suyong Kwon, Woohwan Jung and Kyuseok Shim.

Repository Overview

The repository consists of four folders, each of which contains its own README file and scripts.

Folder          Description
gen_train_data  training data generation algorithms
dream           deep cardinality estimators for approximate substring queries
astrid          a modified version of Astrid, based on the Astrid model downloaded from [github]
plot            example notebook files

Installation and Requirements

We recommend running our code in a CUDA environment. However, the code also works without CUDA when the PyTorch library does not support the GPU. (You may set CUDA_VISIBLE_DEVICES to -1 to force CPU mode.)
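
For example, a minimal way to force CPU mode before invoking any of the run scripts below (assuming the scripts honor the CUDA_VISIBLE_DEVICES variable):

# hide all GPUs from PyTorch so that the code falls back to CPU execution
export CUDA_VISIBLE_DEVICES=-1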

Method 1: Use the Docker Image

Running the image requires the NVIDIA Container Toolkit. If you do not have the toolkit, refer to the installation guide.

git clone https://github.com/sykwon/teddy-dream.git

# run docker image
docker run -it --gpus all --name dream -v ${PWD}:/workspace -u 1000:1000 sykwon/dream /bin/bash

# after starting docker
redis-server --daemonize yes
cd gen_train_data/
make clean && make && make info
cd ..

Method 2: Create a Virtual Python Environment

This code requires Python 3.7 or higher.

sudo apt-get install -y redis-server git
sudo apt-get install -y binutils
sudo apt-get install -y texlive texlive-latex-extra texlive-fonts-recommended dvipng cm-super

conda create -n py37 python=3.7
source activate py37
conda install -y pytorch=1.7.1 torchvision=0.8.2 cudatoolkit=11.0 -c pytorch -c nvidia

pip install -r requirements.txt
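
After installation, you can run a quick sanity check to confirm that PyTorch was installed and whether CUDA is available (a minimal sketch; it prints False in CPU-only environments):

# print the installed PyTorch version and whether a CUDA-capable GPU is visible
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"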

Datasets

  • DBLP
  • GENE
  • WIKI
  • IMDB

Examples

These commands produce the experimental results.

cd gen_train_data
./run.sh DBLP     # to generate training data from the DBLP dataset
# ./run.sh GENE   # to generate training data from the GENE dataset
# ./run.sh WIKI   # to generate training data from the WIKI dataset
# ./run.sh IMDB   # to generate training data from the IMDB dataset
# ./run.sh all    # to generate training data from all datasets
cd ..

cd dream
./run.sh DBLP    # to train all models except Astrid with the DBLP dataset
# ./run.sh GENE  # to train all models except Astrid with the GENE dataset
# ./run.sh WIKI  # to train all models except Astrid with the WIKI dataset
# ./run.sh IMDB  # to train all models except Astrid with the IMDB dataset
# ./run.sh all   # to train all models except Astrid with all datasets
cd ..

cd astrid
./run.sh DBLP    # to train the Astrid model with the DBLP dataset
# ./run.sh GENE  # to train the Astrid model with the GENE dataset
# ./run.sh WIKI  # to train the Astrid model with the WIKI dataset
# ./run.sh IMDB  # to train the Astrid model with the IMDB dataset
# ./run.sh all   # to train the Astrid model with all datasets
cd ..

Please refer to [notebook] to see the experimental results.
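
For example, assuming Jupyter is installed in your environment, the example notebooks in the plot folder can be opened with:

# open the example notebooks that visualize the experimental results
cd plot
jupyter notebook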

Citation

Please consider citing our paper if you find this code useful:

@article{kwon2022cardinality,
    title={Cardinality estimation of approximate substring queries using deep learning},
    author={Kwon, Suyong and Jung, Woohwan and Shim, Kyuseok},
    journal={Proceedings of the VLDB Endowment},
    volume={15},
    number={11},
    year={2022}
}