
NeuroCard


NeuroCard is a neural cardinality estimator for multi-table join queries.


NeuroCard's philosophy is to learn as much correlation as possible across tables, thereby achieving high accuracy.

Technical details can be found in the VLDB 2021 paper, NeuroCard: One Cardinality Estimator for All Tables [bibtex].

Quick start | Main modules | Running experiments | Contributors | Citation

Quick start

Set up a conda environment with dependencies installed:

# On Ubuntu/Debian
sudo apt install build-essential
# Install Python environment
conda env create -f environment.yml
conda activate neurocard
# Run commands below inside this directory.
cd neurocard

Download the IMDB dataset as CSV files and place them under datasets/job:

# Download size 1.2GB.
bash scripts/download_imdb.sh

# If you already have the CSVs or can export from a
# database, simply link to an existing directory.
# ln -s <existing_dir_with_csvs> datasets/job
# Run the following if the existing CSVs are without headers.
# python scripts/prepend_imdb_headers.py

Launch a short test run:

python run.py --run test-job-light

Main modules

Module | Description
run | Main script to train and evaluate
experiments | Registry of experiment configurations
common | Abstractions for columns, tables, joined relations; column factorization
factorized_sampler | Unbiased join sampler
estimators | Cardinality estimators: probabilistic inference for density models; inference for column factorization
datasets | Registry of datasets and schemas
Models: made, transformer | Deep autoregressive models: ResMADE & Transformer
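
The core inference idea behind estimators is progressive sampling over an autoregressive density model: walk the columns in model order, mask out values that violate the query's predicates, accumulate the surviving probability mass, and average over samples. The sketch below illustrates that idea on a toy tabular distribution; toy_conditional and estimate_selectivity are hypothetical names, not NeuroCard's actual API, and the real implementation additionally handles column factorization and join sampling.

# Minimal sketch of progressive sampling for selectivity estimation with an
# autoregressive model. Toy example only: the joint table, toy_conditional and
# estimate_selectivity are hypothetical stand-ins for the learned model.
import numpy as np

rng = np.random.default_rng(0)

# Toy joint distribution P(x0, x1) over two small categorical columns.
joint = rng.random((4, 5))
joint /= joint.sum()

def toy_conditional(col_idx, prefix):
    # Return P(x_col | values sampled so far), read off the toy joint table.
    if col_idx == 0:
        return joint.sum(axis=1)          # P(x0)
    row = joint[prefix[0]]
    return row / row.sum()                # P(x1 | x0)

def estimate_selectivity(predicates, num_samples=2000):
    # predicates[i] is a 0/1 mask over column i's domain.
    ests = []
    for _ in range(num_samples):
        prefix, weight = [], 1.0
        for col, mask in enumerate(predicates):
            probs = toy_conditional(col, prefix)
            masked = probs * mask
            mass = masked.sum()           # mass satisfying this column's predicate
            if mass == 0.0:
                weight = 0.0
                break
            weight *= mass
            prefix.append(rng.choice(len(probs), p=masked / mass))
        ests.append(weight)
    return float(np.mean(ests))           # unbiased estimate of P(all predicates)

# Example: x0 IN (1, 2) AND x1 < 3.
preds = [np.isin(np.arange(4), [1, 2]).astype(float),
         (np.arange(5) < 3).astype(float)]
print("estimated:", estimate_selectivity(preds))
print("true:     ", joint[1:3, :3].sum())

Multiplying the estimated selectivity by the row count of the (joined) relation yields the cardinality estimate.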

Running experiments

Launch training and evaluation using a single script:

# 'name' is a config registered in experiments.py.
python run.py --run <name>

Registered configs. Hyperparameters are statically declared in experiments.py. New experiments (e.g., changing the query files or running hyperparameter tuning) can be registered there.
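
For orientation, a registered config is essentially a named dictionary of hyperparameters. The snippet below is a hypothetical sketch of what adding one might look like; the real field names and registry layout in experiments.py may differ, so copy an existing entry such as job-light rather than this sketch.

# Hypothetical sketch only: field names and registry layout are illustrative,
# not the actual schema used in experiments.py.
JOB_LIGHT_BASE = {
    'dataset': 'imdb',
    'queries_csv': 'queries/job-light.csv',
    'model': 'resmade',
    'epochs': 10,
    'bs': 2048,
}

EXPERIMENT_CONFIGS = {
    'job-light': JOB_LIGHT_BASE,
    # New experiment: same setup, different query file, longer training.
    'job-light-my-queries': dict(JOB_LIGHT_BASE,
                                 queries_csv='queries/my_queries.csv',
                                 epochs=20),
}

Once registered, the new experiment would be launched the same way: python run.py --run job-light-my-queries.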

Configs for evaluation on pretrained checkpoints and full training runs:

Benchmark | Config (reload pretrained ckpt) | Config (re-train) | Model | Num Params
JOB-light | job-light-reload | job-light | ResMADE | 1.0M
JOB-light-ranges | job-light-ranges-reload | job-light-ranges | ResMADE | 1.1M
JOB-light-ranges | job-light-ranges-large-reload | job-light-ranges-large | Transformer | 5.4M
JOB-M | job-m-reload | job-m | ResMADE | 7.2M
JOB-M | - | job-m-large (launch with --gpus=4 or lower the batch size) | Transformer | 107M

The reload configs load pretrained checkpoints and run evaluation only. Normal configs start training afresh and also run evaluation.

Metrics & Monitoring. The key metrics to track (illustrated in the sketch after this list) are:

  • Cardinality estimation accuracy (Q-errors): fact_psample_<num_psamples>_<quantile>
  • Quality of the density model: train_bits (negative log-likelihood in bits-per-tuple; lower is better).
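
Both metrics are simple to compute by hand. The helpers below (q_error, nats_to_bits) are illustrative, not functions exported by this repo; the nats-to-bits conversion assumes the raw training loss is an average negative log-likelihood in nats.

import math

def q_error(est_card, true_card):
    # Q-error = max(est/true, true/est); 1.0 means a perfect estimate.
    est, true = max(est_card, 1.0), max(true_card, 1.0)
    return max(est / true, true / est)

def nats_to_bits(avg_nll_nats):
    # bits-per-tuple = NLL per tuple divided by ln(2).
    return avg_nll_nats / math.log(2)

print(q_error(est_card=1200, true_card=1000))   # 1.2
print(round(nats_to_bits(50.0), 2))             # 72.13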

The standard output prints these metrics and can be piped into a log file. If TensorBoard is installed, use the following to visualize:

python -m tensorboard.main --logdir ~/ray_results/

Contributors

This repo was written by the authors of the NeuroCard paper (see Citation below).

Citation

@article{neurocard,
  title={NeuroCard: One Cardinality Estimator for All Tables},
  author={Yang, Zongheng and Kamsetty, Amog and Luan, Sifei and Liang, Eric and Duan, Yan and Chen, Xi and Stoica, Ion},
  journal={arXiv preprint arXiv:2006.08109},
  year={2020}
}

Related projects. NeuroCard builds on top of Naru and Variable Skipping.
