Variational Deep Semantic Hashing (SIGIR'2017)

This repository contains the original implementation of the models and experiments from Variational Deep Semantic Hashing (SIGIR 2017).

Author: Suthee Chaidaroon

Platform

  • This project uses Python 2.7 and TensorFlow 1.3.

Prepare dataset

The model expects input documents in a bag-of-words format. I provide a sample dataset under the dataset directory. If you want to use a new text collection, represent it as a matrix where each row is one document and each column is one unique word in the corpus.
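
As a minimal sketch of this layout (assuming scikit-learn is available, which is not a stated dependency of this repository; `docs` is a hypothetical list of raw documents), a document-term count matrix with one row per document and one column per unique word can be built like this:

```python
# Sketch only: build a documents-x-vocabulary bag-of-words count matrix.
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "variational autoencoders learn latent codes",
    "semantic hashing maps documents to binary codes",
]

vectorizer = CountVectorizer()
bow = vectorizer.fit_transform(docs)   # sparse matrix: one row per document
print(bow.shape)                       # (num_documents, vocabulary_size)
print(sorted(vectorizer.vocabulary_))  # one column per unique word
```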

To get the best performance

According to our empirical results, TF-IDF is the best input representation for our models.
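
A minimal sketch of producing TF-IDF input vectors, again assuming scikit-learn and a hypothetical list of raw documents `docs`; any equivalent TF-IDF weighting of the bag-of-words matrix should do:

```python
# Sketch only: TF-IDF weighted document-term matrix (rows = documents,
# columns = unique words), the representation reported to work best here.
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "variational deep semantic hashing",
    "deep learning for text retrieval",
]

tfidf = TfidfVectorizer()
X = tfidf.fit_transform(docs)   # sparse TF-IDF matrix
print(X.shape)                  # (num_documents, vocabulary_size)
```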

Training the model

Component collapsing is a common issue in the variational autoencoder framework: the KL regularizer shuts off some latent dimensions by driving their weights to zero. We use the weight annealing technique [1] to mitigate this issue during training.
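
The sketch below (plain Python, not the repository's actual training loop; `reconstruction_loss`, `kl_divergence`, and `annealing_steps` are hypothetical names) illustrates the idea: the KL term is multiplied by a weight that is warmed up from 0 to 1, so it cannot shut latent dimensions off early in training.

```python
# Minimal sketch of KL weight (warm-up) annealing for a VAE-style loss.

def kl_weight(step, annealing_steps=10000):
    """Linearly increase the KL coefficient from 0 to 1 over `annealing_steps`."""
    return min(1.0, float(step) / annealing_steps)

def annealed_loss(reconstruction_loss, kl_divergence, step):
    # Early in training the KL term is down-weighted, so the encoder can
    # learn informative latent codes before the prior starts pulling them in.
    return reconstruction_loss + kl_weight(step) * kl_divergence

print(annealed_loss(120.0, 8.0, step=500))    # KL weight = 0.05
print(annealed_loss(120.0, 8.0, step=20000))  # KL weight = 1.0
```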

References

[1] C. K. Sønderby, T. Raiko, L. Maaløe, S. K. Sønderby, and O. Winther. Ladder Variational Autoencoders. https://arxiv.org/abs/1602.02282

Bibtex

@inproceedings{Chaidaroon:2017:VDS:3077136.3080816,
 author = {Chaidaroon, Suthee and Fang, Yi},
 title = {Variational Deep Semantic Hashing for Text Documents},
 booktitle = {Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval},
 series = {SIGIR '17},
 year = {2017},
 isbn = {978-1-4503-5022-8},
 location = {Shinjuku, Tokyo, Japan},
 pages = {75--84},
 numpages = {10},
 url = {http://doi.acm.org/10.1145/3077136.3080816},
 doi = {10.1145/3077136.3080816},
 acmid = {3080816},
 publisher = {ACM},
 address = {New York, NY, USA},
 keywords = {deep learning, semantic hashing, variational autoencoder},
}
