WEDTM

The code for the paper "Inter and Intra Topic Structure Learning with Word Embeddings" (ICML 2018).

Key features:

WEDTM is a deep topic model that discovers topic hierarchies.
WEDTM is also able to discover "sub-topics" with the help of word embeddings.
WEDTM performs well on perplexity, document classification, and topic coherence, as reported in the paper.

Run WEDTM

The code has been tested on macOS and Linux (Ubuntu). To run it on Windows, you need to recompile GNBP_mex_collapsed_deep_WEDTM.c with MEX and a C++ compiler.
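As a rough sketch of the recompilation step, the commands below can be run from the MATLAB prompt in the repository folder. This assumes that a supported compiler is already installed and that the source file needs no extra compiler flags or libraries, which the README does not state.

```matlab
% Check which C compiler MEX is configured to use (select one if prompted).
mex -setup C

% Compile the sampler. On success, MATLAB writes a platform-specific binary
% next to the source file (for example, a .mexw64 file on 64-bit Windows).
mex GNBP_mex_collapsed_deep_WEDTM.c

% Show which compiled file MATLAB will actually call.
which GNBP_mex_collapsed_deep_WEDTM
```

If the build fails, the compiler output from the mex call is usually the best starting point for diagnosing missing headers or an unsupported compiler.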
Requirements: Matlab 2016b (or later) and the code of GBN.
Make sure GBN runs properly on your machine.
We provide the WS dataset used in the paper, stored in MAT format with the following contents:

doc: a V by N count (sparse) matrix for N documents with V words in the vocabulary
embeddings: a V by L matrix for the L dimensional word embeddings for V words
vocabulary: the words in the vocabulary
labels: the label matrix for the documents (only for document classification)
label_names: the label names (only for document classification)
train_idx: the indexes of documents for training (only for document classification)
test_idx: the indexes of documents for testing (only for document classification)

To use your own documents, please prepare them in the above format. If you use the WS dataset, please cite the original papers, which are cited in our paper.
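The format above can be illustrated with a toy dataset. This is only a minimal sketch: the file name my_dataset.mat, the toy counts, and the random embeddings are placeholders, and storing the vocabulary as a cell array of strings is an assumption, since the README does not specify its type. The classification-only fields (labels, label_names, train_idx, test_idx) are left out because their exact layout is not described here.

```matlab
% Toy corpus: N = 3 documents over a V = 4 word vocabulary (hypothetical data).
vocabulary = {'topic'; 'model'; 'word'; 'embedding'};  % assumed: cell array of strings
V = numel(vocabulary);
N = 3;
L = 5;  % embedding dimension, chosen arbitrarily for this sketch

% Nonzero counts given as (word index, document index, count) triplets.
word_idx = [1 2 2 3 4 1];
doc_idx  = [1 1 2 2 3 3];
counts   = [2 1 3 1 4 1];
doc = sparse(word_idx, doc_idx, counts, V, N);  % V-by-N sparse count matrix

% Placeholder embeddings. In practice, load pretrained vectors so that
% row i of embeddings corresponds to vocabulary{i}.
rng(0);
embeddings = randn(V, L);  % V-by-L matrix

% Store the variables under the field names listed above.
save('my_dataset.mat', 'doc', 'embeddings', 'vocabulary');
```

Building doc from triplets with sparse avoids allocating a dense V-by-N matrix, which matters once the vocabulary and corpus are large.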

Run demo_WEDTM.m:

Specify where the GBN code is installed and some model parameters.
Follow the comments and run it.
The code should yield the results reported in the paper.
I've found that running more MCMC iterations gives better performance than reported in the paper. 😂

Notes

As WEDTM adapts GBN for a part of its model structure, the code heavily relies on GBN and basically follows the code structure of GBN.
For the Polya-Gamma sampler (PolyaGamRnd_Gam.m), I used Mingyuan Zhou's implementation, described in "Parsimonious Bayesian deep networks". If you want to use the sampler, please cite the paper.
For the sampling of W, I partly referred to the implementation of DPFA by Gan Zhe.
