CUDA-LDA : Contradistinguisher for Unsupervised Domain Adaptation - Language Domain Adaptation

Paper accepted at ICDM 2019: the 19th IEEE International Conference on Data Mining, Beijing, China, 8-11 November 2019.

The original code base for the experiments and results on the language datasets.

Language Domain Adaptation

We consider the Amazon Customer Reviews dataset with four domains, Books, DVDs, Electronics, and Kitchen Appliances, located in the data folder. Each domain has two classes, positive and negative reviews, as the labels of a binary classification task.

Installation

You will need:

  • Python 3.6 (Anaconda Python recommended)
  • PyTorch
  • torchvision
  • nltk
  • pandas
  • scipy
  • tqdm
  • scikit-image
  • scikit-learn
  • tensorboardX
  • tensorflow==1.13.1 (for tensorboard visualizations)

PyTorch

On Linux:

> conda install pytorch torchvision cudatoolkit=10.0 -c pytorch

Install the CUDA toolkit version that matches your driver if GPUs are available. Using GPUs is strongly recommended, and effectively required, given the size of the model and datasets.
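As a quick sanity check (assuming the conda environment above is active), you can verify that PyTorch is installed and sees a CUDA-capable GPU:

```shell
# Prints the installed PyTorch version and whether a CUDA GPU is visible.
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```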

The rest of the dependencies

Use pip as below:

> pip install -r requirements.txt

Datasets

We consider the Amazon Customer Reviews dataset with four domains, Books (B), DVDs (D), Electronics (E), and Kitchen Appliances (K), located in the data folder. Each domain has two classes, positive and negative reviews, as the labels of a binary classification task. The processed Amazon Customer Reviews data is obtained from the MAN GitHub repo.

Language Domain Adaptation Dataset Statistics

Language_datasets_details.png
Language dataset statistics used for language domain adaptation

Results

Language_datasets_results.png
Target-domain test accuracy of CUDA compared against several state-of-the-art domain-alignment methods

t-SNE plots indicating domain adaptation using CUDA

  • B -> D : books_dvd_tsne_epoch059
  • B -> E : books_electronics_tsne_epoch052
  • B -> K : books_kitchen_tsne_epoch205
  • D -> B : dvd_books_tsne_epoch022
  • D -> E : dvd_electronics_tsne_epoch036
  • D -> K : dvd_kitchen_tsne_epoch051
  • E -> B : electronics_books_tsne_epoch025
  • E -> D : electronics_dvd_tsne_epoch029
  • E -> K : electronics_kitchen_tsne_epoch022
  • K -> B : kitchen_books_tsne_epoch033
  • K -> D : kitchen_dvd_tsne_epoch015
  • K -> E : kitchen_electronics_tsne_epoch028
  • The t-SNE plots indicate inclined, line-like clustering in both the source (x) and target (+) domains, with one class at each end of the line.

Code and instructions

  • SB_lang_00_ss.py : Code for source supervised only setting
  • SB_lang_00_ss_tu.py : Code for source supervised + target unsupervised only setting
  • SB_lang_00_ss_tu_su.py : Code for source supervised + target unsupervised + source unsupervised setting
  • SB_lang_00_ss_tu_su_sa.py : Code for source supervised + target unsupervised + source unsupervised + source adversarial setting
  • SB_lang_00_ss_tu_su_ta.py : Code for source supervised + target unsupervised + source unsupervised + target adversarial setting
  • SB_lang_00_ss_tu_su_ta_sa.py : Code for source supervised + target unsupervised + source unsupervised + target adversarial + source adversarial setting
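The six scripts above differ only in which loss terms are enabled during training. As a purely illustrative sketch (the term names, values, and equal weighting below are assumptions for clarity, not the repository's actual objective), the combined loss for a setting can be pictured as a sum over its enabled terms:

```python
# Illustrative sketch: term names and values are hypothetical,
# not taken from the repository's code.

def total_loss(losses, enabled):
    """Sum the loss terms enabled for a given experimental setting.

    losses  -- dict mapping a term name (e.g. "ss" for source supervised,
               "tu" for target unsupervised) to its current value
    enabled -- the term names switched on in this setting
    """
    return sum(losses[term] for term in enabled)

# The "ss + tu" setting (SB_lang_00_ss_tu.py) would combine two terms:
losses = {"ss": 0.7, "tu": 0.2, "su": 0.1, "ta": 0.05, "sa": 0.05}
print(total_loss(losses, ("ss", "tu")))
```

In practice each term would carry its own weight and schedule; the sketch only shows how the six settings compose from the same building blocks.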

cuda.sh : commands to run the experiments in batch, simultaneously on multiple GPUs.
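For reference, a minimal sketch of that batch-launch pattern (the specific scripts, GPU indices, and log paths here are illustrative; see cuda.sh for the actual commands used for the paper's experiments):

```shell
# Illustrative launch fragment: pin one experiment per GPU
# and run them in parallel as background jobs.
CUDA_VISIBLE_DEVICES=0 python SB_lang_00_ss.py    > logs/ss.log    2>&1 &
CUDA_VISIBLE_DEVICES=1 python SB_lang_00_ss_tu.py > logs/ss_tu.log 2>&1 &
wait   # block until both background runs finish
```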

Folder Information

  • data : the datasets are stored here.
  • datasets : the PyTorch dataset files used for data loading.
  • logs : logs from the simulations whose results are reported in the paper; each log records all the settings and parameters for reproducibility.
  • model : all the variants of the neural networks used in MAN and CMD.