CUDA-LDA : Contradistinguisher for Unsupervised Domain Adaptation - Language Domain Adaptation

Paper accepted at ICDM 2019: the 19th IEEE International Conference on Data Mining, Beijing, China, 8-11 November 2019.

The original code base for the experiments and results on the language datasets.

Language Domain Adaptation

We consider the Amazon Customer Reviews dataset with four domains, Books, DVDs, Electronics, and Kitchen Appliances, located in the data folder. Each domain has two classes, positive and negative reviews, as the labels of a binary classification task.

Installation

You will need:

  • Python 3.6 (Anaconda Python recommended)
  • PyTorch
  • torchvision
  • nltk
  • pandas
  • scipy
  • tqdm
  • scikit-image
  • scikit-learn
  • tensorboardX
  • tensorflow==1.13.1 (for tensorboard visualizations)

PyTorch

On Linux:

> conda install pytorch torchvision cudatoolkit=10.0 -c pytorch

Install the CUDA toolkit version that matches your driver if GPUs are available. Using GPUs is strongly recommended, and effectively required, given the size of the model and datasets.
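As a quick sanity check (assuming the conda environment above is active), you can verify that PyTorch is installed and sees a CUDA-capable GPU:

```shell
# Prints the installed PyTorch version and whether a CUDA GPU is visible.
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```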

The rest of the dependencies

Use pip as below:

> pip install -r requirements.txt

Datasets

We consider the Amazon Customer Reviews dataset with four domains, Books (B), DVDs (D), Electronics (E), and Kitchen Appliances (K), located in the data folder. Each domain has two classes, positive and negative reviews, as the labels of a binary classification task. The processed Amazon Customer Reviews data is obtained from the MAN GitHub repo.

Language Domain Adaptation Dataset Statistics

Language_datasets_details.png
Language dataset statistics used for language domain adaptation

Results

Language_datasets_results.png
Target-domain test accuracy of CUDA compared against several state-of-the-art domain-alignment methods

t-SNE plots indicating domain adaptation using CUDA

  • B -> D : books_dvd_tsne_epoch059
  • B -> E : books_electronics_tsne_epoch052
  • B -> K : books_kitchen_tsne_epoch205
  • D -> B : dvd_books_tsne_epoch022
  • D -> E : dvd_electronics_tsne_epoch036
  • D -> K : dvd_kitchen_tsne_epoch051
  • E -> B : electronics_books_tsne_epoch025
  • E -> D : electronics_dvd_tsne_epoch029
  • E -> K : electronics_kitchen_tsne_epoch022
  • K -> B : kitchen_books_tsne_epoch033
  • K -> D : kitchen_dvd_tsne_epoch015
  • K -> E : kitchen_electronics_tsne_epoch028
  • The t-SNE plots indicate inclined, line-like clustering in both the source (x) and target (+) domains, with one class at each end of the line.

Code and instructions

  • SB_lang_00_ss.py : Code for source supervised only setting
  • SB_lang_00_ss_tu.py : Code for source supervised + target unsupervised only setting
  • SB_lang_00_ss_tu_su.py : Code for source supervised + target unsupervised + source unsupervised setting
  • SB_lang_00_ss_tu_su_sa.py : Code for source supervised + target unsupervised + source unsupervised + source adversarial setting
  • SB_lang_00_ss_tu_su_ta.py : Code for source supervised + target unsupervised + source unsupervised + target adversarial setting
  • SB_lang_00_ss_tu_su_ta_sa.py : Code for source supervised + target unsupervised + source unsupervised + target adversarial + source adversarial setting
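The six scripts above differ only in which loss terms are enabled during training. As a purely illustrative sketch (the term names, values, and equal weighting below are assumptions for clarity, not the repository's actual objective), the combined loss for a setting can be pictured as a sum over its enabled terms:

```python
# Illustrative sketch: term names and values are hypothetical,
# not taken from the repository's code.

def total_loss(losses, enabled):
    """Sum the loss terms enabled for a given experimental setting.

    losses  -- dict mapping a term name (e.g. "ss" for source supervised,
               "tu" for target unsupervised) to its current value
    enabled -- the term names switched on in this setting
    """
    return sum(losses[term] for term in enabled)

# The "ss + tu" setting (SB_lang_00_ss_tu.py) would combine two terms:
losses = {"ss": 0.7, "tu": 0.2, "su": 0.1, "ta": 0.05, "sa": 0.05}
print(total_loss(losses, ("ss", "tu")))
```

In practice each term would carry its own weight and schedule; the sketch only shows how the six settings compose from the same building blocks.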

cuda.sh : commands to run the experiments in batch, simultaneously on multiple GPUs.
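For reference, a minimal sketch of that batch-launch pattern (the specific scripts, GPU indices, and log paths here are illustrative; see cuda.sh for the actual commands used for the paper's experiments):

```shell
# Illustrative launch fragment: pin one experiment per GPU
# and run them in parallel as background jobs.
CUDA_VISIBLE_DEVICES=0 python SB_lang_00_ss.py    > logs/ss.log    2>&1 &
CUDA_VISIBLE_DEVICES=1 python SB_lang_00_ss_tu.py > logs/ss_tu.log 2>&1 &
wait   # block until both background runs finish
```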

Folder Information

  • data : the datasets are stored here.
  • datasets : the PyTorch dataset files used for data loading.
  • logs : logs from the simulations whose results are reported in the paper; each log records all the settings and parameters for reproducibility.
  • model : all the variants of the neural networks used in MAN and CMD.