GlasXC
is a PyTorch implementation of the paper Breaking the Glass Ceiling for Embedding-Based Classifiers for Large Output Spaces (NeurIPS '19):
- GlasXC: solves extreme classification using a simple deep learning architecture plus the GLaS regularizer
- Python 2.7 or 3.5
- Requirements for the project are listed in requirements.txt. In addition to these, PyTorch 0.4.1 or higher is necessary. The requirements can be installed using pip:
$ pip install -r requirements.txt
or using conda:
$ conda install --file requirements.txt
- Clone the repository:
$ git clone https://github.com/pyschedelicsid/GlasXC
$ cd GlasXC
- Install:
$ python setup.py install
Use train_GlasXC.py. This script trains (and optionally evaluates) a model on a given dataset using the GlasXC algorithm. A description of the available options can be obtained with:
$ python train_GlasXC.py --help
To run GlasXC in the configuration used in the report, use:
$ ./train_GlasXC_with_args.sh
To run the baseline model, use:
$ python baseline.py
Links for downloading each dataset used can be found here, and the project report can be found here. The configuration files (described below) used for each dataset can be found here.
The input data must be in the LIBSVM format. An example of such a dataset is the Bibtex dataset found here.
The first row in the LIBSVM format specifies the dataset size and the input and output dimensions. This row must be removed, and the same information must instead be provided through the configuration files, as explained below.
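The header-stripping step can be sketched as follows. This is a hypothetical helper, not part of the repository; the name and the exact header layout ("num_data_points input_dims output_dims" on one line) are assumptions based on the description above.

```python
# Hypothetical helper (not part of the repo): removes the LIBSVM header row
# and returns the metadata it contained, for use in the dataset_info file.
def strip_libsvm_header(in_path, out_path):
    with open(in_path) as f:
        # Assumed first line: "num_data_points input_dims output_dims"
        num_points, input_dims, output_dims = (int(x) for x in f.readline().split())
        body = f.read()
    with open(out_path, 'w') as f:
        f.write(body)
    return num_points, input_dims, output_dims
```

The returned triple is exactly what the dataset_info YAML file (described below) expects.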
To use GlasXC through GlasXC.py, you need valid neural network configurations, in the YAML format, for the input encoder, the label encoder in the latent space, and the regressor. An example configuration file is:
- name: Linear
  kwargs:
    in_features: 500
    out_features: 1152
- name: LeakyReLU
  kwargs:
    negative_slope: 0.2
    inplace: True
- name: Linear
  kwargs:
    in_features: 1152
    out_features: 1836
- name: Sigmoid
Please note that the name and kwargs attributes must match the corresponding class and argument names in PyTorch (torch.nn).
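A minimal sketch of how such a layer list can be turned into a PyTorch model. The function name `build_network` is an assumption for illustration, not the repository's actual API; it simply looks each `name` up in torch.nn and forwards the `kwargs`.

```python
import yaml
import torch.nn as nn

# Hypothetical helper (not the repo's actual API): maps the YAML layer list
# above onto torch.nn modules stacked in a Sequential container.
def build_network(config_path):
    with open(config_path) as f:
        layers = yaml.safe_load(f)
    modules = []
    for layer in layers:
        cls = getattr(nn, layer['name'])           # e.g. nn.Linear, nn.Sigmoid
        modules.append(cls(**layer.get('kwargs', {})))
    return nn.Sequential(*modules)
```

This is why the names must match torch.nn exactly: they are resolved by attribute lookup, so a misspelled layer name raises an AttributeError.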
Optimizer configurations are very similar to the neural network configurations. Here too you must retain the same naming as PyTorch for optimizer names and their parameters, for example lr for the learning rate. Below is a sample:
name: SGD
args:
  lr: 0.01
  momentum: 0.9
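As with the network layers, the optimizer name can be resolved directly against torch.optim. The helper below is an illustrative sketch (the name `build_optimizer` is assumed, not taken from the repo):

```python
import yaml
import torch.optim as optim

# Hypothetical helper (not the repo's actual API): looks the optimizer class
# up by its torch.optim name and forwards the args from the YAML file.
def build_optimizer(config_path, parameters):
    with open(config_path) as f:
        cfg = yaml.safe_load(f)
    cls = getattr(optim, cfg['name'])              # e.g. optim.SGD, optim.Adam
    return cls(parameters, **cfg.get('args', {}))
```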
In both scripts, you are required to specify a data root (data_root) and a dataset information file (dataset_info). data_root is the folder containing the datasets. dataset_info is a YAML file in the following format:
train_filename:
train_opts:
  num_data_points:
  input_dims:
  output_dims:
test_filename:
test_opts:
  num_data_points:
  input_dims:
  output_dims:
If there is no test dataset, remove the test_filename and test_opts fields. An example for the Bibtex dataset would be:
train_filename: bibtex_train.txt
train_opts:
  num_data_points: 4880
  input_dims: 1836
  output_dims: 159
test_filename: bibtex_test.txt
test_opts:
  num_data_points: 2515
  input_dims: 1836
  output_dims: 159
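Putting the two files together, a split can be loaded roughly as follows. This sketch is an assumption about usage, not the repository's loader; it uses scikit-learn's multilabel LIBSVM reader on the headerless files described above, and the name `load_split` is hypothetical.

```python
import os
import yaml
from sklearn.datasets import load_svmlight_file

# Hypothetical helper (not the repo's actual loader): combines data_root and
# the dataset_info YAML to load one headerless LIBSVM split.
def load_split(data_root, dataset_info_path, split='train'):
    with open(dataset_info_path) as f:
        info = yaml.safe_load(f)
    opts = info[split + '_opts']                   # num_data_points, input_dims, output_dims
    path = os.path.join(data_root, info[split + '_filename'])
    X, y = load_svmlight_file(path, n_features=opts['input_dims'], multilabel=True)
    return X, y
```

Passing n_features from dataset_info matters because a sparse split may never touch the last feature columns, in which case the loader cannot infer the true input dimensionality on its own.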
Written By: Siddhant Katyan