The DEformer

This is the repository for the paper:

Michael A. Alcorn and Anh Nguyen. The DEformer: An Order-Agnostic Distribution Estimating Transformer. ICML Workshop on Invertible Neural Networks, Normalizing Flows, and Explicit Likelihood Models (INNF+). 2021.

By including each feature's identity alongside its value in the input, sequential models can be used to perform order-agnostic autoregressive distribution estimation. Our DEformer uses an interleaved input design (as partially depicted here with the self-attention mask) for this task. The two sets of interleaved feature vectors consist of pixel identity feature vectors (z_k) and pixel identity/value feature vectors (u_k). r_k and c_k are, respectively, the row and column of the pixel indexed by k in the permuted sequence; v_k is the value of the pixel (which is zero or one for binary images); and g_z and g_u are multilayer perceptrons.
Samples generated by the DEformer. Each sample was generated using a random pixel order.
Because the DEformer is order-agnostic, it can easily "fill in" images where pixels are missing in a variety of patterns by placing the missing pixels at the end of the input sequence. Here, each row corresponds to a different ground truth image from the test set (depicted in the first column). The remaining pairs of columns show 100 removed pixels (red) from the ground truth image and the corresponding filled-in image.
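The following is a minimal sketch (in PyTorch; not the repository's code) of the interleaved input construction and the fill-in trick described above. The MLP sizes, variable names, and the mask handling are assumptions for illustration only.

import torch
import torch.nn as nn

d_model = 64
g_z = nn.Sequential(nn.Linear(2, 128), nn.ReLU(), nn.Linear(128, d_model))
g_u = nn.Sequential(nn.Linear(3, 128), nn.ReLU(), nn.Linear(128, d_model))

def interleaved_sequence(image, order):
    """image: (H, W) binary tensor; order: permutation of the H * W pixel indices."""
    H, W = image.shape
    rows = (order // W).float()
    cols = (order % W).float()
    vals = image.reshape(-1)[order]
    z = g_z(torch.stack([rows, cols], dim=-1))        # identity-only vectors z_k
    u = g_u(torch.stack([rows, cols, vals], dim=-1))  # identity + value vectors u_k
    # Interleave as z_1, u_1, z_2, u_2, ... so v_k is predicted at z_k's position.
    seq = torch.stack([z, u], dim=1).reshape(2 * H * W, d_model)
    return seq, vals

image = (torch.rand(28, 28) > 0.5).float()
order = torch.randperm(28 * 28)
seq, vals = interleaved_sequence(image, order)

# Because each z_k precedes its u_k, an ordinary lower-triangular mask keeps the
# prediction made at z_k's position from seeing v_k while still conditioning on all
# earlier pixels; drawing a fresh random `order` makes the model order-agnostic.
causal_mask = torch.tril(torch.ones(2 * 28 * 28, 2 * 28 * 28)).bool()

# To fill in an image with missing pixels, put the observed pixels first in `order`
# and the missing pixels last, then sample the tail of the sequence autoregressively.
missing = torch.zeros(28 * 28, dtype=torch.bool)
missing[torch.randperm(28 * 28)[:100]] = True          # e.g., 100 missing pixels
fill_in_order = torch.cat([torch.nonzero(~missing).flatten(),
                           torch.nonzero(missing).flatten()])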

Citation

If you use this code for your own research, please cite:

@article{alcorn2021deformer,
   title={The DEformer: An Order-Agnostic Distribution Estimating Transformer},
   author={Alcorn, Michael A. and Nguyen, Anh},
   journal={ICML Workshop on Invertible Neural Networks, Normalizing Flows, and Explicit Likelihood Models (INNF+)},
   year={2021}
}

Training the DEformer

Setting up .deformer_profile

After you've cloned the repository to your desired location, create a file called .deformer_profile in your home directory:

nano ~/.deformer_profile

and copy and paste in the contents of .deformer_profile, replacing each of the variable values with paths relevant to your environment. Next, add the following line to the end of your ~/.bashrc:

source ~/.deformer_profile

and either log out and log back in again or run:

source ~/.bashrc

You should now be able to copy and paste all of the commands in the various instruction sections. For example:

echo ${DEFORMER_PROJECT_DIR}

should print the path you set for DEFORMER_PROJECT_DIR in .deformer_profile.
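For reference, the training scripts below assume .deformer_profile exports at least the following two variables (the paths shown are placeholders, and the repository's .deformer_profile may define additional variables):

export DEFORMER_PROJECT_DIR=/path/to/deformer
export DEFORMER_EXPERIMENTS_DIR=/path/to/experiments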

Running the binarized-MNIST training script

Run (or copy and paste) the following script, editing the variables as appropriate.

#!/usr/bin/env bash

JOB=$(date +%Y%m%d%H%M%S)

echo "train:" >> ${JOB}.yaml
echo "  dataset: mnist" >> ${JOB}.yaml  # "mnist" or "cifar10".
echo "  train_prop: 0.98" >> ${JOB}.yaml
echo "  workers: 10" >> ${JOB}.yaml
echo "  learning_rate: 1.0e-5" >> ${JOB}.yaml
echo "  patience: 5" >> ${JOB}.yaml

echo "model:" >> ${JOB}.yaml
echo "  mlp_layers: [128, 256, 512]" >> ${JOB}.yaml
echo "  nhead: 8" >> ${JOB}.yaml
echo "  dim_feedforward: 2048" >> ${JOB}.yaml
echo "  num_layers: 6" >> ${JOB}.yaml
echo "  dropout: 0.0" >> ${JOB}.yaml

# Save experiment settings.
mkdir -p ${DEFORMER_EXPERIMENTS_DIR}/${JOB}
mv ${JOB}.yaml ${DEFORMER_EXPERIMENTS_DIR}/${JOB}/

gpu=0
cd ${DEFORMER_PROJECT_DIR}
nohup python3 train_deformer.py ${JOB} ${gpu} > ${DEFORMER_EXPERIMENTS_DIR}/${JOB}/train.log &

Running the POWER training script

Run (or copy and paste) the following script, editing the variables as appropriate.

#!/usr/bin/env bash

JOB=$(date +%Y%m%d%H%M%S)

echo "train:" >> ${JOB}.yaml
echo "  dataset: power" >> ${JOB}.yaml  # "gas" or "power".
echo "  batch_size: 128" >> ${JOB}.yaml
echo "  workers: 10" >> ${JOB}.yaml
echo "  learning_rate: 1.0e-5" >> ${JOB}.yaml
echo "  patience: 20" >> ${JOB}.yaml

echo "model:" >> ${JOB}.yaml
echo "  idx_embed_dim: 20" >> ${JOB}.yaml
echo "  mix_comps: 150" >> ${JOB}.yaml
echo "  mlp_layers: [128, 256, 512]" >> ${JOB}.yaml
echo "  nhead: 8" >> ${JOB}.yaml
echo "  dim_feedforward: 2048" >> ${JOB}.yaml
echo "  num_layers: 6" >> ${JOB}.yaml
echo "  dropout: 0.2" >> ${JOB}.yaml

# Save experiment settings.
mkdir -p ${DEFORMER_EXPERIMENTS_DIR}/${JOB}
mv ${JOB}.yaml ${DEFORMER_EXPERIMENTS_DIR}/${JOB}/

gpu=0
cd ${DEFORMER_PROJECT_DIR}
nohup python3 train_deformer_tabular.py ${JOB} ${gpu} > ${DEFORMER_EXPERIMENTS_DIR}/${JOB}/train.log &

Running the POWER ARDM training script

Run (or copy and paste) the following script, editing the variables as appropriate. This script trains an order-agnostic DEformer similar to the order-agnostic Transformer described in Appendix D of "Autoregressive Diffusion Models". The only difference between this model and the original DEformer is that each input in the sequence consists of the concatenation of the column embedding for the value being predicted with the column embedding and value for the previous column in the shuffled sequence, i.e., the input sequence has one element per column rather than two as in the original DEformer. This model achieves a negative log-likelihood of -0.62 (compared to -0.68 for the original DEformer).
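As a rough illustration of this input format (an assumption-laden sketch, not the repository's code), each element of the sequence can be assembled as follows; using a zero vector to stand in for the nonexistent previous column at the first position is an assumption.

import torch
import torch.nn as nn

D = 6                                     # number of columns in the tabular dataset
idx_embed_dim = 20                        # matches idx_embed_dim in the config below
col_embed = nn.Embedding(D, idx_embed_dim)

def ardm_style_inputs(row, order):
    """row: (D,) tensor of column values; order: random permutation of column indices."""
    cur = col_embed(order)                                # embedding of the column to predict
    prev = torch.cat([torch.zeros(1, idx_embed_dim),      # assumed placeholder for k = 1
                      col_embed(order[:-1])])             # embedding of the previous column
    prev_val = torch.cat([torch.zeros(1), row[order[:-1]]]).unsqueeze(-1)  # previous value
    # One input per column: [embed(c_k), embed(c_{k-1}), v_{k-1}], so the sequence
    # length is the number of columns rather than double the number of columns.
    return torch.cat([cur, prev, prev_val], dim=-1)       # (D, 2 * idx_embed_dim + 1)

inputs = ardm_style_inputs(torch.randn(D), torch.randperm(D))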

#!/usr/bin/env bash

JOB=$(date +%Y%m%d%H%M%S)

echo "train:" >> ${JOB}.yaml
echo "  dataset: power" >> ${JOB}.yaml  # "gas" or "power".
echo "  batch_size: 128" >> ${JOB}.yaml
echo "  workers: 10" >> ${JOB}.yaml
echo "  learning_rate: 1.0e-5" >> ${JOB}.yaml
echo "  patience: 20" >> ${JOB}.yaml

echo "model:" >> ${JOB}.yaml
echo "  idx_embed_dim: 20" >> ${JOB}.yaml
echo "  mix_comps: 150" >> ${JOB}.yaml
echo "  mlp_layers: [128, 256, 512]" >> ${JOB}.yaml
echo "  nhead: 8" >> ${JOB}.yaml
echo "  dim_feedforward: 2048" >> ${JOB}.yaml
echo "  num_layers: 6" >> ${JOB}.yaml
echo "  dropout: 0.2" >> ${JOB}.yaml

# Save experiment settings.
mkdir -p ${DEFORMER_EXPERIMENTS_DIR}/${JOB}
mv ${JOB}.yaml ${DEFORMER_EXPERIMENTS_DIR}/${JOB}/

gpu=0
cd ${DEFORMER_PROJECT_DIR}
nohup python3 train_deformer_tabular_ardm.py ${JOB} ${gpu} > ${DEFORMER_EXPERIMENTS_DIR}/${JOB}/train.log &

Running the CSDI training script

Run (or copy and paste) the following script. This script trains a DEformer-like model (hereafter "DEformer-CSDI") on the imputation task described in "CSDI: Conditional Score-based Diffusion Models for Probabilistic Time Series Imputation"; specifically, using a variation of the 10% missing healthcare dataset described in the paper. While the test set is identical to the one in CSDI (because I used the paper's code), I changed the training/validation split to 95%/5% and I used an online strategy to generate missing values for each training sample. Specifically, every time a training sample was encountered, I randomly selected 10% of the observed values to serve as the missing values.
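A rough sketch of that online masking step (hypothetical names, not the repository's code), assuming each record carries a boolean mask of which entries were actually measured:

import torch

def sample_target_mask(observed_mask, missing_ratio=0.1):
    """observed_mask: boolean tensor marking which entries of a record were measured."""
    obs_idx = torch.nonzero(observed_mask.flatten()).flatten()
    n_targets = max(1, int(missing_ratio * obs_idx.numel()))
    chosen = obs_idx[torch.randperm(obs_idx.numel())[:n_targets]]
    target_mask = torch.zeros(observed_mask.numel(), dtype=torch.bool)
    target_mask[chosen] = True
    # The chosen 10% of observed entries are hidden from the model and used as
    # imputation targets for this pass; a fresh mask is drawn each time the sample
    # is encountered during training.
    return target_mask.reshape(observed_mask.shape)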

Like the DEformer, the input for DEformer-CSDI consists of a mix of identity feature vectors and identity/value feature vectors. The difference in this case is that DEformer-CSDI is not learning the joint distribution, so only the identity feature vectors are included for the missing values and the attention mask is now full instead of lower triangular (i.e., every input can attend to every other input). Identity was encoded as f(t, k) = [t, embed(k)] where t and k are the time and feature indices, respectively, for a data point. One interesting difference between DEformer-CSDI and CSDI is that DEformer-CSDI simply ignores missing values that are not being predicted, while CSDI "fills in" missing values with zeros to keep the input a fixed size.
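A hedged sketch of this input construction, with assumed dimensions and hypothetical names (not the repository's code):

import torch
import torch.nn as nn

n_features = 35                           # assumed number of variables per record
embed_dim, d_model = 16, 64               # assumed sizes
feat_embed = nn.Embedding(n_features, embed_dim)
g_identity = nn.Linear(1 + embed_dim, d_model)       # identity-only vectors (targets)
g_value = nn.Linear(1 + embed_dim + 1, d_model)      # identity + value vectors (observed)

def identity(t, k):
    # f(t, k) = [t, embed(k)]: time index concatenated with the feature embedding.
    return torch.cat([t.unsqueeze(-1).float(), feat_embed(k)], dim=-1)

def deformer_csdi_inputs(obs_t, obs_k, obs_v, tgt_t, tgt_k):
    """Observed (t, k, v) triples condition the model; target (t, k) pairs are imputed."""
    observed = g_value(torch.cat([identity(obs_t, obs_k), obs_v.unsqueeze(-1)], dim=-1))
    targets = g_identity(identity(tgt_t, tgt_k))
    # Values that are genuinely missing (and not imputation targets) are simply never
    # added to the sequence; self-attention uses a full mask, so every input attends
    # to every other input.
    return torch.cat([observed, targets], dim=0)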

With no hyperparameter tuning, DEformer-CSDI achieves a mean absolute error of 0.216 on the 10% missing healthcare dataset compared to 0.217 for CSDI (see Table 3 in the paper). Notably, DEformer-CSDI vastly outperforms the flattened Transformer baseline discussed in Appendix F, which achieved a mean absolute error of 0.383 (see Table 7).

#!/usr/bin/env bash

cd ${DEFORMER_PROJECT_DIR}
nohup python3 train_deformer_csdi.py > csdi.log &
