Interpretable Sequence Classification via Discrete Optimization

This repository contains the code for DISC and the baselines from the paper Interpretable Sequence Classification via Discrete Optimization by Shvo, Li, Toro Icarte, and McIlraith.

Please cite this paper if you use any part of this code for academic research.

@article{shvo2020interpretable,
  title={Interpretable Sequence Classification via Discrete Optimization},
  author={Shvo, Maayan and Li, Andrew C and Toro Icarte, Rodrigo and McIlraith, Sheila A},
  journal={arXiv preprint arXiv:2010.02819},
  year={2020}
}

Overview

DISC is an efficient mixed-integer linear programming method for learning deterministic finite automata (DFAs) from a set of positive (accepted) and negative (rejected) sequences of symbols. Unlike most existing DFA-learning methods, we aim to perform well on real-world sequence data, even when the data is noisy (and not only when the data is generated by a regular language). Three main hyperparameters affect the learning:

  • q_max: The maximum number of states in the learned DFA.
  • λ_t: A regularization penalty on edge transitions other than self-loops.
  • λ_a: A regularization penalty on occupying non-absorbing states. We define states 1 and 2 to be absorbing states that always self-transition, with state 1 as a non-accepting (rejecting) state and state 2 as an accepting state; a short sketch of this convention follows the list.
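
To make the absorbing-state convention concrete, below is a minimal Python sketch, not taken from the repository, of how a learned DFA classifies a single trace. It assumes, for simplicity, that a trace is accepted exactly when it ends in the accepting absorbing state 2; the toy transition table is hypothetical.

def dfa_accepts(transitions, start, trace):
    # transitions: dict mapping (state, symbol) -> next state.
    # Missing entries are treated as self-loops, and states 1 and 2
    # are absorbing, so they need no outgoing entries at all.
    state = start
    for symbol in trace:
        if state in (1, 2):                              # absorbing: never leave
            break
        state = transitions.get((state, symbol), state)  # default: self-loop
    return state == 2                                    # state 2 is accepting

# Hypothetical 3-state DFA over symbols {"a", "b"}: from start state 3,
# "a" leads to the accepting state 2 and "b" to the rejecting state 1.
toy = {(3, "a"): 2, (3, "b"): 1}
print(dfa_accepts(toy, 3, ["a", "b"]))  # True: absorbed in state 2
print(dfa_accepts(toy, 3, ["b", "a"]))  # False: absorbed in state 1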

This implementation is designed for sequence classification tasks with multiple classes. A single DFA is learned for each class and Bayesian inference is used to produce a probability distribution over possible classes.
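
The exact inference rule is not spelled out in this README, so the following is only a hedged illustration of one way Bayesian inference can combine per-class DFA outputs into a class distribution. The likelihood model here (conditionally independent per-DFA accept rates, with true- and false-positive rates estimated on training data) is an assumption for illustration, not necessarily the model used in the paper.

import numpy as np

def class_posterior(accepts, tpr, fpr, prior=None):
    # accepts[c] = 1 if the DFA learned for class c accepted the trace.
    # tpr[c] ~ P(DFA_c accepts | true class is c),
    # fpr[c] ~ P(DFA_c accepts | true class is not c).
    accepts = np.asarray(accepts)
    tpr, fpr = np.asarray(tpr, dtype=float), np.asarray(fpr, dtype=float)
    n = len(accepts)
    prior = np.full(n, 1.0 / n) if prior is None else np.asarray(prior, dtype=float)
    log_post = np.log(prior)
    for k in range(n):                                    # candidate true class k
        p_accept = np.where(np.arange(n) == k, tpr, fpr)  # P(DFA_c accepts | class k)
        lik = np.where(accepts == 1, p_accept, 1.0 - p_accept)
        log_post[k] += np.log(lik).sum()
    post = np.exp(log_post - log_post.max())              # numerically stable normalization
    return post / post.sum()

# Three classes; only the class-0 DFA accepted, and all DFAs are fairly reliable.
print(class_posterior([1, 0, 0], tpr=[0.9, 0.9, 0.9], fpr=[0.1, 0.1, 0.1]))
# -> roughly [0.98, 0.01, 0.01]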

Quick-start guide

Running DISC requires Gurobi Optimizer (available at https://www.gurobi.com) and an active license (academic licenses can be obtained for free). You may also need to install a few additional Python packages via pip.
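
Before running the full pipeline, a quick sanity check can confirm that the Gurobi Python bindings and the license are working. This snippet is a sketch assuming a standard gurobipy install; it is not part of the repository.

import gurobipy as gp

print("Gurobi version:", gp.gurobi.version())  # e.g. (10, 0, 1)
model = gp.Model("license_check")  # raises GurobiError if no valid license is found
model.dispose()
print("Gurobi license OK")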

To run DISC on the Alfred dataset, use the command python3 run_tree_model.py --dataset=alfred --qmax=5 --id=1 (or alternatively, set these parameters in run_tree_model.py:293-298).

For more information on how to run the code, see the Detailed running instructions section below.

Detailed running instructions

DISC is implemented in the files DFA_utils_tree_minerror.py, read_traces.py, run_tree_model.py, tree_minerror.py, and tree_utils.py, and is run with python3 run_tree_model.py.

For DISC, the LSTM baseline (run from lstm.py), and the HMM baseline (run from supervised_hmm.py), the file GLOBAL_VARS.py contains important flags (a sketch of these flags follows):

  • VALIDATION: determines whether part of the training data is used as a validation set in order to choose hyperparameters.
  • TREE_MINERROR: if true, run_tree_model.py runs DISC; otherwise it runs DFA-FT, another baseline in our experiments.
  • MULTILABEL: if true, assumes there can be multiple labels associated with each observation trace.
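
As a reference point, the flags could look like the following. The flag names come from the repository, but the values and comments here are illustrative only.

# GLOBAL_VARS.py (hypothetical contents; values are illustrative)
VALIDATION = True      # hold out part of the training data to choose hyperparameters
TREE_MINERROR = True   # True: run_tree_model.py runs DISC; False: it runs the DFA-FT baseline
MULTILABEL = False     # True: each observation trace may carry multiple labels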

run_tree_model.py:150-151 contains the lists of hyperparameter values for λ_t and λ_a to search over when selecting the best model, and run_tree_model.py:296 allows you to set the value of q_max.
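
The sketch below shows the kind of grid search this implies. The value grids and the scoring helper are placeholders for illustration, not the actual contents of run_tree_model.py.

import itertools

LAMBDA_T_GRID = [0.0, 0.01, 0.1]   # placeholder penalties on non-self-loop transitions
LAMBDA_A_GRID = [0.0, 0.01, 0.1]   # placeholder penalties on non-absorbing-state occupancy

def validation_score(lam_t, lam_a):
    # Placeholder: the real code would train a DFA with these penalties
    # (and at most q_max states) and return accuracy on the validation set.
    return -((lam_t - 0.01) ** 2 + (lam_a - 0.1) ** 2)

best_lam_t, best_lam_a = max(itertools.product(LAMBDA_T_GRID, LAMBDA_A_GRID),
                             key=lambda pair: validation_score(*pair))
print("best (lambda_t, lambda_a):", best_lam_t, best_lam_a)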

Several publicly available datasets are provided in the traces/ directory and can be run directly with DISC. For each dataset, we provide 30 random training/testing splits.
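
To evaluate across all splits, one option is a small driver loop over the documented command-line interface. This is a convenience sketch, assuming the splits are indexed by --id values 1 through 30 as in the quick-start example.

import subprocess

for split_id in range(1, 31):
    subprocess.run(
        ["python3", "run_tree_model.py",
         "--dataset=alfred", "--qmax=5", f"--id={split_id}"],
        check=True,  # stop immediately if any run fails
    )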
