SequenceR: Sequence-to-Sequence Learning for End-to-End Program Repair

SequenceR is a sequence-to-sequence (seq2seq) model that predicts bug fixes at the line level. The approach is described in the paper (doi:10.1109/TSE.2019.2940179, preprint: http://arxiv.org/pdf/1901.01808).

If you use SequenceR for academic purposes, please cite the following publication:

@article{chen2018sequencer,
  title={SequenceR: Sequence-to-Sequence Learning for End-to-End Program Repair},
  author={Chen, Zimin and Kommrusch, Steve and Tufano, Michele and Pouchet, Louis-No{\"e}l and Poshyvanyk, Denys and Monperrus, Martin},
  journal={IEEE Transactions on Software Engineering},
  year={2019}
}

Usage

Docker

Run the following two commands to build the image and start a container with the SequenceR Golden model:

docker build --tag=sequencer .
docker run -it sequencer

All dependencies (including Defects4J) are now installed inside the container.

Alternatively, use the prebuilt image from Docker Hub.
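
As a quick sanity check, you can verify inside the running container that the Defects4J command line is available; defects4j info is a standard Defects4J command, and Chart is just one of the Defects4J projects:

# Run inside the container started by `docker run -it sequencer`
which defects4j          # should print the path to the Defects4J CLI
defects4j info -p Chart  # prints metadata for the Chart project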

Without docker

Install dependencies

First, run src/setup_env.sh to set up the environment and clone and compile the required projects. See src/setup_env.sh for more details.

All models are versioned with git-lfs; make sure git-lfs is configured and that the models are fetched correctly before use.
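
For example, a typical git-lfs workflow for fetching the models looks like the following (standard git-lfs commands; the .pt extension is an assumption based on OpenNMT's usual checkpoint format):

git lfs install   # one-time setup of the git-lfs hooks
git lfs pull      # download the model binaries referenced by the LFS pointer files
find . -name "*.pt" -exec ls -lh {} +   # fetched models should be full-size binaries, not small pointer files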

Execution

Then run src/sequencer-predict.sh with the following parameters (an example invocation is shown after the list):

./sequencer-predict.sh --model=[model path] --buggy_file=[abs path] --buggy_line=[int] --beam_size=[int] --output=[abs path]
  • --model: Absolute path to the model
  • --buggy_file: Absolute path to the buggy file
  • --buggy_line: Line number of the line to be repaired (the line that should be changed)
  • --beam_size: Beam size used during prediction
  • --output: Absolute path of the output directory in which the generated patches are stored
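
For illustration, an invocation could look like the following; the model file name and all paths are hypothetical placeholders:

# All paths below are hypothetical placeholders.
./sequencer-predict.sh \
  --model=/tmp/sequencer/model/golden_model.pt \
  --buggy_file=/tmp/myproject/src/main/java/Foo.java \
  --buggy_line=42 \
  --beam_size=50 \
  --output=/tmp/sequencer_output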

Experiments

CodRep experiment

The training data consists of results/Golden/src-train.txt and results/Golden/tgt-train.txt (line-to-line correspondence).

The CodRep4 testing data consists of results/Golden/src-test.txt and results/Golden/tgt-test.txt (line-to-line correspondence).
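
Because the files correspond line by line, you can inspect any training example by printing the same line number in both files; N below is an arbitrary 1-based example index:

# Print training example N: the buggy input and the corresponding target fix.
N=1
sed -n "${N}p" results/Golden/src-train.txt   # buggy input fed to the encoder
sed -n "${N}p" results/Golden/tgt-train.txt   # target fixed line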

Defects4J experiment

In results/Defects4J_patches you can find all patches found by SequenceR. Patches stored in *_compiled directories compiled; patches stored in *_passed compiled and passed the test suite; patches stored in *_correct compiled, passed the test suite, and are equivalent to the human-written patch.
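
To get an overview, the sketch below counts the patch files per category; it only assumes the *_compiled, *_passed and *_correct directory naming described above:

# Count patch files per validation category.
for category in compiled passed correct; do
  printf '%s: ' "$category"
  find results/Defects4J_patches -path "*_${category}/*" -type f | wc -l
done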

To rerun our SequenceR experiment on Defects4J, run src/Defects4J_Experiment/Defects4J_experiment.sh; make sure Defects4J is installed.

Defects4J_oneLiner_metadata.csv contains metadata for all Defects4J bugs that we consider. src/Defects4J_Experiment/validatePatch.py contains the procedure for running the Defects4J tests; we impose a time limit on compilation (60s) and on test execution (300s).
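
The actual validation logic lives in validatePatch.py; the sketch below only illustrates the same idea with the standard Defects4J CLI and coreutils timeout, run from a hypothetical checkout with a candidate patch applied:

# Illustration only: compile and test a patched Defects4J checkout under the same time budgets.
cd /tmp/lang_1_buggy                    # hypothetical Defects4J checkout with a candidate patch applied
timeout 60  defects4j compile || echo "compilation failed or timed out"
timeout 300 defects4j test    || echo "tests failed or timed out"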

Model creation, training and use

Prerequisites

SequenceR uses the OpenNMT library to set up program repair as a translation from buggy code to fixed code. Documentation on OpenNMT, including parameter setup, is at http://opennmt.net/OpenNMT-py/

Setup

Choose a directory and:

git clone https://github.com/OpenNMT/OpenNMT-py

When testing a new configuration, copy a working data directory and modify the *.sh files as desired.

Set up environment variables:

export CUDA_VISIBLE_DEVICES=0
export THC_CACHING_ALLOCATOR=0
export OpenNMT_py=.../OpenNMT-py
export data_path=.../results/Golden  # Or a new directory path as desired

Train

For details on model training, refer to OpenNMT documentation. To run SequenceR training:

cd src
sequencer-train.sh
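
Under the hood, sequencer-train.sh drives the OpenNMT-py pipeline. The sketch below is only a rough outline of the classic OpenNMT-py 0.x preprocess/train workflow; the validation file names are placeholders and the exact flags used by sequencer-train.sh may differ, so check the script and the OpenNMT documentation:

# Rough outline only; flag names follow OpenNMT-py 0.x and may differ in newer releases.
python $OpenNMT_py/preprocess.py \
  -train_src $data_path/src-train.txt -train_tgt $data_path/tgt-train.txt \
  -valid_src $data_path/src-val.txt -valid_tgt $data_path/tgt-val.txt \
  -save_data $data_path/final
python $OpenNMT_py/train.py \
  -data $data_path/final -save_model $data_path/model \
  -encoder_type brnn -copy_attn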

Test

For details on model usage (translation), refer to OpenNMT documentation. To run SequenceR testing:

cd src
sequencer-test.sh
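
Similarly, sequencer-test.sh ultimately invokes OpenNMT-py translation. The sketch below is a rough outline with OpenNMT-py 0.x flag names; the checkpoint name is a hypothetical placeholder and the exact options used by sequencer-test.sh may differ:

# Rough outline only; check sequencer-test.sh for the exact command and options.
python $OpenNMT_py/translate.py \
  -model $data_path/model.pt \
  -src $data_path/src-test.txt \
  -output $data_path/pred-test_beam50.txt \
  -beam_size 50 -n_best 50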

License

The code and data in this repository are under the MIT license.
