Bench4BL

Bench4BL is a collection of bug reports and git repositories for fault localization experiments. The collection contains 10,017 bug reports collected from 51 subjects, and each bug report is mapped to the source code of the corresponding version. This version mapping lets you run detailed, version-aware experiments. We also provide the scripts used to build the collection, so you can extend your experiments to other subjects. This document explains how to use the benchmark in your experiments and how to reproduce the results of our paper, "IR-based Bug Localization: Reproducibility Study on the Performance of State-of-the-Art Approaches".

@inproceedings{bench4bl,
  Author = {Jaekwon Lee and Dongsun Kim and Tegawend\'e F. Bissyand\'e and Woosung Jung and Yves Le Traon},
  Title = {IR-based Bug Localization: Reproducibility Study on the Performance of State-of-the-Art Approaches},
  Booktitle = {Proceedings of the 27th ACM SIGSOFT International Symposium on Software Testing and Analysis},
  Series = {ISSTA 2018},
  Year = {2018},
  doi = {10.1145/3213846.3213856},
  pages = {1--12}
}

Subjects ( Bug reports and Source Code Repositories )

The table below lists the 5 old subjects used in previous studies and the 46 new subjects that we collected. The subjects are classified into 6 groups for easier management (the Previous group holds the old subjects). Each archive contains the bug reports, the bug report repositories that we refined, a cloned git repository, and the metadata that we created for them. If you need a more recent git repository, clone it again through the link in the Git Repository column. The "Getting Started" section explains how to use these data sets.

Group	Subject	Archive	Git Repository
Apache	CAMEL	CAMEL.tar	https://github.com/apache/camel.git
Apache	HBASE	HBASE.tar	https://github.com/apache/hbase.git
Apache	HIVE	HIVE.tar	https://github.com/apache/hive.git
Commons	CODEC	CODEC.tar	https://github.com/apache/commons-codec.git
Commons	COLLECTIONS	COLLECTIONS.tar	https://github.com/apache/commons-collections.git
Commons	COMPRESS	COMPRESS.tar	https://github.com/apache/commons-compress.git
Commons	CONFIGURATION	CONFIGURATION.tar	https://github.com/apache/commons-configuration.git
Commons	CRYPTO	CRYPTO.tar	https://github.com/apache/commons-crypto.git
Commons	CSV	CSV.tar	https://github.com/apache/commons-csv.git
Commons	IO	IO.tar	https://github.com/apache/commons-io.git
Commons	LANG	LANG.tar	https://github.com/apache/commons-lang.git
Commons	MATH	MATH.tar	https://github.com/apache/commons-math.git
Commons	WEAVER	WEAVER.tar	https://github.com/apache/commons-weaver.git
JBoss	ENTESB	ENTESB.tar	https://github.com/jboss-fuse/fuse.git
JBoss	JBMETA	JBMETA.tar	https://github.com/jboss/metadata.git
Wildfly	ELY	ELY.tar	https://github.com/wildfly-security/wildfly-elytron.git
Wildfly	SWARM	SWARM.tar	https://github.com/wildfly-swarm/wildfly-swarm.git
Wildfly	WFARQ	WFARQ.tar	https://github.com/wildfly/wildfly-arquillian.git
Wildfly	WFCORE	WFCORE.tar	https://github.com/wildfly/wildfly-core.git
Wildfly	WFLY	WFLY.tar	https://github.com/wildfly/wildfly.git
Wildfly	WFMP	WFMP.tar	https://github.com/wildfly/wildfly-maven-plugin.git
Spring	AMQP	AMQP.tar	https://github.com/spring-projects/spring-amqp
Spring	ANDROID	ANDROID.tar	https://github.com/spring-projects/spring-android
Spring	BATCH	BATCH.tar	https://github.com/spring-projects/spring-batch
Spring	BATCHADM	BATCHADM.tar	https://github.com/spring-projects/spring-batch-admin
Spring	DATACMNS	DATACMNS.tar	https://github.com/spring-projects/spring-data-commons
Spring	DATAGRAPH	DATAGRAPH.tar	https://github.com/spring-projects/spring-data-neo4j
Spring	DATAJPA	DATAJPA.tar	https://github.com/spring-projects/spring-data-jpa
Spring	DATAMONGO	DATAMONGO.tar	https://github.com/spring-projects/spring-data-mongodb
Spring	DATAREDIS	DATAREDIS.tar	https://github.com/spring-projects/spring-data-redis
Spring	DATAREST	DATAREST.tar	https://github.com/spring-projects/spring-data-rest
Spring	LDAP	LDAP.tar	https://github.com/spring-projects/spring-ldap
Spring	MOBILE	MOBILE.tar	https://github.com/spring-projects/spring-mobile
Spring	ROO	ROO.tar	https://github.com/spring-projects/spring-roo
Spring	SEC	SEC.tar	https://github.com/spring-projects/spring-security
Spring	SECOAUTH	SECOAUTH.tar	https://github.com/spring-projects/spring-security-oauth
Spring	SGF	SGF.tar	https://github.com/spring-projects/spring-data-gemfire
Spring	SHDP	SHDP.tar	https://github.com/spring-projects/spring-hadoop
Spring	SHL	SHL.tar	https://github.com/spring-projects/spring-shell
Spring	SOCIAL	SOCIAL.tar	https://github.com/spring-projects/spring-social
Spring	SOCIALFB	SOCIALFB.tar	https://github.com/spring-projects/spring-social-facebook
Spring	SOCIALLI	SOCIALLI.tar	https://github.com/spring-projects/spring-social-linkedin
Spring	SOCIALTW	SOCIALTW.tar	https://github.com/spring-projects/spring-social-twitter
Spring	SPR	SPR.tar	https://github.com/spring-projects/spring-framework
Spring	SWF	SWF.tar	https://github.com/spring-projects/spring-webflow
Spring	SWS	SWS.tar	https://github.com/spring-projects/spring-ws
Previous	AspectJ	AspectJ.tar	https://github.com/eclipse/org.aspectj
Previous	JDT	JDT.tar	https://github.com/eclipse/eclipse.jdt.core
Previous	PDE	PDE.tar	https://github.com/eclipse/eclipse.pde.ui
Previous	SWT	SWT.tar	https://github.com/eclipse/eclipse.platform.swt
Previous	ZXing	ZXing.tar	https://github.com/zxing/zxing

Repository Directory Structure

techniques: This folder contains the source code and executable files of previous techniques such as BugLocator and Locus. We revised the source code so that every technique writes its results in the same format, and we improved their performance. All executable files are stored in the folder "techniques/releases".
analysis: The execution results of the previous techniques, refined for the scripts in the folder "scripts > analysis".
scripts: Python scripts that prepare resources for fault localization experiments, execute the previous techniques, and organize the results.
packing.sh: Shell script that packs the resource data of each subject.
unpacking.sh: Shell script that unpacks the resource data of each subject.

Getting Started

This section describes the full procedure for using this benchmark: setting up the experiment environment, creating the bug repositories, and checking out the source code of specific versions. You can skip creating the bug repositories if you use the archives downloaded from the table above. All commands are written for Ubuntu 16.04 LTS, the environment in which all experiments were executed. We first describe the scripts folder briefly and then list each step of the procedure.

## Scripts Directory Structure ##
- repository: Scripts to prepare the resources to execute each technique.
- results: Scripts to collect the execution results of each technique and export them to Excel.
- analysis: Scripts to analyze the results of each technique and the features extracted from the resources. We applied the Mann-Whitney U test, Pearson correlation, and so on.
- commons: Scripts for managing subjects, plus common functions.
- utils: Personal libraries for experiments.
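
The two statistics named for the analysis scripts can be illustrated with a small, dependency-free sketch. This is not the code of the analysis scripts, and the README does not say which functions they call; since scipy is a listed dependency, `scipy.stats.mannwhitneyu` and `scipy.stats.pearsonr` are the likely practical choice. The function names below are hypothetical, and the sketch computes only the U statistic and the correlation coefficient, not p-values.

```python
import math


def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(xs, ys):
    """Return the smaller of the two Mann-Whitney U statistics."""
    ranks = average_ranks(list(xs) + list(ys))
    rank_sum_x = sum(ranks[:len(xs)])
    u_x = rank_sum_x - len(xs) * (len(xs) + 1) / 2.0
    u_y = len(xs) * len(ys) - u_x
    return min(u_x, u_y)


def pearson_r(xs, ys):
    """Pearson correlation coefficient; fails if either sample is constant."""
    n = float(len(xs))
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)
```

A U statistic of 0 means the two samples do not overlap at all, while a Pearson coefficient near 1 or -1 indicates a strong linear relationship.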

Clone this repository

Clone the repository by using the following command. (We cloned into the "Bench" directory.)

$ git clone https://github.com/exatoa/Bench4BL.git Bench

If you don't have git, please install it first using the following commands.

$ sudo apt-get update
$ sudo apt-get install git

Download subjects' archives.

Download all subjects from the Subjects table and save them under the cloned repository path. If you need only some of the subjects, download just those. We saved them in the 'Bench/_archives' directory. To use our scripts, we recommend storing each subject in the directory of the group to which it belongs. After downloading, unpack all archives with the unpacking.sh script.

$ cd Bench
Bench$ mkdir _archives
Bench$ cd _archives
Bench/_archives$ mkdir Apache
Bench/_archives$ cd Apache
Bench/_archives/Apache$ wget -O CAMEL.tar "https://drive.google.com/uc?export=download&id=0B78iVP5pcTfKdEZZZnJrWmZxWjg"
.... repeat for every group and subject ....
Bench$ mkdir data
Bench$ ./unpacking.sh _archives data

The last command unpacks all archive files in the '_archives' folder into the 'data' folder, keeping the original directory structure.
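
The per-group directory layout can also be prepared with a short Python sketch instead of repeating `mkdir` by hand. The helper names `prepare_archive_dirs` and `missing_archives` are hypothetical and not part of the Bench4BL scripts. The `SUBJECTS` dictionary is a trimmed copy of the Subjects table, and the sketch only creates directories and reports which archives still need to be downloaded; it does not download anything.

```python
import os

# Trimmed copy of the Subjects table; add the remaining groups and subjects as needed.
SUBJECTS = {
    "Apache": ["CAMEL", "HBASE", "HIVE"],
    "JBoss": ["ENTESB", "JBMETA"],
}


def prepare_archive_dirs(archive_root, subjects):
    """Create one directory per group and return the expected .tar paths."""
    expected = []
    for group in sorted(subjects):
        group_dir = os.path.join(archive_root, group)
        if not os.path.isdir(group_dir):
            os.makedirs(group_dir)
        for name in subjects[group]:
            expected.append(os.path.join(group_dir, name + ".tar"))
    return expected


def missing_archives(expected):
    """Return the expected archive paths that do not exist yet."""
    return [path for path in expected if not os.path.isfile(path)]


if __name__ == "__main__":
    # Run from inside the cloned 'Bench' directory.
    for path in missing_archives(prepare_archive_dirs("_archives", SUBJECTS)):
        print("still to download: " + path)
```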

Install python

We used Python 2.7. (If Python 2.7 is already installed on your computer, please skip this section.)

$ sudo add-apt-repository ppa:fkrull/deadsnakes
$ sudo apt-get update
$ sudo apt-get install python2.7 python
$ sudo apt-get install python-pip

Install python libraries

The scripts have the following 8 dependencies:

bs4 >= 0.0.1
matplotlib >= 2.0.1
numpy >= 1.13.3
scipy >= 0.19.1
python-dateutil >= 2.6.1
pytz >= 2017.3
GitPython >= 2.1.5
XlsxWriter >= 0.9.8

You can install them using the following command.

$ pip install numpy scipy matplotlib pytz GitPython bs4 XlsxWriter python-dateutil
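
After installation, a quick import check can confirm that every dependency is visible to the interpreter. The sketch below is a convenience check, not part of the Bench4BL scripts, and the helper name `missing_packages` is hypothetical. Note that some packages are imported under a different name than the one used with pip: python-dateutil as `dateutil`, GitPython as `git`, and XlsxWriter as `xlsxwriter`.

```python
import importlib

# (pip package name, module name used in an import statement)
REQUIRED = [
    ("bs4", "bs4"),
    ("matplotlib", "matplotlib"),
    ("numpy", "numpy"),
    ("scipy", "scipy"),
    ("python-dateutil", "dateutil"),
    ("pytz", "pytz"),
    ("GitPython", "git"),
    ("XlsxWriter", "xlsxwriter"),
]


def missing_packages(required):
    """Return the pip names of the packages whose module cannot be imported."""
    missing = []
    for package, module in required:
        try:
            importlib.import_module(module)
        except ImportError:
            missing.append(package)
    return missing


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("missing packages: " + " ".join(missing))
    else:
        print("all dependencies are importable")
```

The check only verifies that the modules import; it does not compare installed versions against the minimum versions listed above.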

Update PATH information (Editing script code)

The file scripts/commons/Subjects.py contains variables that store the resource paths as strings, along with the subject information. To use our scripts, change these variables to match your environment.

class Subjects(object):
    ...
    root = u'/mnt/exp/Bug/data/'
    root_result = u'/mnt/exp/Bug/expresults/'
    techniques = ['BugLocator', 'BRTracer', 'BLUiR', 'AmaLgam', 'BLIA', 'Locus']
    groups = ['Apache', 'Commons', 'JBoss', 'Wildfly', 'Spring']
    projects = {
        'Apache':[u'CAMEL', u'HBASE', u'HIVE'],
        'Commons':[u'CODEC', u'COLLECTIONS', u'COMPRESS', u'CONFIGURATION', u'CRYPTO', u'IO', u'LANG', u'MATH', u'WEAVER',u'CSV'],
        'JBoss':[u'ENTESB', u'JBMETA'],
        'Wildfly':[u'ELY', u'WFARQ', u'WFCORE', u'WFLY', u'WFMP',u'SWARM'],
        'Spring':[U'AMQP', U'ANDROID', U'BATCH', U'BATCHADM', U'DATACMNS', U'DATAGRAPH', U'DATAJPA', U'DATAMONGO', U'DATAREDIS', U'DATAREST', U'LDAP', U'MOBILE', U'ROO', U'SEC', U'SECOAUTH', U'SGF', U'SHDP', U'SHL', U'SOCIAL', U'SOCIALFB', U'SOCIALLI', U'SOCIALTW', U'SPR', U'SWF', U'SWS']
    }
    ...

root : The directory into which you unpacked the downloaded archives.
root_result : The directory in which the results of the previous techniques will be stored.
techniques : The names of the previous techniques.
groups : The names of the groups that you want to test.
projects : The names of the subjects that you want to test. Each subject name must be listed under the group to which it belongs.
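
Before running the scripts, it can help to confirm that `groups` and `projects` agree with each other and point at real directories. The following is a minimal sketch under the assumption that subjects are stored as `<root>/<group>/<subject>`, which matches the example path 'Bench/data/Commons/CODEC/versions.txt' used in this document. The function name `subject_dirs` and the trimmed configuration values are hypothetical.

```python
import os


def subject_dirs(root, groups, projects):
    """Return the expected data directory of every configured subject."""
    dirs = []
    for group in groups:
        if group not in projects:
            raise KeyError("group %r has no entry in projects" % group)
        for name in projects[group]:
            dirs.append(os.path.join(root, group, name))
    return dirs


if __name__ == "__main__":
    # Trimmed example configuration; use your own values from the Subjects class.
    root = u"/mnt/exp/Bug/data/"
    groups = ["Apache", "JBoss"]
    projects = {
        "Apache": [u"CAMEL", u"HBASE", u"HIVE"],
        "JBoss": [u"ENTESB", u"JBMETA"],
    }
    for path in subject_dirs(root, groups, projects):
        status = "ok" if os.path.isdir(path) else "MISSING"
        print("%s  %s" % (status, path))
```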

Version Information

We selected specific versions for each subject and saved them in versions.txt in that subject's folder. The file is in JSON format and stores the information as a dictionary. The top-level key is the subject name, which corresponds to the name written in Subjects.py. The selected versions are listed in a nested dictionary: each key is the version name that you want to use as a label, and each value is the corresponding tag name in the git repository. For example, to store the version information of the CODEC subject, you could write the JSON text below and save it in 'Bench/data/Commons/CODEC/versions.txt'. The archives already include the versions that we selected, so if you want to use them you don't need to change the version information files.

{
    "CODEC":{
            "1.4":"CODEC_1_4",
            "1.5":"commons-codec-1.5",
            "1.6":"1_6",
            "1.7":"1.7",
            "1.1":"CODEC_1_1",
            "1.2":"CODEC_1_2",
            "1.3":"CODEC_1_3",
            "1.8":"1.8",
            "1.9":"1.9",
            "1.10":"1.10"
    }
}
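
Because versions.txt is plain JSON, it can be read with the standard `json` module. The sketch below parses a shortened copy of the CODEC example and looks up the git tag of a version. The helper names `load_versions` and `sorted_versions` are hypothetical and not part of the Bench4BL scripts, and the numeric sort assumes that version names consist only of dot-separated integers.

```python
import json

# Shortened copy of the CODEC example above.
SAMPLE = """
{
    "CODEC": {
        "1.4": "CODEC_1_4",
        "1.5": "commons-codec-1.5",
        "1.9": "1.9",
        "1.10": "1.10"
    }
}
"""


def load_versions(text, subject):
    """Parse versions.txt content and return the version -> tag mapping."""
    return json.loads(text)[subject]


def sorted_versions(versions):
    """Sort version names numerically, so that 1.10 comes after 1.9."""
    return sorted(versions, key=lambda v: [int(part) for part in v.split(".")])


if __name__ == "__main__":
    versions = load_versions(SAMPLE, "CODEC")
    for name in sorted_versions(versions):
        print("%s -> %s" % (name, versions[name]))
```

A plain string sort would place "1.10" before "1.4", which is why the sketch compares the numeric parts instead.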

Inflate the source codes.

We used multiple versions of the source code in the experiments. Since the provided archives contain only a git repository, you need to inflate it into those versions. The script launcher_GitInflator.py clones a git repository and inflates it into the multiple versions that you selected.

Bench$ cd scripts
Bench/scripts$ python launcher_GitInflator.py
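
To make the idea of "inflating" concrete, the sketch below builds one `git clone --branch <tag>` command per selected version, so that each version ends up in its own directory. This is an illustration of the concept only and is not how launcher_GitInflator.py is implemented. The function name `clone_commands`, the repository path, and the output layout `<dest_root>/<version name>` are all assumptions. The commands are only constructed, not executed.

```python
import os


def clone_commands(repo_dir, versions, dest_root):
    """Build one 'git clone' argument list per version -> tag entry.

    'git clone --branch' accepts a tag name and checks out that tag.
    """
    commands = []
    for name in sorted(versions):
        tag = versions[name]
        dest = os.path.join(dest_root, name)  # hypothetical output layout
        commands.append(["git", "clone", "--quiet", "--branch", tag, repo_dir, dest])
    return commands


if __name__ == "__main__":
    # Hypothetical paths; the tags come from the CODEC example in this document.
    versions = {"1.4": "CODEC_1_4", "1.5": "commons-codec-1.5"}
    for cmd in clone_commands("path/to/cloned/repo", versions, "path/to/sources"):
        print(" ".join(cmd))
        # To run a command for real: subprocess.check_call(cmd)
```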

Build bug repositories

We need to build a repository of bug reports from the pre-crawled bug reports. The bug repository is in XML format and contains the bug data used in the experiments. launcher_repoMaker.py creates a bug repository that contains the information of all crawled bug reports, as well as bug repositories that store the bug reports according to their mapped versions. Since the provided subject archives already contain the result of this step, run this script only if you want to update the bug repositories. launcher_DupRepo.py creates a bug repository file in which the information of duplicate bug reports is merged.

Bench/scripts$ python launcher_repoMaker.py
Bench/scripts$ python launcher_DupRepo.py

Update count information of bug and source codes.

The script Counting.py produces count information for the bug reports and source code. The results are stored in bugs.txt, sources.txt and answers.txt in each subject's folder.

Bench/scripts$ python Counting.py

Execute Previous Techniques

To get the results of each technique, use scripts/launcher_Tool.py.
Preparing step
- Set the paths and Java options in the launcher_Tool.py file.
- Open launcher_Tool.py and check the following variables.
- ProgramPATH: The directory that contains the release files of the IRBL techniques. (ex. u'~/Bench/techniques/releases/')
- OutputPATH: The directory in which the output of each technique is saved. (ex. u'~/Bench/expresults/')
- JavaOptions: The java command options. (ex. '-Xms512m -Xmx4000m')
- JavaOptions_Locus: The java options for Locus. Because Locus needs a large amount of memory, we separated this option. (ex. '-Xms512m -Xmx4000m')
The script executes the 6 techniques for all subjects.
By default, the script works on the multiple versions of the bug repository and the source code related to each of them.
Options
- -w : [required] Sets the ID of the experiment. The ID is also used as the name of the directory in which the execution results of each technique are stored. If the name starts with "Old", the script works on the previous data; otherwise it works on the new data.
- -g : A specific group. With this option, the script works on the subjects in the specified group.
- -p : A specific subject. To use this option, you must also specify the group name.
- -t : A specific technique. With this option, the script produces the results of the specified technique only.
- -v : A specific version. With this option, the script works on the specified version of the source code.
- -s : Single version mode. With this option, the script works on the latest source code only.
- -m : With this option, the script uses the bug repositories created by combining the text of duplicate bug report pairs instead of the normal ones.
Examples

Bench/scripts$ python launcher_Tool.py -w NewData
Bench/scripts$ python launcher_Tool.py -w NewDataSingle -s
Bench/scripts$ python launcher_Tool.py -w NewData_Locus -t Locus
Bench/scripts$ python launcher_Tool.py -w NewData_CAMEL -g Apache -p CAMEL
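
The rules behind these options can be summarized as a small `argparse` sketch. It mirrors only what the option list above documents and is not the actual argument handling of launcher_Tool.py. The attribute names (`work`, `group`, `use_old_data`, and so on) are hypothetical.

```python
import argparse


def build_parser():
    """Parser that mirrors the documented options of launcher_Tool.py."""
    parser = argparse.ArgumentParser(description="Sketch of the documented options")
    parser.add_argument("-w", dest="work", required=True,
                        help="experiment ID, also used as the result directory name")
    parser.add_argument("-g", dest="group", help="run only this group")
    parser.add_argument("-p", dest="project", help="run only this subject (needs -g)")
    parser.add_argument("-t", dest="technique", help="run only this technique")
    parser.add_argument("-v", dest="version", help="run only this source version")
    parser.add_argument("-s", dest="single", action="store_true",
                        help="single version mode (latest source code only)")
    parser.add_argument("-m", dest="merged", action="store_true",
                        help="use the merged duplicate bug repositories")
    return parser


def parse(argv):
    """Parse argv and apply the two documented rules about -p and 'Old' IDs."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.project and not args.group:
        parser.error("-p requires -g")
    # IDs starting with "Old" select the previous data.
    args.use_old_data = args.work.startswith("Old")
    return args


if __name__ == "__main__":
    print(parse(["-w", "NewData_Locus", "-t", "Locus"]))
```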

Install Java

All previous techniques run on the Java Runtime Environment. If Java is already installed on your computer, please skip this section.

$ sudo apt-get install python-software-properties
$ sudo add-apt-repository ppa:webupd8team/java
$ sudo apt-get update
$ sudo apt-get install oracle-java8-installer

Install indri

To execute BLUiR and AmaLgam, you need to install indri.
Because we ran into compile problems, we chose version indri-5.6.
During installation, note the path in the first line of the "make install" log.
(In our case, /usr/local/bin. This is the path where indri is installed.)
Then update the Settings.txt file accordingly.
Commands to install indri

// Install g++ and make for indri
$ sudo add-apt-repository ppa:ubuntu-toolchain-r/test
$ sudo apt-get update
$ sudo apt-get install g++
$ sudo apt-get install make
$ sudo apt-get install --reinstall zlibc zlib1g zlib1g-dev

// Download and install indri (if compilation fails, please try another version)
$ wget https://downloads.sourceforge.net/project/lemur/lemur/indri-5.6/indri-5.6.tar.gz
$ tar -xzf indri-5.6.tar.gz
$ cd indri-5.6
$ ./configure
$ make
$ make install
/usr/bin/install -c -m 755 -d /usr/local/bin
/usr/bin/install -c -m 755 -d /usr/local/include
/usr/bin/install -c -m 755 -d /usr/local/include/indri
...
...
/usr/bin/install -c -m 644 Makefile.app /usr/local/share/indri

// Change the Settings.txt file
$ cd ~/irblsensitivity/techniques/releases // We assume you cloned our repository to
$ vi Settings.txt
indripath=/usr/local/bin/ <-- set this value to the path in the first line of the "make install" log
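
If you prefer not to edit Settings.txt by hand, the `indripath` value can be updated with a few lines of Python. The sketch assumes that Settings.txt consists of plain `key=value` lines, which is the only format shown in this document; the helper name `set_setting` is hypothetical.

```python
def set_setting(text, key, value):
    """Return text with 'key=value' set, replacing the key's line or appending one."""
    lines = text.splitlines()
    replaced = False
    for i, line in enumerate(lines):
        if "=" in line and line.split("=", 1)[0].strip() == key:
            lines[i] = "%s=%s" % (key, value)
            replaced = True
    if not replaced:
        lines.append("%s=%s" % (key, value))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Adjust the path to your own clone and to your own "make install" log.
    path = "Settings.txt"
    with open(path) as handle:
        original = handle.read()
    with open(path, "w") as handle:
        handle.write(set_setting(original, "indripath", "/usr/local/bin/"))
```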

Previous Techniques Load on Eclipse

We modified the previous techniques in Eclipse, but we did not include the Eclipse environment files (the .metadata folder and the .project and .classpath files) in the folders of the previous techniques.

To load these techniques in Eclipse, follow the steps below.

Open Eclipse.
Set the 'techniques' folder as the Eclipse workspace. A .metadata folder will then be created in the 'techniques' folder.
In the 'Package Explorer' panel, right-click to open the context menu.
Select 'Import'. A pop-up window will appear.
For every project except BLUiR, choose the 'General > Projects from Folder or Archive' item and click the 'Next' button.
Select the project folder in 'techniques' and click the 'Finish' button.
The project will then be loaded and shown in the Package Explorer.
BLUiR is a Maven project, so import it with 'Maven > Existing Maven Project' and simply choose the project folder. You don't need to change any other options.
For the BLIA project in particular, you also need to add the JUnit library.
