
deekshaarya4/Info_Types_in_OSS_Issue_Discussions


README


This artifact contains the data and code used in the paper Analysis and Detection of Information Types of Open Source Software Issue Discussions.

It comprises three main folders: data, experiments, and results. Their components are as follows:

  1. data: This folder contains all the data used in the experiments.
  • chosen_issues - contains the comments of the 15 chosen OSS project issue discussions, retrieved from the GitHub API, in JSON format.
  • Codebook.xlsx - the codebook used to classify a sentence into a particular information type.
  • Corpus.xlsx - the list of annotated sentences and their corresponding conversational feature sets.
  • annotated_data_with_metadata.xlsx - the file exported from the Atlas.ti annotation tool. It lists the annotated sentences along with meta-information provided by the tool. Additionally, it contains comment-creation-time and author phrases annotated as METADATA, used for extracting conversational features.
  • all_data.pkl - a pickle file containing a pandas dataframe with information similar to Corpus.xlsx: the annotated sentences, their conversational feature sets, and the document in which each sentence appears.
  • data_by_document.pkl - contains the same information as all_data.pkl, except structured as a dictionary with documents as keys and pandas dataframes of sentence information as values.
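As an illustration, the two pickle files can be read back with pandas along these lines (the column names below are hypothetical stand-ins; the actual columns follow Corpus.xlsx):

```python
import pandas as pd

# A tiny stand-in for the annotated corpus; the column names are
# hypothetical here and the real ones come from Corpus.xlsx.
all_data = pd.DataFrame({
    "sentence": ["Steps to reproduce the crash follow.", "Thanks, merging now!"],
    "info_type": ["Bug Reproduction", "Social Conversation"],
    "document": ["issue_1", "issue_1"],
})
all_data.to_pickle("all_data_demo.pkl")

# all_data.pkl loads as a single dataframe of all annotated sentences...
loaded = pd.read_pickle("all_data_demo.pkl")

# ...while data_by_document.pkl holds the same rows as a dictionary
# keyed by document, with one dataframe per document.
by_document = {doc: grp for doc, grp in loaded.groupby("document")}
```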
  2. experiments: This folder contains all the code used to perform the experiments presented in the paper.
  • preprocess.ipynb - the first step: it reads the annotations from the .xlsx file exported from Atlas.ti, extracts the conversational feature information for each sentence, and stores it in all_data.pkl and data_by_document.pkl.
  • transform_features.ipynb - performs transformations on the conversational features, such as converting categorical columns to one-hot encodings and converting datetime features to numerically comparable values.
  • logistic_regression/random_forest - these folders contain the experiments presented in the paper.

All the files in this folder are Jupyter notebooks with the extension .ipynb. Each notebook comprises multiple code cells, which encapsulate the different steps of the workflow and make intermediate results easy to inspect. More about Jupyter can be found on the Jupyter project page.
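The transformations performed by transform_features.ipynb can be sketched roughly as follows (the feature names below are hypothetical; the real conversational feature set is listed in Corpus.xlsx):

```python
import pandas as pd

# Hypothetical conversational features for two sentences.
feats = pd.DataFrame({
    "author_role": ["owner", "contributor"],
    "comment_created_at": pd.to_datetime(
        ["2018-01-01 10:00:00", "2018-01-01 12:30:00"]),
})

# Categorical columns -> one-hot encoding.
feats = pd.get_dummies(feats, columns=["author_role"])

# Datetime features -> numerically comparable values, here the
# seconds elapsed since the earliest comment.
origin = feats["comment_created_at"].min()
feats["comment_created_at"] = (
    feats["comment_created_at"] - origin
).dt.total_seconds()
```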

  3. results: This folder contains the results of the experiments performed.

Two additional folders exist:

  1. docker: This folder contains the compressed Docker image with the required environment, as well as the Dockerfile used to build this image.

  2. nltk_data: This contains the WordNet corpus required by the code to lemmatize words.

Instructions for Reproducibility:

To set up the environment for this work, refer to the file INSTALL.md.

  1. Run all cells, in order, in preprocess_data.ipynb
  2. Run all cells, in order, in transform_features.ipynb
  3. Enter either of the algorithm folders (logistic_regression or random_forest) and run all cells, in order, of the experiment you wish to reproduce.
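If you prefer to execute the notebooks headlessly rather than interactively, the same steps can be run with jupyter nbconvert; `<notebook>` below is a placeholder for whichever experiment notebook you choose, and the paths assume you run the commands from the repository root:

```shell
# Step 1 and 2: execute the preprocessing and transformation notebooks
# in place, in order.
jupyter nbconvert --to notebook --execute --inplace experiments/preprocess.ipynb
jupyter nbconvert --to notebook --execute --inplace experiments/transform_features.ipynb

# Step 3: execute an experiment notebook from either algorithm folder.
jupyter nbconvert --to notebook --execute --inplace experiments/random_forest/<notebook>.ipynb
```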

About

Data, Code and Results from the ICSE 2019 accepted paper: Analysis and Detection of Information Types of Open Source Software Issue Discussions
