BaT CoRe: Baselines and testing framework for code reviewer recommendations

Documentation

Extensive documentation for BaT CoRe can be found here. BaT CoRe can be installed via pip:

pip install batcore

This repository provides a framework for testing code review recommendation algorithms.

RecommenderBase is a simple interface for a generic recommender algorithm. It has a fit method that trains the model on the given data and a predict method that predicts reviewers for the given pull requests.
BanRecommenderBase is an interface derived from RecommenderBase that implements candidate filtering by activity and by ownership of the pull request.
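
The fit/predict contract described above can be illustrated with a small sketch. This is not BaT CoRe's code: the class names, the event dictionaries with "owner" and "reviewers" keys, and the n parameter are all assumptions made for illustration, and the actual signatures in batcore may differ. The toy recommender ranks reviewers by how often they have reviewed before and drops the pull request owner, loosely mirroring the ownership filter.

```python
from abc import ABC, abstractmethod
from collections import Counter


class SketchRecommender(ABC):
    """Hypothetical stand-in for a fit/predict recommender interface."""

    @abstractmethod
    def fit(self, events):
        """Update the model with a list of past events."""

    @abstractmethod
    def predict(self, pull, n=10):
        """Return up to n recommended reviewers for a pull request."""


class MostActiveRecommender(SketchRecommender):
    """Toy baseline: recommend whoever has reviewed most often so far."""

    def __init__(self):
        self.review_counts = Counter()

    def fit(self, events):
        # Assumed event shape: {"owner": str, "reviewers": [str, ...]}
        for event in events:
            for reviewer in event["reviewers"]:
                self.review_counts[reviewer] += 1

    def predict(self, pull, n=10):
        ranked = [reviewer for reviewer, _ in self.review_counts.most_common()]
        # Ownership filter: the author cannot review their own pull request.
        return [reviewer for reviewer in ranked if reviewer != pull["owner"]][:n]
```

Because fit only adds to the counts, calling it repeatedly on new batches of events updates the model incrementally, which is the usage pattern a stream-based tester would rely on.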

DatasetBase is an interface that encapsulates any preprocessing needed for the dataset in its preprocess method.
GerritLoader is a class that handles the data produced by the PullExtractor tool: it loads the data, reformats it, and performs some high-level preprocessing (empty data removal, bot removal, alias matching, user encoding).
StandardDataset is an implementation of DatasetBase. It receives a GerritLoader object and performs all the preprocessing that a specific model may need.
SpecialDatasets contains implementations of datasets that are unique to a specific recommender.
get_gerrit_dataset is a helper function for dataset creation. A dataset for a specific model can be created as get_gerrit_dataset(gerritloader_object, model_cls=RecommenderBase_class).
StreamLoaderBase is an interface that encapsulates iteration over data. The interface views data as a temporal stream of events and yields pairs of consecutive segments of the stream: train (any set of events) and test (a pull request event).
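
The stream view of the data can be sketched as a plain generator. The function name iterate_stream and the event fields "type" and "date" are hypothetical, and the choice to start each new train segment with the pull request that was just tested is an assumption of this sketch rather than something the framework documents.

```python
def iterate_stream(events):
    """Yield (train, test) pairs from a collection of events.

    train is the list of events seen since the previous test pull request,
    test is the next pull request event in chronological order.
    """
    train = []
    for event in sorted(events, key=lambda e: e["date"]):
        if event["type"] == "pull":
            yield train, event
            # Start a fresh segment; the tested pull becomes training data
            # for the next pair (an assumption of this sketch).
            train = []
        train.append(event)
```

Each yielded train list is never mutated afterwards, so a consumer can safely keep a reference to it between iterations.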

TesterBase is a class that has a test_recommender method. This method takes a RecommenderBase and a StreamLoaderBase, iterates over the dataset, retrains the model on new train data, and calculates its predictions.
RecTester is an implementation of TesterBase that calculates metrics standard for recommendation systems (MRR, accuracy, etc.).
SimulTester is an implementation of TesterBase that tests for project-based metrics on a simulated history (Mirsaeedi and Rigby, 2020). Metrics for the simulated testing can be found in Counter.
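
The retrain-then-predict loop and two of the standard metrics can be sketched as follows. The names run_test and reciprocal_rank, the "reviewers" field on the pull request, and the returned dictionary keys are all hypothetical; the sketch only assumes a recommender object with fit and predict methods and an iterable of (train, pull) pairs, and it is not the RecTester implementation.

```python
def reciprocal_rank(predicted, actual):
    """Return 1/rank of the first correct reviewer, or 0.0 if none is found."""
    for rank, reviewer in enumerate(predicted, start=1):
        if reviewer in actual:
            return 1.0 / rank
    return 0.0


def run_test(recommender, stream, k=3):
    """Iterate over (train, pull) pairs, retrain, predict, and average metrics."""
    rr_sum, hits, n = 0.0, 0, 0
    for train, pull in stream:
        recommender.fit(train)                  # retrain on the new segment
        predicted = recommender.predict(pull)   # ranked list of reviewers
        actual = set(pull["reviewers"])         # assumed ground-truth field
        rr_sum += reciprocal_rank(predicted, actual)
        hits += any(reviewer in actual for reviewer in predicted[:k])
        n += 1
    if n == 0:
        return {"mrr": 0.0, "top_k_accuracy": 0.0}
    return {"mrr": rr_sum / n, "top_k_accuracy": hits / n}
```

Here top-k accuracy counts a pull request as a hit when at least one actual reviewer appears among the first k predictions, which is one common definition; other variants exist.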

A usage example can be found in example.py.

The framework contains implementations of the following models (baselines\models):

RevFinder, Who Should Review My Code?, Thongtanunam et al., 2015
Tie, Who Should Review This Change?, Xin Xia et al., 2015
ACRec, Who Should Comment on This Pull Request?, Jiang et al. 2017
cHRec, Automatically Recommending Peer Reviewers in Modern Code Review, Zanjani et al., 2015
CN, What Can We Learn from Code Review and Bug Assignment?, Yu et al., 2015
RevRec, Search-Based Peer Reviewers Recommendation in Modern Code Review, Ouni et al., 2016
WRC, Automatically Recommending Code Reviewers Based on Their Expertise: An Empirical Comparison, Hannebauer et al., 2016
xFinder, Assigning change requests to software developers, Kagdi et al., 2011
