Skip to content

graykode/commit-autosuggestions

Repository files navigation

commit-autosuggestions

Build Status License: Apache 2.0 PyPI Downloads

This is implementation of CommitBERT: Commit Message Generation Using Pre-Trained Programming Language Model. CommitBERT is accepted in ACL workshop : NLP4Prog. Have you ever hesitated to write a commit message? Now get a commit message from Artificial Intelligence!

Abstract

CodeBERT: A Pre-Trained Model for Programming and Natural Languages introduces a pre-trained model in a combination of Program Language and Natural Language(PL-NL). It also introduces the problem of converting code into natural language (Code Documentation Generation).

diff --git a/test.py b/test.py
new file mode 100644
index 0000000..d13f441
--- /dev/null
+++ b/test.py
@@ -0,0 +1,6 @@
+
+import torch
+import argparse
+
+def add(a, b):
+    return a + b
Recommended Commit Message : Add two arguments .

We can use CodeBERT to create a model that generates a commit message when code is added. However, most code changes are not made only by add of the code, and some parts of the code are deleted.

diff --git a/test.py b/test.py
index d13f441..1b1b82a 100644
--- a/test.py
+++ b/test.py
@@ -1,6 +1,3 @@

-import torch
-import argparse
-
 def add(a, b):
     return a + b
Recommended Commit Message : Remove unused imports

To solve this problem, use a new embedding called patch_type_embeddings that can distinguish added and deleted, just as the XLM(Lample et al, 2019) used language embeddeding. (1 for added, 2 for deleted.)

Language support

Language Added Diff Data(Only Diff) Weights
Python 423k Link
JavaScript 514k Link
Go
JAVA
Ruby
PHP
  • ✅ — Supported
  • ⬜ - N/A ️

We plan to slowly conquer languages that are not currently supported. However, I also need to use expensive GPU instances of AWS or GCP to train about the above languages. Please do a simple sponsor for this! Add data is CodeSearchNet dataset.

Quick Start

To run this project, you need a flask-based inference server (GPU) and a client (commit module). If you don't have a GPU, don't worry, you can use it through Google Colab.

1. Run flask pytorch server.

Prepare Docker and Nvidia-docker before running the server.

1-a. If you have GPU machine.

Serve flask server with Nvidia Docker. Check the docker tag for programming language in here.

Language Tag
Python py
JavaScript js
Go go
JAVA java
Ruby ruby
PHP php
$ docker run -it -d --gpus 0 -p 5000:5000 graykode/commit-autosuggestions:{language}
1-b. If you don't have GPU machine.

Even if you don't have a GPU, you can still serve the flask server by using the ngrok setting in commit_autosuggestions.ipynb.

2. Start commit autosuggestion with Python client module named commit.

First, install the package through pip.

$ pip install commit

Set the endpoint for the flask server configured in step 1 through the commit configure command. (For example, if the endpoint is http://127.0.0.1:5000, set it as follows: commit configure --endpoint http://127.0.0.1:5000)

$ commit configure --help       
Usage: commit configure [OPTIONS]

Options:
  --profile TEXT   unique name for managing each independent settings
  --endpoint TEXT  endpoint address accessible to the server (example :
                   http://127.0.0.1:5000/)  [required]

  --help           Show this message and exit.

All setup is done! Now, you can get a commit message from the AI with the command commit.

$ commit --help          
Usage: commit [OPTIONS] COMMAND [ARGS]...

Options:
  --profile TEXT       unique name for managing each independent settings
  -f, --file FILENAME  patch file containing git diff (e.g. file created by
                       `git add` and `git diff --cached > test.diff`)

  -v, --verbose        print suggested commit message more detail.
  -a, --autocommit     automatically commit without asking if you want to
                       commit

  --help               Show this message and exit.

Commands:
  configure

Training detail

Refer How to train for your lint style. This allows you to re-fine tuning to your repository's commit lint style.

Contribution

You can contribute anything, even a typo or code in the article. Don't hesitate!!. Versions are managed only within the branch with the name of each version. After being released on Pypi, it is merged into the master branch and new development proceeds in the upgraded version branch.

Author

Tae Hwan Jung(@graykode)

Citation

@article{jung2021commitbert,
  title={CommitBERT: Commit Message Generation Using Pre-Trained Programming Language Model},
  author={Jung, Tae-Hwan},
  journal={arXiv preprint arXiv:2105.14242},
  year={2021}
}