This repository holds our code, data, and replication package for our NLP4Prog'21 paper entitled "Code to Comment Translation: An Empirical Study on Model Effectiveness and Errors"

CodeSummarizationEmpiricalStudy

Project structure

CodeBERT

The source code for this model is forked from the CodeXGLUE-Repository, with some small modifications. We follow the original instructions for fine-tuning and inference.

  • CodeBERT/preprocessing: Contains the preprocessing techniques applied to the Funcom dataset. Some of these techniques are adopted from previous work.
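The exact filters live in `CodeBERT/preprocessing`; as an illustration only, a typical cleanup pass over a Funcom Java snippet (comment removal, whitespace normalization, lowercasing) might look like the sketch below. The specific steps are an assumption, not a description of the actual scripts.

```python
import re

def preprocess_code(code: str) -> str:
    """Illustrative preprocessing for a Java snippet from Funcom.

    NOTE: the concrete techniques are defined in CodeBERT/preprocessing;
    the steps here are common choices and only an assumption.
    """
    # Strip block comments (/* ... */) and line comments (// ...).
    code = re.sub(r"/\*.*?\*/", " ", code, flags=re.DOTALL)
    code = re.sub(r"//[^\n]*", " ", code)
    # Collapse runs of whitespace into single spaces.
    code = re.sub(r"\s+", " ", code).strip()
    return code.lower()

# Example: preprocess_code("int X = 1; // counter") -> "int x = 1;"
```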

NeuralCodeSum

The source code for this model is forked from the NeuralCodeSum-Repository. We follow the same steps for training and testing the model.

  • NeuralCodeSum/preprocessing: In addition to the preprocessing techniques applied for CodeBERT, we added further preprocessing steps for this model, e.g., token splitting.
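Token splitting here presumably means breaking identifiers into subtokens at camelCase and snake_case boundaries, a standard step for code summarization models. A minimal sketch, assuming that interpretation:

```python
import re

def split_token(token: str) -> list[str]:
    """Split an identifier into subtokens, e.g. "getFileName" ->
    ["get", "File", "Name"]. Hypothetical helper, not the repo's API.
    """
    parts = []
    for piece in token.split("_"):  # snake_case boundaries
        # camelCase boundaries: acronym runs, capitalized words, digits
        parts.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", piece))
    return parts

# Example: split_token("parse_HTTPResponse") -> ["parse", "HTTP", "Response"]
```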

Code2seq

We used the open-source implementation from the code2seq-repo.

  • code2seq/JavaExtractor: Modified dataset-build and AST generation files. Original repo: LRNavin/AutoComments
  • code2seq/preproc: Dataset preprocessing folder (part of AST generation), with slight modifications in code2seq/preproc/feature_extractor.py

Experiments

  • code2seq/code2seq_commands.ipynb: Notebook containing the Funcom data preprocessing steps for code2seq, the study's result analysis, and the statistical significance test for BLEU scores

Qualitative Study Results

This folder contains all the categories selected by each of the annotators, as well as the final category for each sample after resolving conflicts.
