As extracted from: https://ml4code.github.io/papers.html
Here are the papers written in 2019:
A Grammar-Based Structural CNN Decoder for Code Generation Z. Sun, Q. Zhu, L. Mou, Y. Xiong, G. Li, L. Zhang
Asm2Vec: Boosting Static Representation Robustness for Binary Clone Search against Code Obfuscation and Compiler Optimization S. H. H. Ding, B. C. M. Fung, P. Charland
code2seq: Generating Sequences from Structured Representations of Code U. Alon, O. Levy, E. Yahav
code2vec: Learning Distributed Representations of Code U. Alon, O. Levy, E. Yahav
Generative Code Modeling with Graphs M. Brockschmidt, M. Allamanis, A. L. Gaunt, O. Polozov
Learning to Represent Edits P. Yin, G. Neubig, M. Allamanis, M. Brockschmidt, A. L. Gaunt
Method name suggestion with hierarchical attention networks S. Xu, S. Zhang, W. Wang, X. Cao, C. Guo, J. Xu
Neural Networks for Modeling Source Code Edits R. Zhao, D. Bieber, K. Swersky, D. Tarlow
Neural Program Repair by Jointly Learning to Localize and Repair M. Vasic, A. Kanade, P. Maniatis, D. Bieber, R. Singh
NEUZZ: Efficient Fuzzing with Neural Program Smoothing D. She, K. Pei, D. Epstein, J. Yang, B. Ray, S. Jana
On Learning Meaningful Code Changes via Neural Machine Translation M. Tufano, C. Watson, G. Bavota, M. Di Penta, M. White, D. Poshyvanyk
SequenceR: Sequence-to-Sequence Learning for End-to-End Program Repair Z. Chen, S. Kommrusch, M. Tufano, L. Pouchet, D. Poshyvanyk, M. Monperrus
Structured Neural Summarization P. Fernandes, M. Allamanis, M. Brockschmidt
The Adverse Effects of Code Duplication in Machine Learning Models of Code M. Allamanis
And here are the papers written in 2018:
A Deep Learning Approach to Identifying Source Code in Images and Video J. Ott, A. Atchison, P. Harnack, A. Bergh, E. Linstead
A General Path-Based Representation for Predicting Program Properties U. Alon, M. Zilberstein, O. Levy, E. Yahav
A Retrieve-and-Edit Framework for Predicting Structured Outputs T. B. Hashimoto, K. Guu, Y. Oren, P. Liang
An Empirical Study on Learning Bug-Fixing Patches in the Wild via Neural Machine Translation M. Tufano, C. Watson, G. Bavota, M. Di Penta, M. White, D. Poshyvanyk
Automated Vulnerability Detection in Source Code Using Deep Representation Learning R. L. Russell, L. Kim, L. H. Hamilton, T. Lazovich, J. A. Harer, O. Ozdemir, P. M. Ellingwood, M. W. McConley
Bilateral Dependency Neural Networks for Cross-Language Algorithm Classification N. D. Q. Bui, Y. Yu, L. Jiang
Building Language Models for Text with Named Entities M.R. Parvez, S. Chakraborty, B. Ray, KW Chang
Compiler Fuzzing through Deep Learning C. Cummins, P. Petoumenos, H. Leather, A. Murray
Content Aware Source Code Change Description Generation P. Loyola, E. Marrese-Taylor, J.A. Balazs, Y. Matsuo, F. Satoh
Cross-Language Learning for Program Classification using Bilateral Tree-Based Convolutional Neural Networks N. Bui, L. Jiang, Y. Yu
Deep Code Search X. Gu, H. Zhang, S. Kim
Deep Learning Similarities from Different Representations of Source Code M. Tufano, C. Watson, G. Bavota, M. Di Penta, M. White, D. Poshyvanyk
Deep Learning to Detect Redundant Method Comments A. Louis, S. K. Dash, E. T. Barr, C. Sutton
Deep Learning Type Inference V. J. Hellendoorn, C. Bird, E. T. Barr, M. Allamanis
Deep Reinforcement Learning for Programming Language Correction R. Gupta, A. Kanade, S. Shevade
Evaluation of Type Inference with Textual Cues A. Shirani, A. P. Lopez-Monroy, F. Gonzalez, T. Solorio, M.A. Alipour
Exploring the Naturalness of Buggy Code with Recurrent Neural Network J. Lanchantin, J. Gao
Generating Regular Expressions from Natural Language Specifications: Are We There Yet? Z. Zhong, J. Guo, W. Yang, T. Xie, JG Lou, Y. Liu, D. Zhang
Improving Automatic Source Code Summarization via Deep Reinforcement Learning Y. Wan, Z. Zhao, M. Yang, G. Xu, H. Ying, J. Wu, P.S. Yu
Intelligent code reviews using deep learning A. Gupta, N. Sundaresan
Learning How to Mutate Source Code from Bug-Fixes M. Tufano, C. Watson, G. Bavota, M. Di Penta, M. White, D. Poshyvanyk
Learning Loop Invariants for Program Verification X. Si, H. Dai, M. Raghothaman, M. Naik, L. Song
Learning to Generate Corrective Patches using Neural Machine Translation H. Hata, E. Shihab, G. Neubig
Learning to Mine Aligned Code and Natural Language Pairs from Stack Overflow P. Yin, B. Deng, E. Chen, B. Vasilescu, G. Neubig
Learning to Repair Software Vulnerabilities with Generative Adversarial Networks J. A. Harer, O. Ozdemir, T. Lazovich, C. P. Reale, R. L. Russell, L. Y. Kim
Learning to Represent Programs with Graphs M. Allamanis, M. Brockschmidt, M. Khademi
Mapping Language to Code in Programmatic Context S. Iyer, I. Konstas, A. Cheung, L. Zettlemoyer
Neural Code Comprehension: A Learnable Representation of Code Semantics T. Ben-Nun, A. S. Jakobovits, T. Hoefler
Neural-Augmented Static Analysis of Android Communication J. Zhao, A. Albarghouthi, V. Rastogi, S. Jha, D. Octeau
Neural-Machine-Translation-Based Commit Message Generation: How Far Are We? Z. Liu, X. Xia, A.E. Hassan, D. Lo, Z. Xing, X. Wang
Neuro-symbolic program corrector for introductory programming assignments S. Bhatia, P. Kohli, R. Singh
NL2Bash: A Corpus and Semantic Parser for Natural Language Interface to the Linux Operating System X.V. Lin, C. Wang, L. Zettlemoyer, M.D. Ernst
Open Vocabulary Learning on Source Code with a Graph-Structured Cache M. Cvitkovic, B. Singh, A. Anandkumar
Path-Based Function Embedding and its Application to Specification Mining D. DeFreez, A. V. Thakur, C. Rubio-González
Polyglot Semantic Parsing in APIs Kyle Richardson, Jonathan Berant, Jonas Kuhn
Public Git Archive: a Big Code dataset for all V. Markovtsev, W. Long
RefiNym: Using Names to Refine Types S. Dash, M. Allamanis, E. T. Barr
StaQC: A Systematically Mined Question-Code Dataset from Stack Overflow Ziyu Yao, Daniel S. Weld, Wei-Peng Chen, Huan Sun
Syntax and Sensibility: Using language models to detect and correct syntax errors E. A. Santos, J. C. Campbell, D. Patel, A. Hindle, J. N. Amaral
Tree2Tree Neural Translation Model for Learning Source Code Changes S. Chakraborty, M. Allamanis, B. Ray
And here are the 2017 papers:
A Language Model for Statements of Software Code Y. Yang, Y. Jiang, M. Gu, J. Sun, J. Gao, H. Liu
A Neural Architecture for Generating Natural Language Descriptions from Source Code Changes P. Loyola, E. Marrese-Taylor, Y. Matsuo
A parallel corpus of Python functions and documentation strings for automated code documentation and code generation A.V.M. Barone, R. Sennrich
A Syntactic Neural Model for General-Purpose Code Generation P. Yin, G. Neubig
Abridging Source Code B. Yuan, V. Murali, C. Jermaine
Abstract Syntax Networks for Code Generation and Semantic Parsing M. Rabinovich, M. Stern, D. Klein
Are Deep Neural Networks the Best Choice for Modeling Source Code? V. J. Hellendoorn, P. Devanbu
Autofolding for Source Code Summarization J. Fowkes, R. Ranca, M. Allamanis, M. Lapata, C. Sutton
Automatically Generating Commit Messages from Diffs using Neural Machine Translation S. Jiang, A. Armaly, C. McMillan
Bayesian Sketch Learning for Program Synthesis V. Murali, S. Chaudhuri, C. Jermaine
Code Completion with Neural Attention and Pointer Networks J. Li, Y. Wang, I. King, M. R. Lyu
CodeSum: Translate Program Language to Natural Language X. Hu, Y. Wei, G. Li, Z. Jin
Context2Name: A Deep Learning-Based Approach to Infer Natural Variable Names from Usage Contexts R. Bavishi, M. Pradel, K. Sen
Deep Learning to Find Bugs M. Pradel, K. Sen
DeepAM: Migrate APIs with Multi-modal Sequence to Sequence Learning X. Gu, H. Zhang, D. Zhang, S. Kim
DeepFix: Fixing Common C Language Errors by Deep Learning R. Gupta, S. Pal, A. Kanade, S. Shevade
End-to-end Deep Learning of Optimization Heuristics C. Cummins, P. Petoumenos, Z. Wang, H. Leather
Exploring API Embedding for API Usages and Applications T.D. Nguyen, A.T. Nguyen, H.D. Phan, T.N. Nguyen
Finding Likely Errors with Bayesian Specifications V. Murali, S. Chaudhuri, C. Jermaine
Function Assistant: A Tool for NL Querying of APIs Kyle Richardson, Jonas Kuhn
Learning a Classifier for False Positive Error Reports Emitted by Static Code Analysis Tools U. Koc, P. Saadatpanah, J. S. Foster, A. A. Porter
Learning Technical Correspondences in Technical Documentation Kyle Richardson, Jonas Kuhn
Learning to Align the Source Code to the Compiled Object Code D. Levy, L. Wolf
Mining Semantic Loop Idioms from Big Code M. Allamanis, E. T. Barr, C. Bird, M. Marron, C. Sutton
Neural Attribute Machines for Program Generation M. Amodio, S. Chaudhuri, T. Reps
pix2code: Generating Code from a Graphical User Interface Screenshot T. Beltramelli
Program Synthesis from Natural Language Using Recurrent Neural Networks X.V. Lin, C. Wang, D. Pang, K. Vu, L. Zettlemoyer, M.D. Ernst
Recovering Clear, Natural Identifiers from Obfuscated JS Names B. Vasilescu, C. Casalnuovo, P. Devanbu
Semantically enhanced software traceability using deep learning techniques J. Guo, J. Cheng, J. Cleland-Huang
SmartPaste: Learning to Adapt Source Code M. Allamanis, M. Brockschmidt
Software Defect Prediction via Convolutional Neural Network J. Li, P. He, J. Zhu, M. R. Lyu
Sorting and Transforming Program Repair Ingredients via Deep Learning Code Similarities M. White, M. Tufano, M. Martínez, M. Monperrus, D. Poshyvanyk
Synthesizing benchmarks for predictive modeling C. Cummins, P. Petoumenos, Z. Wang, H. Leather
The Code2Text Challenge: Text Generation in Source Code Libraries Kyle Richardson, Sina Zarrieß, Jonas Kuhn
Topic modeling of public repositories at scale using names in source code V. Markovtsev, E. Kant
And here are the 2016 papers:
A Convolutional Attention Network for Extreme Summarization of Source Code M. Allamanis, H. Peng, C. Sutton
A deep language model for software code H. K. Dam, T. Tran, T. Pham
Automated Correction for Syntax Errors in Programming Assignments using Recurrent Neural Networks S. Bhatia, R. Singh
Automatically generating features for learning program analysis heuristics K. Chae, H. Oh, K. Heo, H. Yang
Automatically Learning Semantic Features for Defect Prediction S. Wang, T. Liu, L. Tan
Bugram: bug detection with n-gram language models S. Wang, D. Chollak, D. Movshovitz-Attias, L. Tan
Convolutional Neural Networks over Tree Structures for Programming Language Processing L. Mou, G. Li, L. Zhang, T. Wang, Z. Jin
Deep API Learning X. Gu, H. Zhang, D. Zhang, S. Kim
Deep Learning Code Fragments for Code Clone Detection M. White, M. Tufano, C. Vendome, D. Poshyvanyk
Extracting Code from Programming Tutorial Videos S. Yadid, E. Yahav
Gated Graph Sequence Neural Networks Y. Li, R. Zemel, M. Brockschmidt, D. Tarlow
Latent Predictor Networks for Code Generation W. Ling, E. Grefenstette, K. M. Hermann, T. Kocisky, A. Senior, F. Wang, P. Blunsom
Learning API usages from bytecode: a statistical approach H.V. Pham, T.T. Nguyen, P.M. Vu, T.T. Nguyen
Learning Programs from Noisy Data V. Raychev, P. Bielik, M. Vechev, A. Krause
Learning Python Code Suggestion with a Sparse Pointer Network A. Bhoopchand, T. Rocktäschel, E.T. Barr, S. Riedel
Learning to Fuzz: Application-Independent Fuzz Testing with Probabilistic, Generative Models of Input Data J. Patra, M. Pradel
Mapping API Elements for Code Migration with Vector Representations T.D. Nguyen, A.T. Nguyen, T.N. Nguyen
Neural Code Completion C. Liu, X. Wang, R. Shin, J.E. Gonzalez, D. Song
Parameter-Free Probabilistic API Mining across GitHub J. Fowkes, C. Sutton
PHOG: Probabilistic Model for Code P. Bielik, V. Raychev, M. Vechev
Question Independent Grading using Machine Learning: The Case of Computer Program Grading G. Singh, S. Srikant, V. Aggarwal
sk_p: a neural program corrector for MOOCs Y. Pu, K. Narasimhan, A. Solar-Lezama, R. Barzilay
Statistical Deobfuscation of Android Applications B. Bichsel, V. Raychev, P. Tsankov, M. Vechev
Summarizing Source Code using a Neural Attention Model S. Iyer, I. Konstas, A. Cheung, L. Zettlemoyer
Towards Better Program Obfuscation: Optimization via Language Models H. Liu