EmbeddingBugs

Many longstanding and emerging problems in software engineering are, at their root, questions of measuring correspondence between source code and natural language (e.g. query response and feature localization). Word embeddings have been proposed as a way to bridge this lexical gap, because they represent relative semantic relatedness in a language-agnostic space. This paper attempts to replicate a well-cited publication that addresses bug localization using word embeddings. We train the word embeddings on a novel dataset but test on a common, standardized dataset. We provide insights into the process of experiment replication and offer advice to those wishing to increase the replicability of their publications. We also demonstrate the influence of preprocessing choices, further highlighting the need for extensive experiment reporting as the field of software engineering continues to integrate machine learning tools.
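
To make the idea of embedding-based bug localization concrete, here is a minimal sketch. It is not the pipeline of this paper or of the replicated publication. It assumes that a bug report and each source file are already tokenized, that each text is represented by the average of its word vectors, and that files are ranked by cosine similarity to the report. The embedding values, file names, and function names (`embed`, `cosine`, `rank_files`) are all hypothetical and chosen only for illustration.

```python
import math

# Hypothetical toy embeddings (assumption): real vectors would be
# learned from a training corpus, not written by hand.
EMBEDDINGS = {
    "crash": [0.9, 0.1, 0.0],
    "exception": [0.8, 0.2, 0.1],
    "parser": [0.1, 0.9, 0.0],
    "parse": [0.2, 0.8, 0.1],
    "render": [0.0, 0.1, 0.9],
    "draw": [0.1, 0.0, 0.8],
}


def embed(tokens):
    """Average the vectors of all known tokens; None if no token is known."""
    vectors = [EMBEDDINGS[t] for t in tokens if t in EMBEDDINGS]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_files(report_tokens, files):
    """Return (file name, score) pairs sorted from most to least similar."""
    report_vec = embed(report_tokens)
    scored = []
    for name, tokens in files.items():
        file_vec = embed(tokens)
        if report_vec is None or file_vec is None:
            score = 0.0  # nothing to compare, so treat as unrelated
        else:
            score = cosine(report_vec, file_vec)
        scored.append((name, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical bug report and source files, already tokenized.
ranking = rank_files(
    ["parser", "crash"],
    {
        "Parser.java": ["parse", "exception"],
        "Renderer.java": ["render", "draw"],
    },
)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Note that the report and `Parser.java` share no exact token, yet the file still ranks first because related words lie close together in the vector space. This is the lexical gap that embeddings are meant to bridge. How the text is tokenized and filtered before this step changes which vectors are averaged, which is one reason preprocessing choices can affect results.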
