ComPy-Learn

ComPy-Learn is a framework for defining and exploring program representations for machine learning on source code (ML4CODE) tasks. Although its primary focus is compiler optimization, ComPy-Learn can also be used in other domains such as software engineering or systems security.

Project goals

Exploration of the best-performing code representation and model: Different representations and models suit different tasks. Finding the best-performing combination is not obvious and currently requires empirical evaluation. ComPy-Learn provides a common framework for evaluating different representations on a given task to find the best-performing one.
Design and discovery of new representations: Custom, task-specific representations of code can improve a model's performance. However, extracting representations from program code is tedious and requires low-level development with compiler tools. We aim to remove this burden by letting users define program representations through a simple, high-level programming interface, which makes design easier and iteration faster.
Common tools, evaluation pipeline, and datasets: Several promising representations, and models that learn embeddings from them, have been proposed recently. However, each uses its own tools and evaluation pipeline, which makes comparisons against those methods time-consuming and difficult. ComPy-Learn provides a common framework for representations, models, and datasets, and allows their combinations to be evaluated. Implementing a novel representation or model in this framework lets researchers run a complete evaluation with little effort and, at the same time, contributes another widely applicable method to the community.
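
To make the exploration goal concrete, here is a minimal, self-contained sketch of scoring every representation/model combination on one task. It does not use the ComPy-Learn API: the sample data, the two toy "representations", the fixed threshold "models", and the names explore and accuracy are all hypothetical stand-ins chosen for illustration. Real models would be trained rather than hard-coded.

```python
from itertools import product

# Hypothetical toy task: label 1 means the snippet contains a loop.
SAMPLES = [
    ("for (int i = 0; i < n; i++) a[i] = 0;", 1),
    ("while (x > 0) x--;", 1),
    ("return a + b;", 0),
    ("int y = x * 2;", 0),
]

# Stand-ins for representations: each maps source text to one number.
REPRESENTATIONS = {
    "char_length": len,
    "loop_keyword": lambda src: int("for" in src or "while" in src),
}

# Stand-ins for models: fixed threshold rules instead of trained networks.
MODELS = {
    "threshold_0.5": lambda feature: int(feature > 0.5),
    "threshold_20": lambda feature: int(feature > 20),
}


def accuracy(represent, predict, samples):
    """Fraction of samples whose predicted label matches the true label."""
    hits = sum(predict(represent(src)) == label for src, label in samples)
    return hits / len(samples)


def explore(representations, models, samples):
    """Score every (representation, model) pair on the same samples."""
    return {
        (r_name, m_name): accuracy(representations[r_name], models[m_name], samples)
        for r_name, m_name in product(representations, models)
    }


results = explore(REPRESENTATIONS, MODELS, SAMPLES)
best = max(results, key=results.get)
print(best, results[best])
```

Because every combination is scored on the same samples with the same metric, the results are directly comparable, which is the property a shared evaluation pipeline is meant to provide.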

Design

ComPy-Learn's pipeline consists of three main components:

compy.representation allows the user to define custom representations of source code (such as the ones from published work) based on semantic compiler-internal information, currently from the Clang/LLVM framework. Both linear and graph representations of code are supported.
compy.model contains ML models (more precisely, connectors to well-established model libraries) that embed the representations into vectors and output a prediction.
compy.dataset contains datasets of source code for evaluation, along with helper functions that allow integration of new datasets.
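
The difference between a linear and a graph representation can be illustrated with a small sketch. ComPy-Learn extracts its information from Clang/LLVM; the example below instead uses Python's built-in ast module as a stand-in, purely to show the two shapes of output. The function names syntax_graph and linear_sequence are hypothetical and not part of ComPy-Learn.

```python
import ast


def syntax_graph(source):
    """Return (nodes, edges): node-type labels and parent-to-child index pairs."""
    nodes, edges = [], []

    def visit(node, parent_index):
        # Each visited node gets the next free index in pre-order.
        index = len(nodes)
        nodes.append(type(node).__name__)
        if parent_index is not None:
            edges.append((parent_index, index))
        for child in ast.iter_child_nodes(node):
            visit(child, index)

    visit(ast.parse(source), None)
    return nodes, edges


def linear_sequence(source):
    """Return node-type labels in depth-first pre-order, dropping the edges."""
    nodes, _ = syntax_graph(source)
    return nodes


SOURCE = "y = a + b * 2"
nodes, edges = syntax_graph(SOURCE)
print(linear_sequence(SOURCE))  # linear form: a flat sequence of labels
print(edges)                    # graph form: the same labels plus structure
```

The linear form suits sequence models, while the graph form keeps the structural relations that graph-based models can exploit.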

Supported representations

Currently, the following representations and models from published work are implemented in this framework:

Cummins, Chris, et al. "End-to-end deep learning of optimization heuristics." 2017 26th International Conference on Parallel Architectures and Compilation Techniques (PACT). IEEE, 2017.
Barchi, Francesco, et al. "Code Mapping in Heterogeneous Platforms Using Deep Learning and LLVM-IR." 2019 56th ACM/IEEE Design Automation Conference (DAC). IEEE, 2019.
Brauckmann, Alexander, et al. "Compiler-based graph representations for deep learning models of code." Proceedings of the 29th International Conference on Compiler Construction. ACM, 2020.
Cummins, Chris, et al. "ProGraML: Graph-based Deep Learning for Program Optimization and Analysis." arXiv preprint arXiv:2003.10536 (2020).

Installation

We supply an installation script that automates the build, test, and installation process. The script currently supports the platforms listed below. Because the process builds ComPy-Learn from its sources, other platforms can be used with a bit of manual installation effort.

Supported platforms:
Ubuntu 16.04
Ubuntu 18.04
Ubuntu 20.04

To get started on one of the supported platforms, we suggest first creating a virtual environment and then running:

./install_deps.sh ${CUDA}

where ${CUDA} must be one of cpu, cu92, cu100, or cu102, depending on your machine's capabilities.
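
If the install step is driven from a script, it may help to check the flag before calling install_deps.sh. The following is a minimal sketch under that assumption; the helper install_command is hypothetical and not shipped with ComPy-Learn. It only builds the command line and does not run it.

```python
# The four values accepted for ${CUDA}, as listed in this README.
SUPPORTED_CUDA_FLAGS = ("cpu", "cu92", "cu100", "cu102")


def install_command(cuda_flag):
    """Return the install_deps.sh invocation for a supported flag."""
    if cuda_flag not in SUPPORTED_CUDA_FLAGS:
        allowed = ", ".join(SUPPORTED_CUDA_FLAGS)
        raise ValueError(f"unsupported flag {cuda_flag!r}; expected one of: {allowed}")
    return ["./install_deps.sh", cuda_flag]


# Print the command for a CPU-only machine instead of executing it.
print(" ".join(install_command("cpu")))
```

The returned list can be passed to subprocess.run from the repository root once the virtual environment is active.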

After the dependencies have been installed successfully, compile and test ComPy-Learn by running:

python setup.py test

Finally, install ComPy-Learn in order to use it in your project:

python setup.py install

An example exploration is located in examples/devmap_exploration.py.

Publications

Brauckmann, Alexander, et al. "ComPy-Learn: A Toolbox for Exploring Machine Learning Representations for Compilers." 2020 Forum for Specification and Design Languages (FDL). IEEE, 2020.
