
A Python data science project for analyzing and visualizing dependencies between code hierarchies in a Java program


huji-proj-needle2021/code-hierch-analysis


Introduction

A data science project in which we explore the connections between the code hierarchies (methods, classes and packages) of a Java program. We define those connections in two ways:

  • Function dependencies inferred from the function call graph.

  • Dependencies inferred from code hierarchies that are "changed together" - in the same git commit, or the same pull request.

The first kind of dependency was implemented with the soot library, using a small Java wrapper program located at the following repository.
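To illustrate how method-level call edges can be turned into class-level or package-level dependencies, here is a minimal sketch. It is not the project's code: the `lift_edges` helper, the sample edges, and the assumption that names look like `package.Class.method` are all hypothetical.

```python
from collections import Counter

def lift_edges(method_edges, level):
    """Aggregate method-level call edges to class or package level.

    Assumes names look like 'package.Class.method' (a hypothetical format).
    """
    # Number of trailing name components to strip for each level.
    drop = {"class": 1, "package": 2}[level]
    lifted = Counter()
    for caller, callee in method_edges:
        src = caller.rsplit(".", drop)[0]
        dst = callee.rsplit(".", drop)[0]
        if src != dst:  # ignore calls that stay inside one class/package
            lifted[(src, dst)] += 1
    return lifted

# Hypothetical call edges, as a call-graph tool might report them.
calls = [
    ("app.ui.Window.draw", "app.core.Model.state"),
    ("app.ui.Window.resize", "app.core.Model.state"),
    ("app.ui.Window.draw", "app.ui.Window.resize"),
    ("app.core.Model.state", "app.core.Store.load"),
]
class_edges = lift_edges(calls, "class")
package_edges = lift_edges(calls, "package")
```

The edge weight here is simply the number of method calls that cross the boundary; other weightings are equally plausible.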

The second kind of dependency was implemented by extracting commits and pull requests via GitHub's API and pygit2, then matching the changes to the code hierarchies in which they reside. The matching relies on a hand-rolled partial Java parser for hierarchy declarations.
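As a rough idea of what matching a changed line to its enclosing hierarchy can look like, here is a simplified sketch. It is an assumption-laden stand-in for the project's parser: the regular expressions, `declared_types` and `enclosing_type` are hypothetical, and the sketch ignores comments, string literals and closing braces.

```python
import re

# Deliberately partial patterns: only package and type declarations.
PACKAGE_RE = re.compile(r"^\s*package\s+([\w.]+)\s*;")
TYPE_RE = re.compile(r"\b(class|interface|enum)\s+(\w+)")

def declared_types(java_source):
    """Return (line_number, qualified_name) for each type declaration found."""
    package = ""
    found = []
    for lineno, line in enumerate(java_source.splitlines(), start=1):
        pkg = PACKAGE_RE.match(line)
        if pkg:
            package = pkg.group(1)
            continue
        decl = TYPE_RE.search(line)
        if decl:
            name = decl.group(2)
            found.append((lineno, f"{package}.{name}" if package else name))
    return found

def enclosing_type(types, changed_line):
    """Pick the last type declared at or before the changed line."""
    owner = None
    for lineno, name in types:
        if lineno <= changed_line:
            owner = name
    return owner

source = """package com.example;

public class Parser {
    int depth;

    void parse() {
        depth++;
    }
}
"""
types = declared_types(source)
owner = enclosing_type(types, 7)  # line 7 is 'depth++;'
```

A real implementation would also need to track brace depth to know where a class ends and to resolve methods and nested classes.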

These dependencies can be modelled as a graph and analyzed in various ways; check out the write-up for more information. An interactive visualization of these graphs was implemented as a web app using the Dash library.
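As one possible way to model "changed together" dependencies as a graph, the sketch below counts how often two entities appear in the same commit and uses that count as an undirected edge weight. The `co_change_graph` helper and the sample commits are hypothetical; the write-up describes the approach actually used.

```python
from collections import Counter
from itertools import combinations

def co_change_graph(commits):
    """Return {(a, b): count} for each pair of entities changed in one commit."""
    edges = Counter()
    for changed in commits:
        # Sorting gives every undirected pair a single canonical key.
        for a, b in combinations(sorted(set(changed)), 2):
            edges[(a, b)] += 1
    return edges

# Each commit is represented by the set of classes it touched (made-up data).
commits = [
    {"pkg.A", "pkg.B"},
    {"pkg.A", "pkg.B", "pkg.C"},
    {"pkg.C"},
]
graph = co_change_graph(commits)
```

The same function works for methods or packages, as long as each commit is first mapped to the set of entities it touches at that level.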

This project was done as part of the course "A Needle in a Data Haystack" taken at the Hebrew University of Jerusalem.

Running

You can build and run the included Dockerfile in order to run the visualization tool as follows:

docker build -t gitanalysis .
docker run -it -v "$(pwd)/GRAPHS:/app/GRAPHS" -p 8050:8050 gitanalysis

Then visit http://localhost:8050 in your browser.

The app also includes a section for creating a graph with the call-graph method (by invoking the included Java graph generator) from an executable .jar program (one that includes a Main class).

Generating graphs from commits/PRs can only be done manually, by modifying and running assocRules.ipynb with various parameters that need to be tuned by hand (see the write-up).
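The notebook's name suggests association rules, so the following sketch shows what such parameters might look like, assuming they are minimum support and minimum confidence thresholds. This is an illustration only: `pair_rules`, its default thresholds, and the sample transactions are hypothetical and are not taken from the notebook.

```python
from collections import Counter
from itertools import permutations

def pair_rules(transactions, min_support=2, min_confidence=0.5):
    """Return {(a, b): confidence} for rules 'a changed => b changed'."""
    item_counts = Counter()
    pair_counts = Counter()
    for items in transactions:
        unique = set(items)
        item_counts.update(unique)
        # Ordered pairs, because a => b and b => a are different rules.
        pair_counts.update(permutations(sorted(unique), 2))
    rules = {}
    for (a, b), together in pair_counts.items():
        confidence = together / item_counts[a]
        if together >= min_support and confidence >= min_confidence:
            rules[(a, b)] = confidence
    return rules

# Each transaction is the set of classes touched by one commit or PR (made up).
transactions = [
    {"Lexer", "Parser"},
    {"Lexer", "Parser", "Ast"},
    {"Parser"},
    {"Parser", "Ast"},
]
rules = pair_rules(transactions)
```

Raising `min_support` drops pairs that rarely change together, while raising `min_confidence` keeps only rules where a change to the first entity usually comes with a change to the second.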

You can download some pre-made graphs from here and place them in GRAPHS. They were generated from jadx, a Java/Android decompiler.
