Skip to content

Egor18/jdataflow

Repository files navigation

jdataflow

Build Status codecov

jdataflow is a Z3 solver based data-flow analyzer for Java source code.

Capabilities

At the moment jdataflow has checkers which allow to detect logical errors (always true/false expressions), null pointer dereferences, array indices out of bounds, etc.

For example:

void f(boolean c) {
    int x = 5;
    int y = c ? 3 : 2;
    if (x < y) {} // <= always false
}

void g(int x, int y) {
    if (2 * x + 3 * y == 12) {
        if (5 * x - 2 * y == 11) {
            if (x == 3) {} // <= always true
            if (y == 2) {} // <= always true
        }
    }
}

void h(Something x) {
    if (x == null) {
        x.f(); // <= null dereference
    }
}

void z() {
    int[] arr = new int[] {1, 2, 4, 8, 16, 32};
    int x = arr[9]; // <= array index out of bounds
}

Check out test directory for more examples.

In general, jdataflow is capable to evaluate expressions statically, perform symbolic execution, handle control flow of a program, and so on. It also features a proper memory model, so it nicely deals with reference aliasing.

Build and run

In order to build jdataflow you need JDK 8 or newer. Also, you have to download and install Z3.

Windows:

  1. Download z3-4.8.4 here: https://github.com/Z3Prover/z3/releases
  2. Add bin directory to your PATH environment variable
  3. Run gradlew build

Note: Visual C++ 2015 Redistributable may be required.

Linux:

  1. Download z3-4.8.4 here https://github.com/Z3Prover/z3/releases
  2. Add bin directory to your LD_LIBRARY_PATH: export LD_LIBRARY_PATH=/path/to/z3/bin
  3. Run ./gradlew build

macOS:

  1. Download z3-4.8.4 here https://github.com/Z3Prover/z3/releases
  2. Add bin directory to your DYLD_LIBRARY_PATH: export DYLD_LIBRARY_PATH=/path/to/z3/bin
  3. Run ./gradlew build

Note that you may still need to set up environment variables or java.lang.path in your IDE.

Now you can go to the build directory and run the resulting jar:
java -jar jdataflow.jar --sources <args...> [--classpath <arg>] [--classpath-file <arg>]

If you want to check your Ant, Maven, or Gradle based project with jdataflow, you should use scan utils.

Under the hood

AST

jdataflow uses Spoon library to build and traverse AST in a post-order manner. While traversing the AST, it translates the source code into a form understandable by Z3 solver. Then the checks are performed by the solver.

Static single assignment form (SSA)

First of all, in order to be used with Z3, the code should be present in Static Single Assignment form. We do this to maintain the order of a program in a logic formula. Here is a simple example:

Original form:

x = 1;
y = 2;
x = x + y;

SSA form:

x0 = 1;
y0 = 2;
x1 = x0 + y0;

Conditions

After traversing if statement, we create a new variable, which is the result of a special if-then-else function provided by Z3. This function essentially joins values from branches over a condition.

Original form:

if (c) {
    x = 1;
} else {
    x = 2;
}

SSA form:

if (c) {
    x0 = 1;
} else {
    x1 = 2;
}
x3 = ITE(c0, x0, x1);

Functions and interprocedural analysis

The most obvious approach to deal with function calls is to inline them. Unfortunately, we can do it only for small functions (like getters or setters), because of the performance considerations.
Another approach is to build Function Summaries (automatically or manually), which contain information about contracts, purity, return values and so on.
But in general, any unknown function resets the values of its arguments.

Loops

The most obvious approach to deal with loops is to unroll them. Unfortunately, we can do it only when the number of iterations is known statically and relatively small.
So in general, we have to find loop invariants and reset everything else, like for the unknown functions.
However, some special cases could be treated differently.

Memory model

jdataflow uses type-indexed memory model (Burstall’s memory model). In this memory model each data type or field has its own memory array. It allows to detect something like that:

void m() {
    C a = new C();
    a.x = 10;
    C b = a;
    b.x = 20;
    if (a.x == 20) {} // <= always true
}

Here are some useful links with a more detailed explanation of different memory models:

To Do list

At the moment this project is just a proof of concept, so there is a lot of work to do:

  • Better Interprocedural analysis and Function Summaries;
  • Better loop analysis;
  • Parallel analysis;
  • Analyze strings (z3str has performance limitations though);
  • Detecting potential null dereferences and potential OOB;

About

jdataflow is a Z3 solver based data-flow analyzer for Java source code.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published