Skip to content

UnitTestBot/usvm

Repository files navigation

What is USVM?

USVM is a powerful symbolic execution core designed for analysis of programs in multiple programming languages. Symbolic execution is known to be a very precise but resource demanding technique. USVM makes it faster, more stable, and versatile.

USVM main features include

  • extensible and highly optimized symbolic memory model
  • optimized constraint management to reduce SMT solver workload
  • improved symbolic models for containers (Map, Set, List, etc.)
  • targeted symbolic execution
  • solving type constraints with no SMT solver involved
  • bit-precise reasoning
  • forward and backward symbolic exploration

How can I use it?

With USVM, you can achieve completely automated

Right now, we have ready-to-be-used implementation for Java and experimental implementation for Python.

Taint analysis with USVM

USVM supports interprocedural condition-sensitive taint analysis of JVM bytecode. For instance, it is able to automatically find the following SQL injection: True positive sample

By default, USVM is able to find other problems:

  • null reference exceptions,
  • out-of-bounds access for collections,
  • integer overflows,
  • division by zero.

You can also extend its analysis rules by writing custom checkers.

You can run USVM in your repo CI by configuring the ByteFlow runner.

About condition-sensitive analysis

If we modify the above program a little, things change drastically: False positive sample

All interprodecural dataflow analysers we've tried report the similar warning for this program. However, this is false alarm: untrusted data is read only in development mode (that is, when production field is false), but the real database query happens only in production mode.

The reason why the existing analysers are wrong is the lack of condition-sensitive analysis: they simply do not understand that untrusted data is emitted only under conditions that prevent program from getting into checkUserIsAdminProd method.

The major reason for this is that condition-sensitive analysis is complex and expensive. USVM makes condition-sensitive analysis robust and scalable. In particular, USVM does not report warning in this program.

You can run the this example online in our demo repository.

Writing custom checkers

USVM allows to customize its behaviour by writing custom analysis rules. To achieve this, USVM can share its internal analysis states into the attached interpreter observers. So the first step to write a custom checker is to implement JcInterpreterObserver interface. This observer can be attached to symbolic machine in its constructor.

Now, before every instruction gets symbolically executed, the symbolic engine will notify the observer about the next instruction. For instance, if the engine has reached the throw instruction, the attached observer will recieve the corresponding event:

fun onThrowStatement(simpleValueResolver: JcSimpleValueResolver, stmt: JcThrowInst, stepScope: JcStepScope)

Here, stmt represents the instruction which was proven to be reachable. If, for example, your analysis looks for reachability of HorrificException, you can look into type of stmt.throwable.

The internal state of the analysis is stored into stepScope. In fact, your checker gets the representation of the whole program state in the branch that reaches stmt. You can query the arbitrary data stored into state or even modify it (allocate new information or modify the existing) using stepScope.calcOnState. For example, to allocate new memory fragment in program state, you can write

stepScope.calcOnState { memory.allocateConcreteRef() }

You can compute the validity of arbitrary logical predicates on state. Warning: this can cause queries to SMT solver, which can be time and memory demanding. To query the validity, you can write

stepScope.calcOnState { 
	clone().assert(your condition)?.let {
		... handling case when condition is satisfiable ...
	}
}

To form and report warnings in SARIF format, use built-in reporters.

You can browse the existing checkers for more examples and details.

Using USVM to confirm SARIF reports

In lots of cases, the exising static code analysers report false alarms. USVM has ability to confirm or reject the reported warnings.

To run USVM in trace reproduction mode, configure one of the analyses and pass a set of traces into JcMachine.

Also, this process can be customized by a rich set of options.

USVM for unit test generation

USVM has ability to discover all possible behaviours of a program. This is a key feature used in white-box test generation engines. In future, USVM will be the default code analysis engine in UnitTestBot for Java.

Other languages support in USVM

USVM.Core is a framework which provides highly optimized primitives for symbolic execution:

Thus, USVM.Core implements common primitives used in programming languages. This makes much easier instantiating USVM for new programming language: in fact, to support a programming language, you only need to write its interpreter in terms of operations provided by USVM.Core.

If you want to support a new language, please take a look at sample language support in USVM.