eschulte/llvm-mutate

Manipulate LLVM assembly

This tool performs the following operations (among others):

      ids | print the total number of instructions
     list | list instructions with their types and ids
     name | name each instruction by its id
    trace | instrument to save an instruction execution trace
      cut | cut the numbered instruction
   insert | copy the second numbered instruction before the first
  replace | replace the first numbered instruction with the second
     swap | swap the two numbered instructions

Installation

  This tool is implemented as an LLVM compiler pass (see [1]).  It is
  built using the LLVM cmake machinery [2], and requires that LLVM be
  installed with cmake.  However, it does *not* require access to
  the LLVM source tree.

  The HEAD of the LLVM svn repository (currently revision 182214) is
  known to work.  Branch llvm-3.2 of this repository compiles against
  LLVM version 3.2.

  Run the following to build and install llvm-mutate, changing the
  llvm/cmake path /usr/local/share/llvm/cmake to the appropriate path
  on your system.

    mkdir build
    cd build
    cmake -DCMAKE_MODULE_PATH=/usr/local/share/llvm/cmake ../
    make
    sudo make install

Examples

  First compile a source file to LLVM assembly.

    $ echo 'main(){ puts("hello"); puts("goodbye");}' \
        |clang -x c - -S -emit-llvm -o greet.ll

  Link the unmodified program into an executable (`-l`) and run it to
  see the behavior of the original.

    $ cat greet.ll|llvm-mutate -l

    $ ./a.out
    hello
    goodbye

  Count the instruction ids in the original program.

    $ cat greet.ll|llvm-mutate -I -o /dev/null
    3

  Compile a version which saves an instruction execution trace.  First
  this requires linking an object file defining a function with the
  following name and type signature.

    void llvm_mutate_trace(int count);

  An example is provided in `llvm_mutate_trace.c`.  This function will
  be called once for each executed instruction and will be passed the
  ID of the executed instruction as its argument.

    $ gcc -c llvm_mutate_trace.c
    $ cat greet.ll|llvm-mutate -t -l -T llvm_mutate_trace.o
    $ ./a.out
    hello
    goodbye
    $ cat llvm_mutate_trace
    1
    2
    3
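
  As a rough illustration, a trace function with this signature might
  look like the sketch below.  It is not the `llvm_mutate_trace.c`
  shipped with this repository; the `trace_file` variable and the
  `close_trace` helper are assumptions made for the example.  It
  writes each instruction ID on its own line to a file named
  `llvm_mutate_trace`, matching the output shown above.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical state for this sketch: the open trace file, if any. */
static FILE *trace_file = NULL;

/* Close the trace file when the traced program exits normally. */
static void close_trace(void) {
    if (trace_file != NULL) {
        fclose(trace_file);
        trace_file = NULL;
    }
}

/* Called once per executed instruction with that instruction's ID. */
void llvm_mutate_trace(int count) {
    if (trace_file == NULL) {
        /* Open lazily on the first call; "w" truncates an old trace. */
        trace_file = fopen("llvm_mutate_trace", "w");
        if (trace_file == NULL)
            return; /* Tracing is best effort: skip if the file cannot be opened. */
        atexit(close_trace);
    }
    fprintf(trace_file, "%d\n", count);
    /* Flush on every call so the trace survives a crash of the mutant. */
    fflush(trace_file);
}
```

  Flushing after every write keeps the trace usable when a mutated
  program crashes, at the cost of slower tracing.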

  Cut the first instruction from this program, compile and run.

    $ cat greet.ll|llvm-mutate -c 1 -l
    cut 1

    $ ./a.out
    goodbye

  Insert the second instruction before the first.

    $ cat greet.ll|llvm-mutate -i 1,2 -l
    inserted 2 before 1

    $ ./a.out
    goodbye
    hello
    goodbye

  Swap the first two instructions.

    $ cat greet.ll|llvm-mutate -s 1,2 -l
    swapped 1 with 2

    $ ./a.out
    goodbye
    hello
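
  The cut, insert and swap examples above can be modelled on a plain
  list of instruction ids.  The following toy C sketch only
  illustrates the semantics seen in those examples; it is not how the
  pass manipulates LLVM IR, and every name in it (`toy_prog`,
  `toy_cut`, `toy_insert`, `toy_swap`) is hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define TOY_MAX 16

/* Toy program: a sequence of instruction ids.  Positions are 1-based. */
struct toy_prog {
    int ids[TOY_MAX];
    int len;
};

/* cut: remove the instruction at position n.
   No bounds checking: callers must pass 1 <= n <= len. */
void toy_cut(struct toy_prog *p, int n) {
    memmove(&p->ids[n - 1], &p->ids[n],
            (size_t)(p->len - n) * sizeof p->ids[0]);
    p->len--;
}

/* insert: copy the instruction at position src before position dst.
   Callers must keep len below TOY_MAX. */
void toy_insert(struct toy_prog *p, int dst, int src) {
    int copy = p->ids[src - 1];
    /* Shift everything from dst onward one slot to the right. */
    memmove(&p->ids[dst], &p->ids[dst - 1],
            (size_t)(p->len - dst + 1) * sizeof p->ids[0]);
    p->ids[dst - 1] = copy;
    p->len++;
}

/* swap: exchange the instructions at positions a and b. */
void toy_swap(struct toy_prog *p, int a, int b) {
    int tmp = p->ids[a - 1];
    p->ids[a - 1] = p->ids[b - 1];
    p->ids[b - 1] = tmp;
}
```

  Starting from the ids 1, 2, 3, `toy_cut(&p, 1)` leaves 2, 3;
  `toy_insert(&p, 1, 2)` gives 2, 1, 2, 3; and `toy_swap(&p, 1, 2)`
  gives 2, 1, 3, mirroring the program outputs above.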

  Next, consider a slightly more complicated file with data
  dependencies between instructions, which the mutation tool must
  address.

    $ echo 'main(){ int x=2; x+=3; x=x*x; printf("%d\n", x);}' \
      |clang -x c - -S -emit-llvm -o arith.ll

    $ cat arith.ll|llvm-mutate -l

    $ ./a.out
    25

  Here mutations change the data dependencies between the
  instructions, and affect the value of `x` which is printed at the
  end.

    $ cat arith.ll|llvm-mutate -c 4 -l
    found local replacement: 0x2ac0aa8
    cut 4

    $ ./a.out
    4

    $ cat arith.ll|llvm-mutate -c 3 -l
    cut 3

    $ ./a.out
    9

    $ cat arith.ll|llvm-mutate -c 6 -l
    found local replacement: 0x39aaaa8
    cut 6

    $ ./a.out
    10

  The compiler pass attempts to plug new instructions into the call
  graph near where they are inserted, satisfying their arguments with
  in-scope variables and plugging their output into the arguments of
  subsequent instructions.

    $ cat arith.ll|llvm-mutate -i 6,9 -l
    replacing argument: 0x2f6ce50
    found local replacement: 0x2f6cae8
    inserted 9 before 6

    $ ./a.out
    4
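
  One way to picture the "found local replacement" step is a backward
  search for the nearest earlier value of a matching type.  The toy C
  sketch below is an assumption made for illustration only: this
  README does not describe the pass's actual search, and the names
  `toy_type`, `toy_inst` and `find_local_replacement` are hypothetical.

```c
/* Hypothetical value types for the toy model. */
enum toy_type { TOY_INT, TOY_PTR };

/* A toy "instruction": only the type of its result matters here. */
struct toy_inst {
    enum toy_type result;
};

/* Return the index of the nearest instruction before `pos` whose
   result has type `want`, or -1 when no such value is in scope. */
int find_local_replacement(const struct toy_inst *insts, int pos,
                           enum toy_type want) {
    for (int i = pos - 1; i >= 0; i--) {
        if (insts[i].result == want)
            return i;
    }
    return -1;
}
```

  A result of -1 in this model corresponds to the case where no
  suitable in-scope value exists.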

  When this isn't possible, a warning is printed and the insertion
  likely has no effect.

    $ cat arith.ll|llvm-mutate -i 4,10 -l
    could find no use for result
    inserted 10 before 4

    $ ./a.out
    25

  The exception is an inserted instruction that acts through side
  effects.  For example, the following copies the printf instruction
  into the middle of the function.

    $ cat arith.ll|llvm-mutate -i 4,11 -l
    replacing argument: 0x1e34f88
    found local replacement: 0x1e34ae8
    inserted 11 before 4

    $ ./a.out
    2
    25

  See the output of `llvm-mutate --help` for more actions which may
  be performed on compiled LLVM IR.  If multiple options are given to
  llvm-mutate, they are applied in series to the code.  For example,
  the following command line performs the steps listed after its
  output.

    $ cat arith.ll|llvm-mutate -I -g -G -c 3 -I -i 4,10 -I -G -l
    12
    cut 3
    11
    replacing argument: 0x1d7df08
    found local replacement: 0x1d7db80
    inserted 10 before 4
    12

    $ ./a.out
    3
    9

  1. print a count of instruction ids,
  2. generate the program control flow graph
     (shown on screen if `dot` and `feh` are installed),
  3. generate the program call graph,
  4. cut an instruction,
  5. print another id count,
  6. insert an instruction,
  7. print a third instruction id count,
  8. display the new call graph,
  9. and finally link the resulting LLVM IR into an executable.

License

  Licensed under the GPLv3; see the COPYING file in this directory for
  more information.

Footnotes

[1]  http://llvm.org/docs/WritingAnLLVMPass.html

[2]  http://llvm.org/docs/CMake.html#developing-llvm-pass-out-of-source