Skip to content

Latest commit

 

History

History
208 lines (170 loc) · 7.63 KB

NOTES

File metadata and controls

208 lines (170 loc) · 7.63 KB

Notes and Tasks -*- org -*-

Notes

Types of Mutations

The three phases of development of this tool. Could also serve as the top level of a taxonomy of mutations.

side-effect
manipulating side-effect instructions (e.g., pritnf), super easy, didn’t need to worry about results of instructions, and changing them directly changes when and what the side effects do. For example manipulations on greet.ll.
data-flow
strings of instructions which change the state of the system. This becomes more complex because we now need to feed local edges of the CFG into newly added instructions, weave the results of new instructions back into the CFG and patch up CFG holes after removing instructions. For example manipulations on arith.ll.
control-flow
modifications which actually affect the flow of control through the program. Modification of comparison instructions, labels and branch instructions. Ultimately this will probably also include some basic block operations e.g., stitching two basic blocks together when e.g., we remove the terminator of the leading BB. For example operations on branch.ll.

Primary functions

A couple of basic operations are needed across mutation types and are implemented as core functions. The following will list them and their uses.

find instance of type

Find a value available in the program around a given instruction which has a certain type. This is used to satisfy the operands of newly inserted instructions (e.g., replace operands) and to replace the results of a cut instruction.

For an instruction I of type T.

  1. values in Basic Block before I of type T
  2. arguments to the function containing I of type T
  3. global values of type T
  4. give up and return a 0

This function needs a better more forgiving way of comparing types, currently it doesn’t do a good enough job of satisfying e.g., comparison output types.

use result

Used to plug the output of an instruction into the CFG local to the instruction. For example to use the result of a newly inserted instruction.

Looks for the first opening of the correct type, and inserts there. We let the rest of LLVM clean up and dangling SSA registers.

replace operands

This satisfies the operands of a newly inserted instruction. It is largely just a wrapper around find instance of type, but with an initial phase when limits the replacement to instructions which are actually not in scope.

Reference

  • file:../../../docs/ProgrammersManual.html

Tasks [3/8]

crossover

Maybe this should be done at the basic block level.

Would the compiler need to simultaneously walk one module for each parent?

cut correctly handle deletion of final instruction in block

do this by folding it into its successor

/// FoldBlockIntoPredecessor - Folds a basic block into its predecessor if it
/// only has one predecessor, and that predecessor only has one successor.
/// The LoopInfo Analysis that is passed will be kept consistent.
/// Returns the new combined block.
static BasicBlock *FoldBlockIntoPredecessor(BasicBlock *BB, LoopInfo* LI,
// If this is a terminator instruction we might have problems.
if(!isa<TerminatorInst>(I)){
  BasicBlock *BB = I->getParent();
  // merge into following block if only one exists
  if(BB->getTerminator()->getNumSuccessors() != 1){
    errs() << "can't merge with multiple subsequent blocks\n";
  } else {
    Function::iterator I = BB;
    Function::iterator E = BB->getParent()->end();
    if(BB == E){
      errs() << "can't delete terminator of last block\n";
    } else {
      BasicBlock *Succ = I++;
      if (! Succ->getSinglePredecessor()){
        errs() << "next block has multiple preds, can't merge\n";
        return false;
      } else {
        // merge with the following block
        FoldSingleEntryPHINodes(Succ);
        BB->getInstList().pop_back();
        Succ->replaceAllUsesWith(BB);
        BB->getInstList().splice(BB->end(), Succ->getInstList());
        Succ->eraseFromParent(); } } } }

Here’s the beginning of such a function.

// If this instruction is a terminator, then erase the remainder of the
// basic block behind it.
void terminateIfTerminator(Instruction *I){
  if(!isa<TerminatorInst>(I)) return;
  BasicBlock *B = I->getParent();
  BasicBlock::iterator Iter = I;
  for (BasicBlock::iterator i = B->end(); i != Iter; --i){
    i->eraseFromParent(); }
}

populated inserted instruction arguments from scope

When inserting an instruction we need to do two things.

  1. find arguments from the current scope, replaceOperands.
    • RemapInstruction
      /// RemapInstruction - Convert the instruction operands from referencing the
      /// current values into those specified by VMap.
      static inline void RemapInstruction(Instruction *I, ValueToValueMapTy &VMap)
              

      this is used a couple of places RemapInstruction, read its documentation

    • get arguments from nearby instructions
      ->getOperand(0)
              
    • set operand to something specific
      Clone->setArgOperand(0, Op);
              
    • add input operands to an instruction
      // Insert new integer induction variable.
      PHINode *NewPHI = PHINode::Create(Int32Ty, 2, PN->getName()+".int", PN);
      NewPHI->addIncoming(ConstantInt::get(Int32Ty, InitValue),
                          PN->getIncomingBlock(IncomingEdge));
              
    • operation iterator
      user->op_iterator()
              
  2. assign its result into something in the current scope, this could be as simple as just taking the next instruction which requires an argument of the same type as the inserted instruction

swap is really just two replaces

To replace we need to,

  1. replace everything using the original with the new, this is done by ReplaceInstWithInst
  2. replace all arguments of the new with arguments of the original (or appropriate arguments selected from the environment)
    replaceOperands
        

add a replace operator

better type matching, labels with labels etc

Take a look at the TypeID, which we may be able to cast from. http://llvm.org/docs/doxygen/html/classllvm_1_1Type.html#a5e9e1c0dd93557be1b4ad72860f3cbda

Types may be mutated

There is a mutateType function on values, so maybe we should mutate a readily available type if we can’t find one we like.

Here’s a specific example, where two cmp values aren’t matched

cat branch.ll|./llvm-mutate -c 11

Ideally subsequent uses of %cmp should be replaced with a default value like a false comparison.

maybe look at mutating basic blocks as well as single instructions

it might be nice to visualize data flow

The following shell munging of LLVM IR has backwards arrows, and is incomplete, but still gets pretty close.

SED_CMD=''
SED_CMD+='s/ = / <- /;s/alloca/%alloca/;'
SED_CMD+='s/ [^%< ][^ ]*//g;'
SED_CMD+='s/[,()]//g;'
SED_CMD+='s/%alloca/alloca/g;'
SED_CMD+='s/%//g;s/$/;/;'
SED_CMD+='s/<-/->/;'

echo "$RAW"|grep "^ \+%"|sed ''

echo -e "digraph{\n $BODY\n}"|dot -Tpng|feh -

remove the use of stdio

There must be a native llvm way to construct these names. (see the sprintf usage in Name).