Notes and Tasks -*- org -*-
The three phases of development of this tool. Could also serve as the top level of a taxonomy of mutations.
- side-effect
- manipulating side-effect instructions (e.g.,
pritnf
), super easy, didn’t need to worry about results of instructions, and changing them directly changes when and what the side effects do. For example manipulations ongreet.ll
. - data-flow
- strings of instructions which change the state of the
system. This becomes more complex because we now need
to feed local edges of the CFG into newly added
instructions, weave the results of new instructions
back into the CFG and patch up CFG holes after removing
instructions. For example manipulations on
arith.ll
. - control-flow
- modifications which actually affect the flow of
control through the program. Modification of
comparison instructions, labels and branch
instructions. Ultimately this will probably also
include some basic block operations e.g., stitching
two basic blocks together when e.g., we remove the
terminator of the leading BB. For example
operations on
branch.ll
.
A couple of basic operations are needed across mutation types and are implemented as core functions. The following will list them and their uses.
Find a value available in the program around a given instruction which has a certain type. This is used to satisfy the operands of newly inserted instructions (e.g., replace operands) and to replace the results of a cut instruction.For an instruction I of type T.
- values in Basic Block before I of type T
- arguments to the function containing I of type T
- global values of type T
- give up and return a 0
This function needs a better more forgiving way of comparing types, currently it doesn’t do a good enough job of satisfying e.g., comparison output types.
Used to plug the output of an instruction into the CFG local to the instruction. For example to use the result of a newly inserted instruction.Looks for the first opening of the correct type, and inserts there. We let the rest of LLVM clean up and dangling SSA registers.
This satisfies the operands of a newly inserted instruction. It is largely just a wrapper around find instance of type, but with an initial phase when limits the replacement to instructions which are actually not in scope.- file:../../../docs/ProgrammersManual.html
Maybe this should be done at the basic block level.
Would the compiler need to simultaneously walk one module for each parent?
do this by folding it into its successor
/// FoldBlockIntoPredecessor - Folds a basic block into its predecessor if it /// only has one predecessor, and that predecessor only has one successor. /// The LoopInfo Analysis that is passed will be kept consistent. /// Returns the new combined block. static BasicBlock *FoldBlockIntoPredecessor(BasicBlock *BB, LoopInfo* LI,
// If this is a terminator instruction we might have problems.
if(!isa<TerminatorInst>(I)){
BasicBlock *BB = I->getParent();
// merge into following block if only one exists
if(BB->getTerminator()->getNumSuccessors() != 1){
errs() << "can't merge with multiple subsequent blocks\n";
} else {
Function::iterator I = BB;
Function::iterator E = BB->getParent()->end();
if(BB == E){
errs() << "can't delete terminator of last block\n";
} else {
BasicBlock *Succ = I++;
if (! Succ->getSinglePredecessor()){
errs() << "next block has multiple preds, can't merge\n";
return false;
} else {
// merge with the following block
FoldSingleEntryPHINodes(Succ);
BB->getInstList().pop_back();
Succ->replaceAllUsesWith(BB);
BB->getInstList().splice(BB->end(), Succ->getInstList());
Succ->eraseFromParent(); } } } }
Here’s the beginning of such a function.
// If this instruction is a terminator, then erase the remainder of the
// basic block behind it.
void terminateIfTerminator(Instruction *I){
if(!isa<TerminatorInst>(I)) return;
BasicBlock *B = I->getParent();
BasicBlock::iterator Iter = I;
for (BasicBlock::iterator i = B->end(); i != Iter; --i){
i->eraseFromParent(); }
}
When inserting an instruction we need to do two things.
- find arguments from the current scope,
replaceOperands
.- RemapInstruction
/// RemapInstruction - Convert the instruction operands from referencing the /// current values into those specified by VMap. static inline void RemapInstruction(Instruction *I, ValueToValueMapTy &VMap)
this is used a couple of places
RemapInstruction
, read its documentation - get arguments from nearby instructions
->getOperand(0)
- set operand to something specific
Clone->setArgOperand(0, Op);
- add input operands to an instruction
// Insert new integer induction variable. PHINode *NewPHI = PHINode::Create(Int32Ty, 2, PN->getName()+".int", PN); NewPHI->addIncoming(ConstantInt::get(Int32Ty, InitValue), PN->getIncomingBlock(IncomingEdge));
- operation iterator
user->op_iterator()
- RemapInstruction
- assign its result into something in the current scope, this could be as simple as just taking the next instruction which requires an argument of the same type as the inserted instruction
To replace we need to,
- replace everything using the original with the new, this is done
by
ReplaceInstWithInst
- replace all arguments of the new with arguments of the original
(or appropriate arguments selected from the environment)
replaceOperands
Take a look at the TypeID, which we may be able to cast from. http://llvm.org/docs/doxygen/html/classllvm_1_1Type.html#a5e9e1c0dd93557be1b4ad72860f3cbda
There is a mutateType function on values, so maybe we should mutate a readily available type if we can’t find one we like.
cat branch.ll|./llvm-mutate -c 11
Ideally subsequent uses of %cmp should be replaced with a default value like a false comparison.
The following shell munging of LLVM IR has backwards arrows, and is incomplete, but still gets pretty close.
SED_CMD=''
SED_CMD+='s/ = / <- /;s/alloca/%alloca/;'
SED_CMD+='s/ [^%< ][^ ]*//g;'
SED_CMD+='s/[,()]//g;'
SED_CMD+='s/%alloca/alloca/g;'
SED_CMD+='s/%//g;s/$/;/;'
SED_CMD+='s/<-/->/;'
echo "$RAW"|grep "^ \+%"|sed ''
echo -e "digraph{\n $BODY\n}"|dot -Tpng|feh -
There must be a native llvm way to construct these names. (see the sprintf usage in Name).