Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Early stop of IR CFG's Symbolic Execution (stops at SHR instruction) #1471

Open
psyirius opened this issue Feb 14, 2024 · 12 comments
Open

Early stop of IR CFG's Symbolic Execution (stops at SHR instruction) #1471

psyirius opened this issue Feb 14, 2024 · 12 comments

Comments

@psyirius
Copy link

ASM Block
image

IR CFG
image

END of se.run_at with step=True
image

@serpilliere
Copy link
Contributor

Hi @psyirius !
The explanation is there: https://github.com/cea-sec/miasm/blob/master/doc/ir/lift.ipynb
The SHR instruction will generate multiple IR block as:

  • if the shifter is not null, it will set the flags
  • if the shifter is null, the instruction will not set flags.

So the generate IR reflect this.
Why did we do that?
If you read this doc, it's a bit like thecmovz EAX, EBX:
Imagine the compilator doeesn't use this optimisation, it will generate a kind of:

if zf then 
    EAX = EBX

In miasm, the cmovz will generate the very same IR graph as the IF/then.
So the positive point is in IR world, using cmovz or if/then has the same difficulty to analyze
But you could make the same remark: "why does symb exec stops on cmovz?"
This is why :)

@psyirius
Copy link
Author

psyirius commented Feb 15, 2024

Got any quick workaround for the sym exec to reach where RCX is assigned a const value?

Btw se.run_at is not fully executed even after supplying lbl_stop!

@serpilliere
Copy link
Contributor

Hi
If, during the symbolic execution RCX is evaluated to a constant, Miasm will resolve flags and pursue the execution.
If you want to do the execution with a symbolic value un RCX, that's getting complicated:

  • For example, you can arbitrary split your execution, doing one considereing RCX null, and another considering RCX not null.
  • Or maybe you are not interested in the case RCX is null so you can modify the IR code generated by miasm (for shr, shl, rol, ror, ...) to only keep the case where RCX is not null (in this case, you may have invalid flags if one day RCX may be null).

I didn't get your point with the lbl_stop stuff. Can you detail this?

@psyirius
Copy link
Author

psyirius commented Feb 15, 2024

Okay, I get it now!

It stops because of the non-deterministic branching caused by SHR which got unresolved dependencies.

I was wondering why there is a break in control-flow when its not in ASM.

Thank you for the answer.

I wonder if there is any easier workaround to merge that extra generated branch into the states and pursue execution to reach the RCX assignment!

@mrexodia
Copy link
Contributor

Hi @psyirius ! The explanation is there: https://github.com/cea-sec/miasm/blob/master/doc/ir/lift.ipynb The SHR instruction will generate multiple IR block as:

  • if the shifter is not null, it will set the flags
  • if the shifter is null, the instruction will not set flags.

So the generate IR reflect this. Why did we do that? If you read this doc, it's a bit like thecmovz EAX, EBX: Imagine the compilator doeesn't use this optimisation, it will generate a kind of:

I know this decision was made somewhere early in miasm's development, but I was still not able to find any motivation for it. When doing symbolic execution these IF statements are creating a major annoyance and you have to write your own IR pass that rewrites it into ternary statements.

Obviously for things like rep movsb there is no way around it, but I feel that Triton's model of using ternary expressions whenever possible is more suitable in practice...

@serpilliere
Copy link
Contributor

Hi @mrexodia !
How do you model the fact that the flag is modified or not, depending on a condition, for example for the shr instruction?

@mrexodia
Copy link
Contributor

I do not think it is important information whether the flag was modified or not. The important information is usually the expression extracted from the flag at a later point in execution (usually a branch).

Similarly with division, of course there is an edge case where you divide by zero but it is unlikely to happen in practice and we usually just assume it is non-zero (in fact I add that as a additional constraint).

Of course I understand the correctness argument to some extent, but this part of miasm is hindering experimentation on “simple” obfuscation schemes for beginners in my experience…

@serpilliere
Copy link
Contributor

I agree with you on the division part:
I think it doesn't add much to add a code that handles the division error in the IR.
My argument is that it's a runtime error, and, like for example memory lookup & memory writes, it can generate runtime exception. Adding code at the IR level to check memory accesses and dealing with error is not really interesting here (and we don't do that in the IR).

At an extend point, if one simplification decides that the memory lookup or write can be simplified (ie removed), it's here that we may remove a code that can potentially fail. So maybe the simplification engine has to take the decision. And the problem may be the same for example by wanting to swap 2 memory lookup for memory barriers.

But for the "precision vs simple analyze", I find your point a bit more arbitrary:
The use of rep movsb or cmovz or shr (which may be replaced by a lea for example), etc totally depends on the compiler and its options, not on the "simplicity of the code". I mean, as you said, a simple hand written code will loop and not use rep movb.

Masking the difficulty of such patterns, which transform a 'data dependency' into an 'execution flow problem' will still be there in those other cases even if we patch the shr. And if you want to analyze those codes, you will have to deal with those patterns.

Some times ago, on an old miasm version, a user reported a problem in the simplification engine. In his code snip from its backtrace, I saw that he did the following thing:
He re implemented the 'problematic' instruction miasm IR to match his wish. And his which was something like (not real code here):

def shr(ir, instr, dst, src):
    return ExprAssign(dst, dst >> src)

@psyirius
Copy link
Author

Can't we do the below to replace the extra generated IR block?

# something like 
# sym = affect(sym) if cond else sym

ExprCond(
   cond=cond_used_in_the_branching,
   src1=expr_that_is_in_the_generated_block,
   src2=org_symbol,
)

@serpilliere
Copy link
Contributor

Hi @psyirius
Currently, the assignment is not an expression, but more likely a statement.
If we have:

shr eax, cl

and don't want to have an extra block, to generate the of for example, we can do this:

of = ExprCond(cond, new_value, of)

which means "ok, if the condition it true, assign new_value to of else, assign it's original value (which has for result no modification)
This is what we did in the very very first version of miasm for the cmovz for instance.

But it has it's limits.
For example, in ARM architecture, each instruction can be executed on a custom condition. Let's take str:

str r0, [r1]

which will store r0 at memory pointed at address r1.

If we have a code C like this:

if (cond) {
    tab[x] = a;
    tab[y] = b;
} else {
    tab[x] = c;
    tab[y] = d;
}

you may result with this code:

STREQ R4, [R0]
STREQ R5, [R1]
STRNE R6, [R2]
STRNE R7, [R3]

Question: What kind of IR will you generate?
If you generate using the previous trick, your code will be:

ExprMem(R0) = ExprCond(cond, R4, ExprMem(R0))

which will generate dummy read/write on memory.

Using the current IR, miasm generates:

graph_cond

Adding a custom simplification rule (to factor same condition code) you ends with;
graph_fact

@W0ni
Copy link
Contributor

W0ni commented Feb 25, 2024

For shift/rotates, both approaches have their advantages and pitfalls. Maybe we could add an optional argument on the lifter to choose between ternary expression and ITE blocks for the flags?

@serpilliere
Copy link
Contributor

Hi @W0ni
Yes, that's a good solution!
And the location of the option (in the lifter) seems perfect to me!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants