Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

No examples #10

Open
zachriggle opened this issue May 18, 2015 · 5 comments
Open

No examples #10

zachriggle opened this issue May 18, 2015 · 5 comments
Labels

Comments

@zachriggle
Copy link
Contributor

This seems like a really great project with a lot of potential uses. For one, I'd like to use it in my pwndbg debugger scripts for GDB in order to simulate execution of a single basic block (to determine the expected path of a conditional branch).

Unfortunately, there do not appear to be any examples which I can base this work off of. Since I'm running in a debugger, I can provide complete state information, and even memory access.

How would I go about simulating a short sequence of instructions, given a known starting register context?

@bdcht
Copy link
Owner

bdcht commented May 19, 2015

Hi Zach,

Indeed, I need to add examples from the tests/ directory but its currently
a bunch of ugly files there and I'd like to rewrite some examples before
merging them into release. One other problem that I have is that my samples
are often real binaries/malware which can't be released, so I need to build
small toy examples first. Not a big deal but just another task. I agree it
would be more helpful than a tl;dr readme...

To answer your question, it depends on the input you provide. If amoco has
access to the binary file, and you provide the location
of the short sequence of instructions as a Python int/long, you would do
the following:

first import necessary stuff:

import amoco

Now, assuming you are analyzing 'example.exe' binary,

p = amoco.system.loader.load_program('example.exe')
z = amoco.lsweep(p)

which loads the binary in amoco's memory model, and creates a linear sweep
instance that allows to easily disassemble
basic blocks at given address.
Now when you need to simulate a sequence of instructions at some address (a
Python int/long value), either do:

b = z.getblock(address)

to retreive the basic block, and if you want to analyze only the first 3
instructions do:

b = amoco.code.block(b.instr[0:3])

Now b.map holds a mapper object that captures symbolically the "execution"
of this sequence of instructions. You can "print b.map" to
see the effect on registers and memory.

If you want to only provide the sequence as a "shellcode" Python string,
you need to load the amoco.arch module yourself first.

Assuming its x86 for example you would do:

import amoco.arch.x86.cpu_x86 as cpu
shellcode = "\x31\xed\x5e\x89\xe1"
loc = cpu.cst(0x8048380,32)
instr = [ ]
while len(shellcode)>0:
i = cpu.disassemble(shellcode,address=loc)
l = i.length
instr.append(i)
shellcode = shellcode[l:]
loc += l
b = amoco.code.block(instr)

Now that we have the block object to be studied lets proceed with
"simulation":

Let's assume that we have:

print b

--- block 0x8048380 ---

0x8048380 '31ed' xor ebp, ebp
0x8048382 '5e' pop esi
0x8048383 '89e1' mov ecx, esp
print b.map
eflags <- { | [0:1]->0x0 | [1:2]->eflags[1:2] | [2:3]->0x0 | ... | }
ebp <- { | [0:32]->0x0 | }
esi <- { | [0:32]->M32(esp) | }
esp <- { | [0:32]->(esp+0x4) | }
eip <- { | [0:32]->(eip+0x5) | }
ecx <- { | [0:32]->(esp+0x4) | }

Since you are in a debugger you have indeed access to "concrete" values so
you would like to replace some symbolic values of registers or even memory
locations by their concrete values. To do this you just need to provide a
state in the form of a mapper that you can define from
the real gdb state:

m = amoco.cas.mapper.mapper()
m[p.cpu.eip] = b.address
m[p.cpu.esp] = p.cpu.cst(0xff885b38,32)
m[p.cpu.mem(p.cpu.esp,32)] = p.cpu.cst(0x1234,32)

for example here I assume that the state before executing/simulating b
still has symbolic ebx/ecx/eflags/... but I define eip value to be that
obviously of address (note that b.address is of type
amoco.cas.expressions.cst), I define also esp value and I put 0x1234 32
bits constant
at memory location pointed by esp. The rest of the stack is left symbolic.

Now if I do:

s = b.map<<m
print s(p.cpu.ecx)
0xff885b3c
print s(p.cpu.esi)
0x1234
print s(p.cpu.eip)
0x8048385
print s(p.cpu.ebp)
0x0
print s(p.cpu.eax)
eax

Note that all "values" are of type amoco.cas.expressions.cst so if you want
them back to Python int/long just do for example

s(p.cpu.ecx).value
4287126332

If you are now fetching some new instructions based on s(p.cpu.eip) as a
new block, you can immediately simulate its execution:

bnext = p.getblock(s(p.cpu.eip).value)
snext = s>>bnext.map

Thats for a start :) don't hesitate if you have more questions !

Thanks,
Axel

On Mon, May 18, 2015 at 11:44 PM, Zach Riggle notifications@github.com
wrote:

This seems like a really great project with a lot of potential uses. For
one, I'd like to use it in my pwndbg
https://github.com/zachriggle/pwndbg debugger scripts for GDB in order
to simulate execution of a single basic block (to determine the expected
path of a conditional branch).

Unfortunately, there do not appear to be any examples which I can base
this work off of. Since I'm running in a debugger, I can provide complete
state information, and even memory access.

How would I go about simulating a short sequence of instructions, given a
known starting register context?


Reply to this email directly or view it on GitHub
#10.

@zachriggle
Copy link
Contributor Author

Is there a way to provide a hook for memory access? I won't know ahead of time all of the addresses which Amoco will need to look up.

@bdcht
Copy link
Owner

bdcht commented May 22, 2015

If I understand well, you'd like to provide an interface to a concrete state to a mapper and get the resulting mapper where all (or some) symbolic values have been replaced by their concrete value (cst objects) in the provided state interface. Good idea...

To implement this, I would rely on the eval method already found in all cas.expressions classes and would just implement a new "concrete mapper" as env argument which rather than evaluating the expression in the usual amoco.cas.mapper.mapper class would evaluate the expression out of the gdb state.
Basically, this new class 'cmapper' would ensure that for whatever amoco expression e, it is able to return an amoco.expressions.cst object representing the concretisation of e (out of whatever concrete state interface it implements), and then you would only
need to do:
b.map.eval(cmapper(params))
with possibly params indicating that some symbolic expressions need to stay in a symbolic form.

I will implement the skeleton of cmapper based on this idea so that you'd only need to implement the extraction of concrete values out of the gdb/whatever state.

stay tuned

@zachriggle
Copy link
Contributor Author

Yeah, basically this allows use as an emulator instead of a strictly symbolic solver.

@bdcht
Copy link
Owner

bdcht commented May 24, 2015

I've added a 'csi' attribute directly in the mapper class which (if set) acts as a hook function to implement a concrete state interface. I'll still need to document this with examples but here is the idea :

you define your own function that takes a amoco expression as input
and returns a amoco cst expression when input is a reg expression or a Python str if the input is a mem expression. Now you create an empty mapper m as above and just set its csi attribute to point to your function.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants