Unable to trace glibc functions #4

Open
snx90 opened this issue Aug 21, 2019 · 14 comments
Labels
question Further information is requested

Comments

@snx90

snx90 commented Aug 21, 2019

Hi guys,

I am facing some issues when trying to trace binaries with calls to glibc functions. I have written a C toy example:

#include <string.h>

int main()
{
        char str[50];

        memset(str, 0x0, 50);

        return 0;
}

and the corresponding rainbow script (very simple as well):

from rainbow.generics import rainbow_x64
e = rainbow_x64()
e.load("memset_ex1", typ=".elf")
e.mem_trace = 1
e.trace_regs = 1
e.function_calls = 1
e.start(e.functions["main"], 0)

The resulting output can be read in this file: output_rainbow_memset.txt

I remember having similar issues with unicorn when glibc was not mapped (so a hook to skip the calling instruction was enough) but I assume the other binaries in your examples also have glibc functions. Am I missing something?

Thanks in advance.

@yhql
Contributor

yhql commented Aug 21, 2019

Hi,

This is the same problem that you would get with Unicorn indeed. Here you have two options.

The first one is to hook the calls you are interested in and reimplement them in Python. For example, assuming I have the calling convention right:

e.load("memset_ex1", typ=".elf")

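def my_memset(emu):
    # 'emu' is the whole rainbow instance
    # System V x86-64 calling convention: rdi = destination, rsi = fill value, rdx = length
    addr = emu['rdi']
    value = emu['rsi'] & 0xFF  # memset only uses the low byte of its int argument
    length = emu['rdx']
    # Assumes slice assignment fills the range with this byte value
    emu[addr:addr+length] = value
    emu['rax'] = addr  # memset returns the destination pointer
    return True  # Tell Unicorn/Rainbow to return to the call site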

e.stubbed_functions['memset'] = my_memset

This is useful if you need to change the behaviour of a function like in this example, but painful if you have more complex library calls to implement.
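
One way to check a stub like this without running the emulator is to call it on a small stand-in object. The sketch below is only an illustration: `FakeEmu` is a hypothetical class (not part of Rainbow) that mimics register access by name and memory access by slice, and the stub writes an explicit `bytes` fill, which sidesteps any assumption about how an integer is expanded over a range.

```python
class FakeEmu:
    """Hypothetical stand-in, not Rainbow: registers by name, flat memory by slice."""

    def __init__(self, size=0x100):
        self.regs = {}
        self.mem = bytearray(size)

    def __getitem__(self, key):
        if isinstance(key, str):
            return self.regs[key]
        return bytes(self.mem[key])  # key is expected to be a slice

    def __setitem__(self, key, value):
        if isinstance(key, str):
            self.regs[key] = value
        else:
            self.mem[key] = value  # key is a slice, value is bytes of equal length


def memset_stub(emu):
    dest = emu['rdi']
    fill = emu['rsi'] & 0xFF          # memset converts its int argument to unsigned char
    count = emu['rdx']
    emu[dest:dest + count] = bytes([fill]) * count
    emu['rax'] = dest                 # memset returns the destination pointer
    return True


emu = FakeEmu()
emu['rdi'], emu['rsi'], emu['rdx'] = 0x10, 0x1AB, 4
memset_stub(emu)
print(emu[0x0F:0x15].hex())
```

The bytes on either side of the filled range stay untouched, which is an easy property to assert on before wiring the stub into `e.stubbed_functions`.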

The second option is, as you mentioned, mapping the library into the emulator's memory, and then fixing the relocations by hand. I have not experimented yet with this solution.

@yhql added the question label Aug 21, 2019
@snx90
Author

snx90 commented Aug 23, 2019

What about skipping, in this case, the call to memset (i.e., not tracing it as you can do when directly using Unicorn):

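from unicorn.x86_const import UC_X86_REG_RIP

# Addresses of the `call memset` instructions to skip (not of memset itself)
instructions_skip_list = [addr_of_memset]
def my_hook(mu, address, size, user_data):
    if address in instructions_skip_list:
        # Resume at the instruction that follows the call
        mu.reg_write(UC_X86_REG_RIP, address+size)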

Do you have an API for doing that? (If not, it could be an interesting feature since in many cases these kinds of operations are just noise, so this would help to reduce the length of traces.)

@yhql
Contributor

yhql commented Aug 23, 2019

Yes, you can do it like this (pretty much like the reimplementation option, only this time the function does nothing but return):

def bypass(emu):
    return True

e.stubbed_functions['memset'] = bypass
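
If the caller checks a return value, or you want to know how often a skipped function was reached, the bypass can be produced by a small factory. This is a sketch under assumptions: `make_bypass` is a hypothetical helper, and writing a register with `emu['rax'] = ...` is assumed to work the same way as the register reads shown earlier. A plain dict stands in for the emulator here.

```python
def make_bypass(return_value=None, calls=None, name=None):
    """Build a stub that skips the function, optionally setting rax and logging the call."""
    def stub(emu):
        if calls is not None:
            calls.append(name)           # record that the stubbed function was reached
        if return_value is not None:
            emu['rax'] = return_value    # assumed register write, mirrors emu['rdi'] reads
        return True                      # return to the call site
    return stub


# Stand-in usage: a dict plays the role of the emulator's register interface.
regs = {}
calls = []
skip_memset = make_bypass(return_value=0, calls=calls, name="memset")
result = skip_memset(regs)
```

With a real instance this would be installed as `e.stubbed_functions['memset'] = skip_memset`.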

@yhql
Contributor

yhql commented Dec 15, 2019

@snx90 Did you manage to do what you wanted?

@snx90
Author

snx90 commented Dec 24, 2019

Hi @yhql, yes I did. I followed your approach which allowed me to trace simple libc functions. Not a perfect solution, but it works for my current tasks.

@scrambler-crypto

Hi, I tried the same approach inspired by the examples (using a bypass function), but it does not seem to be applied when emulating the function.
I get the following error:

    5FF4  add     byte ptr [rax], al  ;
    5FF6  add     byte ptr [rax], al  ;
    5FF8  add     byte ptr [rax], al  ;
    5FFA  add     byte ptr [rax], al  ;
Exception ignored on calling ctypes callback function: <bound method Uc._hook_mem_invalid_cb of <unicorn.unicorn.Uc object at 0x7f1f3664a490>>
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/unicorn/unicorn.py", line 443, in _hook_mem_invalid_cb
    return cb(self, access, address, size, value, data)
  File "/home/user/.local/lib/python3.8/site-packages/rainbow-1.0-py3.8.egg/rainbow/rainbow.py", line 371, in unmapped_hook
Exception: Unmapped fetch at 0x6000 (Emu stopped in 5ffc)

Is it a known problem?

Do you have an example of how to map the library as suggested in the "second option"?

@yhql
Contributor

yhql commented Sep 29, 2020

Hi,
Sorry, I am not sure what is happening from the trace you're showing. Can you post an excerpt of the Python code you use to set up the execution?
Sadly I don't have an example yet on the second option.

@scrambler-crypto

Here is the code:

## definition of bypass function
device = rainbow_x64(sca_mode=False, local_vars=globals())
device.load("tracer.elf", typ=".elf", verbose=True)
## code to set arguments and everything else
device.stubbed_functions["memset"] = bypass
device.start(device.functions["encrypt_ecb"], 0)

From what I see in the trace, execution runs the code at the address of the memset function, so the bypass is not applied. Indeed, looking at device.stubbed_functions, the entry for memset is not set, although it works like a charm for non-imported symbols.
I am using the latest versions of rainbow and unicorn.

@yhql
Contributor

yhql commented Sep 29, 2020

Is "memset" also in device.function_names.keys()? If not, the hook will not work.
If it is missing, that might mean the ELF parser did not register this symbol, which should be fixable.

@scrambler-crypto

keys() only shows the addresses, but memset is not in the values() either.
lief 0.10.1 is installed

@yhql
Contributor

yhql commented Sep 29, 2020

True! My bad. So it seems the loader does not fetch all external symbols.
I'll try to reproduce and find a fix.
In the meantime, you can probably try to add "memset" manually into device.functions and device.function_names, knowing its address.
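
A minimal sketch of that manual registration, assuming `functions` maps names to addresses (as in `device.functions["encrypt_ecb"]`) and `function_names` maps addresses back to names (as the `keys()` remark above suggests). `register_symbol` is a hypothetical helper, and `0x1040` is a placeholder address; the real one would have to come from the binary, for instance the PLT entry reported by a disassembler.

```python
from types import SimpleNamespace


def register_symbol(emu, name, address):
    """Add a symbol to both lookup tables so a stub can be attached to it by name."""
    emu.functions[name] = address
    emu.function_names[address] = name


# Stand-in object with the two dictionaries discussed in this thread.
device = SimpleNamespace(
    functions={"encrypt_ecb": 0x1200},
    function_names={0x1200: "encrypt_ecb"},
)

register_symbol(device, "memset", 0x1040)  # placeholder address, not from a real binary
print(sorted(device.functions))
```

After registering the symbol on the real instance, `device.stubbed_functions["memset"] = bypass` would be set as before.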

@scrambler-crypto

Thanks a lot!

@yhql
Contributor

yhql commented Oct 5, 2020

Do you instead have a mangled symbol for memset in the list of functions (like memset@@GLIBC_X.X.X)?

EDIT: The trace you get is most likely because the symbol is present but defined at address 0, since it is a dynamic relocation. So the emulator keeps executing a run of zero bytes (which are interpreted as add byte ptr [rax], al) until it hits an address that is no longer mapped.
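
To see why zero bytes show up as that instruction: opcode `00` is `ADD r/m8, r8`, and a ModRM byte of `00` selects `[rax]` as the memory operand and `al` as the register. The decoder below is a deliberately tiny sketch that only handles this one opcode in its plain register-indirect form (no prefixes, SIB, or displacement); `decode_add_rm8_r8` is a hypothetical name.

```python
REG8 = ["al", "cl", "dl", "bl", "ah", "ch", "dh", "bh"]
REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def decode_add_rm8_r8(code):
    """Decode `00 /r` (ADD r/m8, r8) for the simple [reg] addressing form only."""
    opcode, modrm = code[0], code[1]
    if opcode != 0x00:
        raise ValueError("not opcode 0x00")
    mod = modrm >> 6          # addressing mode
    reg = (modrm >> 3) & 7    # 8-bit register operand
    rm = modrm & 7            # base register of the memory operand
    if mod != 0 or rm in (4, 5):
        raise ValueError("addressing form not handled in this sketch")
    return f"add byte ptr [{REG64[rm]}], {REG8[reg]}"


print(decode_add_rm8_r8(b"\x00\x00"))  # add byte ptr [rax], al
```

Each such instruction is two bytes long, which matches the addresses in the trace advancing by 2.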

@scrambler-crypto

No.
I have memset@@GLIBC_X.X.X.
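
For anyone checking the same thing, a quick way to list versioned variants of a symbol and spot the unresolved ones is sketched below. It assumes the function table is a dict of name to address; `find_symbol_variants` is a hypothetical helper, and the table contents (including the `GLIBC_2.2.5` version string) are made-up example data, not output from a real binary.

```python
def find_symbol_variants(functions, base_name):
    """Return {name: address} for base_name itself and any base_name@... variant."""
    return {
        name: addr
        for name, addr in functions.items()
        if name.split("@", 1)[0] == base_name
    }


# Example data only.
functions = {
    "main": 0x1139,
    "memset@@GLIBC_2.2.5": 0x0,
    "memset_s": 0x2000,
}

variants = find_symbol_variants(functions, "memset")
unresolved = [name for name, addr in variants.items() if addr == 0]
print(variants, unresolved)
```

An entry at address 0 is consistent with the explanation above: the symbol exists in the table but has not been resolved to mapped code.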
