This repository has been archived by the owner on Aug 23, 2022. It is now read-only.

error while translating function with function pointer as parameter #784

Open
Stephen-lei opened this issue Apr 24, 2022 · 5 comments

Comments

@Stephen-lei
Contributor

I'm testing the completeness of mcsema, using the EEMBC benchmarks (CoreMark in this case).
So far I am only testing x86-to-x86 translation.
The source code is available here: https://github.com/eembc/coremark
The problem appears when translating a function that takes a function pointer as a parameter (cmp is a function pointer):
[image]
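Since the screenshot is not preserved, the pattern in question can be sketched in a few lines of C. This is a minimal illustration, not CoreMark's code: `cmp_fn`, `cmp_ascending`, and `sort_with` are hypothetical names, and an insertion sort stands in for `core_list_mergesort`. The only point is that the comparator is reached solely through the `cmp` parameter, so an unoptimized build has to call it indirectly.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical comparator type; CoreMark's real signature differs. */
typedef int (*cmp_fn)(const int *a, const int *b);

/* Three-way comparison: negative, zero, or positive. */
static int cmp_ascending(const int *a, const int *b) {
    return (*a > *b) - (*a < *b);
}

/* Insertion sort that only ever reaches the comparator through `cmp`. */
static void sort_with(int *v, size_t n, cmp_fn cmp) {
    assert(cmp != NULL);
    for (size_t i = 1; i < n; ++i) {
        int key = v[i];
        size_t j = i;
        /* Indirect call: the target is a runtime value, not a symbol. */
        while (j > 0 && cmp(&v[j - 1], &key) > 0) {
            v[j] = v[j - 1];
            --j;
        }
        v[j] = key;
    }
}
```

Calling `sort_with(v, n, cmp_ascending)` on an `int` array sorts it in ascending order. A small program like this may be an easier reproduction case for the lifter than the full benchmark.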

It seems that mcsema can't handle this correctly.
When core_list_mergesort calls the cmp function, the erroneous LLVM IR generated by mcsema-lift is as follows:
[image]

In the picture above, the translated LLVM IR passes the value of RSP in the State structure to the child cmp function.
But on entering the child cmp function:

[image]

As the picture above shows, the cmp function writes RSP to RSP+8 and RSP+8 to RSP+16, which modifies the State structure of its parent function. When control returns to core_list_mergesort, some important values have been changed, which causes the program to fail.
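The symptom described above can be modelled with a toy emulated stack. This is only a sketch of one way such clobbering could arise, assuming the callee is entered with the caller's stack pointer unchanged; it is not remill's `State` layout and not a diagnosis of the actual bug. `ToyState`, `toy_callee`, `toy_caller`, and the `reserve` parameter are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy stand-in for a lifted register state; NOT remill's State layout. */
typedef struct {
    uint64_t rsp; /* byte offset into toy_stack */
} ToyState;

enum { TOY_STACK_BYTES = 64 };
static uint8_t toy_stack[TOY_STACK_BYTES];

static void store64(uint64_t off, uint64_t val) {
    assert(off + sizeof val <= TOY_STACK_BYTES);
    memcpy(&toy_stack[off], &val, sizeof val);
}

static uint64_t load64(uint64_t off) {
    uint64_t val;
    assert(off + sizeof val <= TOY_STACK_BYTES);
    memcpy(&val, &toy_stack[off], sizeof val);
    return val;
}

/* Callee stores its two arguments at [rsp+8] and [rsp+16]. */
static void toy_callee(ToyState *st, uint64_t a, uint64_t b) {
    store64(st->rsp + 8, a);
    store64(st->rsp + 16, b);
}

/* Caller keeps a live value at [rsp+8], lowers rsp by `reserve` bytes
   for the callee, calls it, and reads its own value back afterwards. */
static uint64_t toy_caller(uint64_t reserve) {
    ToyState st = { .rsp = 32 };
    store64(st.rsp + 8, 0x1234); /* caller's live local */
    st.rsp -= reserve;
    toy_callee(&st, 7, 9);
    st.rsp += reserve;
    return load64(st.rsp + 8);
}
```

With `reserve` set to 0 the callee's store to `[rsp+8]` lands on the caller's slot, so `toy_caller(0)` returns 7 instead of 0x1234. With `reserve` set to 24 the callee's stores fall below the caller's slot and the value survives.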

I would like to know how to solve this problem, or which part of mcsema I should examine to fix it.
I built the original source code with clang 11 at -O0, and the translation process is as follows:

mcsema-disass-3.8 --disassembler "/opt/idapro-7.7/idat64" --arch amd64 --os linux --entrypoint main --binary /mcsema-llvm11-EEMBC_test/test/coremark_test2 --output /mcsema-llvm11-EEMBC_test/test/coremark_test2.cfg --log_file /mcsema-llvm11-EEMBC_test/test/coremark_test2.log

mcsema-lift-11.0 --arch amd64 --os linux --cfg /mcsema-llvm11-EEMBC_test/test/coremark_test2.cfg --output /mcsema-llvm11-EEMBC_test/test/coremark_test2.bc --explicit_args --merge_segments --name_lifted_sections

@pgoodman
Collaborator

I am not sure what is going on, but I strongly recommend migrating to anvill. I just compiled coremark for macOS, opened this specific function in Binary Ninja, and produced the attached files (Archive.zip). Anvill was able to identify the indirect call (because Binary Ninja can recognize it, and anvill's Python scripts for interfacing with Binary Ninja can encode call-site-specific types).

  %123 = inttoptr i64 %1 to i32 (i64, i64, i64, i64*)*, !pc !44
  %124 = call i32 %123(i64 %119, i64 %122, i64 %2, i64* %117) #3, !pc !44
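For readers less used to LLVM IR, the two instructions above correspond roughly to the following C. This is a sketch only: `callee_fn`, `sample_target`, and `call_typed` are hypothetical names, and the argument meanings are not known from the IR, only their types `i32 (i64, i64, i64, i64*)`.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the IR function type i32 (i64, i64, i64, i64*). */
typedef int32_t (*callee_fn)(int64_t, int64_t, int64_t, int64_t *);

/* Hypothetical target with that signature, used only for the demo. */
static int32_t sample_target(int64_t a, int64_t b, int64_t c, int64_t *out) {
    *out = a + b + c;
    return (a > b) - (a < b);
}

/* Receives the target as a raw integer address (like %1 in the IR),
   casts it to the call-site-specific pointer type (the inttoptr),
   and calls through it (the call). Integer-to-function-pointer
   conversion is implementation-defined in C. */
static int32_t call_typed(uintptr_t addr, int64_t a, int64_t b,
                          int64_t c, int64_t *out) {
    assert(addr != 0);
    callee_fn fn = (callee_fn)addr;
    return fn(a, b, c, out);
}
```

A call such as `call_typed((uintptr_t)sample_target, 5, 3, 2, &out)` goes through the typed pointer. Because the type is attached at the call site, only the four arguments that the callee expects are passed.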

@pgoodman
Collaborator

Otherwise, it looks like mcsema sees the indirect call and casts the target to a generic N-integer-argument function. The N in this case is derived from --explicit_args and --explicit_args_count N (defaults to 8 or so). Then, it would probably try to identify that the comparator of the merge sort has its address taken, and alter references to that comparator to be native-to-lifted entrypoint callbacks (created by this function).
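The "generic N-integer-argument function" idea can be illustrated in plain C. This is a hand-written analogy under stated assumptions, not code that mcsema emits: N is fixed at 8 to match the default guessed above, and `generic8_fn`, `compare_ints_generic`, and `call_generic` are hypothetical names. The adapter plays the role of a callback that bridges the generic call shape and the precisely typed comparator.

```c
#include <assert.h>
#include <stdint.h>

/* Generic call shape: eight integer arguments, one integer result. */
typedef uint64_t (*generic8_fn)(uint64_t, uint64_t, uint64_t, uint64_t,
                                uint64_t, uint64_t, uint64_t, uint64_t);

/* The comparator with its precise type. */
static int compare_ints(const int *a, const int *b) {
    return (*a > *b) - (*a < *b);
}

/* Hypothetical adapter: accepts the generic shape, forwards the two
   arguments the comparator needs, and ignores the other six. */
static uint64_t compare_ints_generic(uint64_t a0, uint64_t a1, uint64_t a2,
                                     uint64_t a3, uint64_t a4, uint64_t a5,
                                     uint64_t a6, uint64_t a7) {
    (void)a2; (void)a3; (void)a4; (void)a5; (void)a6; (void)a7;
    int r = compare_ints((const int *)(uintptr_t)a0,
                         (const int *)(uintptr_t)a1);
    return (uint64_t)(int64_t)r; /* sign-extend into the result */
}

/* Call site that knows nothing about the callee's real type: every
   argument is widened to an integer and unused slots are zero. */
static int call_generic(generic8_fn fn, const int *x, const int *y) {
    assert(fn != 0);
    uint64_t r = fn((uint64_t)(uintptr_t)x, (uint64_t)(uintptr_t)y,
                    0, 0, 0, 0, 0, 0);
    return (int)(int64_t)r;
}
```

The call site always passes eight values whether or not the callee uses them, which is the contrast with a call-site-specific type.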

@Stephen-lei
Contributor Author

@pgoodman Thank you very much! My impression was that anvill and mcsema are both lifting tools, and that mcsema is the more capable of the two? (I see that anvill is part of mcsema.) Maybe my understanding is incomplete. Thank you also for telling me which part of mcsema is responsible for this problem. I am now studying Callback.cpp in mcsema and trying to figure out the cause.

@pgoodman
Collaborator

mcsema hasn't been maintained in a long time, and is pinned on an older, less capable version of anvill and remill. Anvill has evolved substantially since then. However, Anvill still does not lift "whole programs" whereas mcsema does. Getting a working lifted program out of mcsema is a challenge.

@Stephen-lei
Contributor Author

Stephen-lei commented May 10, 2022

@pgoodman Thanks! My aim is to translate a whole program, so I am trying to fix mcsema. Sorry to bother you again: I now want to see the log output of mcsema-lift (with the --log option), but I seem to be doing something wrong, because no log file is generated the way it is for mcsema-disass. I've searched the issue list and can't find a solution. Would you mind telling me the correct usage? Thank you very much!
My command is as follows:

mcsema-lift-11.0 --arch amd64 --os linux --cfg ***/***.cfg --output ***/***.bc --explicit_args --merge_segments --name_lifted_sections --log ***/***.log
I passed the option the same way as for mcsema-disass, but no log file is generated. The --help output says:
-log (Output log filename for lifter.) type: string default: ""
I am not sure why.
