Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UB when sdiv overflows #655

Open
katrinafyi opened this issue Feb 21, 2023 · 5 comments
Open

UB when sdiv overflows #655

katrinafyi opened this issue Feb 21, 2023 · 5 comments

Comments

@katrinafyi
Copy link

The semantics provided for the SDIV instruction trigger undefined behaviour on INT_MIN / -1.

For example, 010cc19a sdiv x1, x0, x1 gives us something like:

%4 = load i64, ptr %X0, align 8
%5 = load i64, ptr %X1, align 8
// ....
%div.i.i = sdiv i64 %4, %5

Since this overflows a 64-bit int, LLVM semantics specify it is UB.

However, the ARM reference does not raise an exception in such a case; it simply evaluates to INT_MIN.

@pgoodman
Copy link
Collaborator

Here's where we implement it for aarch64: https://github.com/lifting-bits/remill/blob/master/lib/Arch/AArch64/Semantics/BINARY.cpp#L140-L151

For x86, I think we explicitly detect signed underflow, though it adds a bunch of complexity to the semantics:
https://github.com/lifting-bits/remill/blob/master/lib/Arch/X86/Semantics/BINARY.cpp#L488-L489

Also the auto-formatting with MAKE_IDIVxax on that file looks awful but that is unrelated.

@katrinafyi I see your llvm-translator project using Remill. What are you trying to accomplish where (not?) flagging this UB is significant? Are later optimization passes replacing the sdiv with undef? Also, where in that linked document is the underflow behavior specified?

Perhaps one thing to consider is that we try do the equivalent of a freezing operation, to mirror LLVM's freeze instruction. @2over12 thoughts?

@katrinafyi
Copy link
Author

Thanks for the reply, I'm not aware of any major problems this causes. We're just working towards validating lifters using the ARM specifications as a trustworthy source of truth. So there's no concrete application asides from the goal of being as accurate as possible.

In the ARM ISA, it is specified implicitly. Here, the integer type is an arbitrary precision two's complement integer. result<datasize-1:0> then extracts the lower 64 bits. The INT_MIN result is obtained from the lower 64 bits of the two's complement of the true numeric result +2^63.

@pgoodman
Copy link
Collaborator

Ah interesting! @regehr had previously done a bit of translation validation with remill (through anvill) and alive2 with opt-fuzz. I believe they eventually switched to their own custom AArch64 binary lifter/translator.

@lkorenc does a bit of translation validation-type stuff internally with a separate but related system. It might be worth a longer discussion on the #binary-lifting channel of the Empire Hacking Slack.

@regehr
Copy link

regehr commented Feb 21, 2023

hi this is interesting! as Peter says, we have a lifter from AArch64 to LLVM, but we're not trying to find bugs in the lifter, but rather using the lifter + Alive2 to find bugs in LLVM, you can see our current list here, these are all silent miscompiles:

https://github.com/regehr/alive2/blob/arm-tv/BugList.md#miscompilation-bugs-found-in-arm64-backend

I don't think we've done anything particularly interesting while lifting sdiv, but we definitely had to be careful when lifting things like shift exponents, and that Remill had to be similarly careful.

the thing I'm currently a bit stuck on with this project is the assembly-level memory model is sufficiently different from the LLVM memory model that we're going to need to rip out the Alive2 memory model and make it pluggable, so that it can use one set of rules for LLVM IR and another for lifted ARM code, which of course freely mixes integers and pointers in ways that make it pretty difficult to reason about.

@katrinafyi that's interesting to hear that you're using the ARM semantics. we're currently not, but I've spent a bunch of time getting executable code out of Sail and Isla (tools from Peter Sewell's group that make it easier to work with ARM's semantics). currently that's a bit on the back burner since my postdoc who was supporting this project had to leave his position, but I want to keep working on it.

@katrinafyi
Copy link
Author

Very cool! Our project is to derive a trustworthy lifter for ARMv8, based on Arm's machine-readable specifications. This is coming along nicely and it lifts a large subset of AArch64 to LLVM IR. We also use Alive2 for bidirectional translation validation to compare its results to other lifters, which is how we found these edge cases. Overall, we found Remill's semantics were actually very accurate; this and #656 being the only discrepancies of note.

@regehr That's an interesting project and an impressive list of compilation bugs. This is definitely a case where you'd need to trust the lifter you're using! We had looked at Sail for our work too but had trouble along the way; its definitions can be cumbersome and its effect type is difficult to translate and reason about.

Working with the ARM semantics directly is actually not too bad. The language is relatively simple and Alastair Reid has published tools for parsing and manipulating the ASL specification language, which we make use of in our lifter.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants