Atomic instructions #556
Thank you for the feedback @pgoodman, I understand your idea better now. I agree with 1. and 2.; however, the LLVM optimizer sometimes gives unintuitive results. For example, when lifting this code

```asm
mov rax, [rcx]
mov rbx, [rcx]
```

with

```
remill-lift-12 --arch amd64 --name_register_variables --bc-out test.bc --bytes 488b01488b19
```

we receive the following LLVM code:

```llvm
define dso_local %struct.Memory* @sub_0(%struct.State* noalias nonnull align 1 %state, i64 %pc, %struct.Memory* noalias %memory) local_unnamed_addr #0 {
  %RBX_ptr = getelementptr inbounds %struct.State, %struct.State* %state, i64 0, i32 6, i32 3, i32 0, i32 0
  %RCX_ptr = getelementptr inbounds %struct.State, %struct.State* %state, i64 0, i32 6, i32 5, i32 0, i32 0
  %RAX_ptr = getelementptr inbounds %struct.State, %struct.State* %state, i64 0, i32 6, i32 1, i32 0, i32 0
  %PC_ptr = getelementptr inbounds %struct.State, %struct.State* %state, i64 0, i32 6, i32 33, i32 0, i32 0
  %RCX_ = load i64, i64* %RCX_ptr, align 8
  %1 = tail call i64 @__remill_read_memory_64(%struct.Memory* %memory, i64 %RCX_) #3
  store i64 %1, i64* %RAX_ptr, align 8
  %2 = add i64 %pc, 6
  store i64 %1, i64* %RBX_ptr, align 8
  store i64 %2, i64* %PC_ptr, align 8
  %3 = tail call %struct.Memory* @__remill_missing_block(%struct.State* %state, i64 %2, %struct.Memory* %memory)
  ret %struct.Memory* %3
}
```

This result is fine: the two reads are merged into a single read. Now we change the code a little bit:

```asm
mov rax, [rcx]
mov [rcx], rax
mov rbx, [rcx]
```
and we totally neutralize the optimizer:

```llvm
define dso_local %struct.Memory* @sub_0(%struct.State* noalias nonnull align 1 %state, i64 %pc, %struct.Memory* noalias %memory) local_unnamed_addr #0 {
  %RBX_ptr = getelementptr inbounds %struct.State, %struct.State* %state, i64 0, i32 6, i32 3, i32 0, i32 0
  %RCX_ptr = getelementptr inbounds %struct.State, %struct.State* %state, i64 0, i32 6, i32 5, i32 0, i32 0
  %RAX_ptr = getelementptr inbounds %struct.State, %struct.State* %state, i64 0, i32 6, i32 1, i32 0, i32 0
  %PC_ptr = getelementptr inbounds %struct.State, %struct.State* %state, i64 0, i32 6, i32 33, i32 0, i32 0
  %RCX_ = load i64, i64* %RCX_ptr, align 8
  %1 = tail call i64 @__remill_read_memory_64(%struct.Memory* %memory, i64 %RCX_) #3
  store i64 %1, i64* %RAX_ptr, align 8
  %2 = tail call %struct.Memory* @__remill_write_memory_64(%struct.Memory* %memory, i64 %RCX_, i64 %1) #3
  %3 = add i64 %pc, 9
  %4 = tail call i64 @__remill_read_memory_64(%struct.Memory* %2, i64 %RCX_) #3
  store i64 %4, i64* %RBX_ptr, align 8
  store i64 %3, i64* %PC_ptr, align 8
  %5 = tail call %struct.Memory* @__remill_missing_block(%struct.State* %state, i64 %3, %struct.Memory* %2)
  ret %struct.Memory* %5
}
```

That is because the write operation introduces a new memory pointer; since (as you've said) the optimizer cannot prove that the input memory and the returned memory are the same, the second read cannot be eliminated. Likewise, we will always see two writes for

```asm
mov [rcx], rbx
mov [rcx], rax
```

Of course, this depends on the concrete implementation of the memory read/write operations. But there is an asymmetry here: with two reads the optimizer works, but not with two writes (or with a write between two reads). I wonder whether remill could provide basic properties for the memory read/write operations, something that follows the axioms of the SMT theory of arrays. I don't know how to do that with LLVM attributes alone, but an LLVM pass (like dead-store elimination) might help.

For 3., I admit that I hadn't thought about the atomicity of instruction execution under interrupts and exceptions. On x86 and AMD64, exception and interrupt handling always occurs at an instruction boundary (i.e. an instruction is either executed completely or not executed at all), as stated in Intel's SDM Volume 3, Chapter 6.6.
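For illustration, the two read-over-write axioms from the SMT theory of arrays can be sketched as a rewrite over a symbolic trace of memory operations. This is a toy model, not remill code; all names here are made up for the example, and a real LLVM pass would additionally need alias analysis to prove that two addresses differ.

```python
def simplify_read(mem, addr):
    """Simplify read(mem, addr) using the read-over-write axioms of arrays."""
    # mem is ('init',) or ('store', base_mem, stored_addr, stored_value)
    while mem[0] == 'store':
        _, base, stored_addr, stored_value = mem
        if stored_addr == addr:
            # Axiom: read(store(m, a, v), a) == v -- forward the stored value.
            return stored_value
        # Axiom: read(store(m, a, v), b) == read(m, b) when a != b -- with
        # concrete, provably distinct addresses we can skip over the store.
        mem = base
    return ('read', mem, addr)  # irreducible: a read from the initial memory

m0 = ('init',)
m1 = ('store', m0, 0x1000, 42)           # models: mov [rcx], rax
assert simplify_read(m1, 0x1000) == 42   # a re-read folds to the stored value
assert simplify_read(m1, 0x2000) == ('read', m0, 0x2000)
```

A pass built on these rules would restore the read/write symmetry discussed above: the read after the write at the same address folds away, just as two back-to-back reads already do.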
I think that the atomicity guarantee may not be in the scope of remill (but rather of an emulator that uses remill, as you've said). Actually, the main point of my issue is 4. I initially wondered how
So merging two reads into one is technically sketchy, but it's hard to avoid unless we introduce a new parameter to the read intrinsics, which would be doable but slightly annoying. Here is why the merging is actually a bug: usage of the memory pointer establishes a partial order, not a total order. Thus, a read of an earlier memory pointer can technically sink below a write. We have no way of stopping this at the moment.

My point about the failure atomicity of instructions was that we're already not faithful to the hardware in this regard, so if stuff "leaks out" of an "atomic region" (bounded by the two intrinsics), then as long as it isn't a memory access, it's "fine." The intention of the atomic regions is to signal that "something needs mutual exclusion over memory." That is, think of them really as critical sections for memory operations; the flags computations or other register mutations inside them are just that -- they happen to be there. State structures don't exist in real programs; our intuition for them is that they are truly private to their thread, and so we should think of one state structure as not being observable by another thread, even if the nature of the state structure pointer admits this possibility.
Hello everyone,

When lifting some atomic instructions, e.g.

```asm
lock inc byte ptr [rax]
```

I've got the following result:
where the atomic intrinsic functions are declared as:

`remill/include/remill/Arch/Runtime/Intrinsics.h`, lines 128 to 132 at commit 5955659
I'm not quite sure that I understand the lifted code correctly.
First, I think of the atomic intrinsics as some kind of memory barriers (for StoreLoad ordering), but I still don't understand why such barriers make the execution of the instruction atomic.

Second, if they are some kind of critical-section acquire/release operations (to make the memory read/write exclusive), then their positions are somewhat "too strict": e.g. the flag-updating part may not need to belong to the critical section.
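A toy sketch of this critical-section reading, assuming hypothetical Python stand-ins for the intrinsics (none of these names or semantics come from remill): only the shared-memory read and write are guarded, while the flag update touches only the per-thread state and needs no synchronization.

```python
import threading

shared = {"counter": 0}        # stands in for the shared %memory
mem_lock = threading.Lock()    # stands in for __remill_atomic_begin/end

def lifted_lock_inc(state):
    # Critical section: only the shared read and write need mutual exclusion.
    with mem_lock:
        v = shared["counter"]                # like __remill_read_memory_8
        shared["counter"] = (v + 1) & 0xFF   # like __remill_write_memory_8
    # Flag update on the private per-thread state: no synchronization needed.
    state["ZF"] = int(((v + 1) & 0xFF) == 0)

states = [{"ZF": 0} for _ in range(4)]       # one private State per "processor"
threads = [threading.Thread(target=lifted_lock_inc, args=(s,)) for s in states]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert shared["counter"] == 4   # with the lock, no increment is ever lost
```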
Indeed, we may think of two processors `P0` and `P1` executing the function `sub_0` simultaneously, each with its own `struct State* %state` (say `%state0` for `P0` and `%state1` for `P1`), but sharing `%struct.Memory* %memory`. Then the only operations that need to be exclusive (or atomic) are `__remill_read_memory_8` and `__remill_write_memory_8`; there is no need to synchronize the flag updates (since they happen on the distinct `%state0` and `%state1`).

Third, the attribute `readnone` seems too "relaxed" for these functions (may the optimizer move them around?); should they be `inaccessiblememonly` instead?

Thank you in advance for any response.
PS. I think another approach would be to propose some kind of `__remill_atomic_read/write_xxx` intrinsics for atomic instructions (instead of the `__remill_read/write_xxx` intrinsics; then there would be no need for `__remill_atomic_begin/end`).