Skip to content

Commit

Permalink
prov/xpmem: Fix xpmem memory corruption
Browse files Browse the repository at this point in the history
The offset into the XPMEM memory to be copied is calculated on the
receiving end.

XPMEM code calculates the memory to map to in the following manner:

memory_region_start_address = ofi_get_page_start(original_address)
memory_region_length = ofi_get_page_end(original_address+original_length)

It then caches that memory region and attaches to it if necessary. The
offset is calculated as follows:

offset = original_address - memory_region_start_address

The cache search is provided the following parameters:
- memory_region_start_address
- memory_region_length

After getting the mapped address, the copy is done from the mapped
address plus the offset.

When searching the cache a cache hit is found if the region being
searched is within a memory region which has already been cached.
However, this exposes a bug in XPMEM. Here is an example to
illustrate the issue:

address being looked up: 0x7fffc8463000
Length: 0x5FFF
Ending address: 0x7FFFC8468FFF
offset: 0x1000

Address cached: 0x7fffc8462000
Lengh: 0x6FFF
Ending Address: 0x7FFFC8468FFF

As shown in the example the memory region being looked up in the cache is
within the cached memory region.

However, if the cached memory region address is returned and subsequently the
calculated offset is used to copy the data, there will be a discrepancy
of a page, leading to memory corruption. In the above example the copy
will start from 0x7fffc8462000 + 0x1000 instead of from
0x7fffc8463000 + 0x1000

To resolve the issue, calculate the delta between the remote address
which is returned in the cache result and the remote address used in the
operation. Add that delta to the offset to calculate the starting
address of the memory operation.

Signed-off-by: Amir Shehata <shehataa@ornl.gov>
  • Loading branch information
amirshehataornl authored and j-xiong committed Mar 18, 2024
1 parent 31a93e2 commit 16385db
Showing 1 changed file with 7 additions and 4 deletions.
11 changes: 7 additions & 4 deletions src/xpmem.c
Expand Up @@ -164,7 +164,7 @@ int xpmem_copy(struct iovec *local, unsigned long local_cnt,
{
int ret, i;
struct iovec iov;
uint64_t offset, copy_len;
uint64_t offset, copy_len, delta;
void *mapped_addr;
struct ofi_mr_entry *mr_entry;
long page_size = ofi_get_page_size();
Expand All @@ -178,16 +178,19 @@ int xpmem_copy(struct iovec *local, unsigned long local_cnt,
(uintptr_t) ofi_get_page_end(remote[i].iov_base +
remote[i].iov_len, page_size) -
(uintptr_t)iov.iov_base;
offset = (uintptr_t)((uintptr_t) remote[i].iov_base -
(uintptr_t) iov.iov_base);

ret = ofi_xpmem_cache_search(xpmem_cache, &iov, pid, &mr_entry,
(struct xpmem_client *)user_data);
if (ret)
return ret;

delta = (uintptr_t) iov.iov_base -
(uintptr_t) mr_entry->info.iov.iov_base;
offset = (uintptr_t)((uintptr_t) remote[i].iov_base -
(uintptr_t) iov.iov_base);

mapped_addr = (char*) (uintptr_t)mr_entry->info.mapped_addr +
offset;
offset + delta;

copy_len = (local[i].iov_len <= iov.iov_len - offset) ?
local[i].iov_len : iov.iov_len - offset;
Expand Down

0 comments on commit 16385db

Please sign in to comment.