heap corruption caused by tst_type_setvalue for MPI_TYPE_MIX_LB_UB #8

Open
BenWibking opened this issue Feb 12, 2023 · 3 comments · May be fixed by #11

Comments

@BenWibking

AddressSanitizer detects an out-of-bounds write when I run the test suite (built with Clang's AddressSanitizer against OpenMPI 4.1.4):

=================================================================
==62169==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x000106d3894f at pc 0x00010455f60c bp 0x00016ba0f850 sp 0x00016ba0f848
WRITE of size 1 at 0x000106d3894f thread T0
    #0 0x10455f608 in tst_type_setvalue tst_types.c:984
    #1 0x1045600d8 in tst_type_setstandardarray tst_types.c:1012
    #2 0x104510810 in tst_p2p_simple_ring_init tst_p2p_simple_ring.c:39
    #3 0x10454dd0c in tst_test_init_func tst_tests.c:1453
    #4 0x1044b91d8 in main mpi_test_suite.c:455
    #5 0x1a38fbe4c  (<unknown module>)

0x000106d3894f is located 1 bytes to the left of 1-byte region [0x000106d38950,0x000106d38951)
allocated by thread T0 here:
    #0 0x104ca2ca8 in wrap_malloc+0x94 (libclang_rt.asan_osx_dynamic.dylib:arm64e+0x3eca8)
    #1 0x104557f6c in tst_type_allocvalues tst_types.c:563
    #2 0x1045103a8 in tst_p2p_simple_ring_init tst_p2p_simple_ring.c:30
    #3 0x10454dd0c in tst_test_init_func tst_tests.c:1453
    #4 0x1044b91d8 in main mpi_test_suite.c:455
    #5 0x1a38fbe4c  (<unknown module>)

SUMMARY: AddressSanitizer: heap-buffer-overflow tst_types.c:984 in tst_type_setvalue

Full log: mpi_test_suite_heap_corruption.txt

@BenWibking
Author

I can also reproduce this on Fedora 37 with gcc 12.2.1:

P2P tests Ring (5/101), comm MPI_COMM_WORLD (1/9), type MPI_TYPE_MIX_LB_UB (29/29)
=================================================================
==124097==ERROR: AddressSanitizer: heap-buffer-overflow on address 0xffff8000856f at pc 0x00000052d7a8 bp 0xffffc99ba540 sp 0xffffc99ba558
WRITE of size 1 at 0xffff8000856f thread T0
    #0 0x52d7a4 in tst_type_setvalue /home/benwibking.linux/mpi-test-suite/tst_types.c:984
    #1 0x52ddd8 in tst_type_setstandardarray /home/benwibking.linux/mpi-test-suite/tst_types.c:1012
    #2 0x4ec5d8 in tst_p2p_simple_ring_init p2p/tst_p2p_simple_ring.c:39
    #3 0x51d35c in tst_test_init_func /home/benwibking.linux/mpi-test-suite/tst_tests.c:1453
    #4 0x4a6e5c in main /home/benwibking.linux/mpi-test-suite/mpi_test_suite.c:455
    #5 0xffff8541b584 in __libc_start_call_main (/lib64/libc.so.6+0x2b584)
    #6 0xffff8541b65c in __libc_start_main@@GLIBC_2.34 (/lib64/libc.so.6+0x2b65c)
    #7 0x4055ac in _start (/home/benwibking.linux/mpi-test-suite/mpi_test_suite+0x4055ac)

0xffff8000856f is located 1 bytes to the left of 1-byte region [0xffff80008570,0xffff80008571)
allocated by thread T0 here:
    #0 0xffff85d8e500 in malloc (/lib64/libasan.so.8+0xae500)
    #1 0x526a60 in tst_type_allocvalues /home/benwibking.linux/mpi-test-suite/tst_types.c:563
    #2 0x4ec218 in tst_p2p_simple_ring_init p2p/tst_p2p_simple_ring.c:30
    #3 0x51d35c in tst_test_init_func /home/benwibking.linux/mpi-test-suite/tst_tests.c:1453
    #4 0x4a6e5c in main /home/benwibking.linux/mpi-test-suite/mpi_test_suite.c:455
    #5 0xffff8541b584 in __libc_start_call_main (/lib64/libc.so.6+0x2b584)
    #6 0xffff8541b65c in __libc_start_main@@GLIBC_2.34 (/lib64/libc.so.6+0x2b65c)
    #7 0x4055ac in _start (/home/benwibking.linux/mpi-test-suite/mpi_test_suite+0x4055ac)

SUMMARY: AddressSanitizer: heap-buffer-overflow /home/benwibking.linux/mpi-test-suite/tst_types.c:984 in tst_type_setvalue
Shadow bytes around the buggy address:
  0x200ff0001050: fa fa 04 fa fa fa fd fa fa fa fd fa fa fa 04 fa
  0x200ff0001060: fa fa fd fa fa fa fd fa fa fa 04 fa fa fa fd fa
  0x200ff0001070: fa fa fd fa fa fa fd fa fa fa fa fa fa fa fa fa
  0x200ff0001080: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x200ff0001090: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
=>0x200ff00010a0: fa fa fa fa fa fa fa fa fa fa 01 fa fa[fa]01 fa
  0x200ff00010b0: fa fa 01 fa fa fa 01 fa fa fa 01 fa fa fa 01 fa
  0x200ff00010c0: fa fa 01 fa fa fa 01 fa fa fa 01 fa fa fa 01 fa
  0x200ff00010d0: fa fa 01 fa fa fa fd fa fa fa fd fa fa fa fd fa
  0x200ff00010e0: fa fa fd fa fa fa 04 fa fa fa fd fa fa fa fd fa
  0x200ff00010f0: fa fa fd fa fa fa fd fa fa fa fd fa fa fa fd fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==124097==ABORTING

There is a bug in tst_type_setvalue for MPI_TYPE_MIX_LB_UB.

@BenWibking changed the title from "ERROR: AddressSanitizer: heap-buffer-overflow WRITE of size 1" to "heap corruption caused by bug in tst_type_setvalue for MPI_TYPE_MIX_LB_UB" on Feb 13, 2023
@BenWibking changed the title from "heap corruption caused by bug in tst_type_setvalue for MPI_TYPE_MIX_LB_UB" to "heap corruption caused by tst_type_setvalue for MPI_TYPE_MIX_LB_UB" on Feb 13, 2023
@BenWibking linked a pull request on Feb 13, 2023 that will close this issue
@hominhquan

hominhquan commented Feb 28, 2024

I ran into this problem, too, on AWS Graviton3 with ACFL 23.10 and OMPI branch v5.0.x.

My observation from gdb and the code is that there is a problem with types whose ub == lb == 0 in the types[] table (static struct type types[32]). These cause a malloc(0), and then some tst_type_setvalue() functions, called by loops in the test suite, write data to these zero-size allocations as if they were valid addresses.
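
To illustrate the failure mode described here, a minimal sketch in plain C (not the test suite's actual code). A zero-byte allocation may return a non-NULL pointer that must not be dereferenced, so a setter has to know how large the buffer really is. The names `checked_buf`, `checked_alloc`, and `checked_set_byte` are hypothetical and exist only for this example:

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper type: a buffer that remembers its own size. */
struct checked_buf {
    unsigned char *data;
    size_t size;
};

/* Allocate `size` bytes; a zero-size request yields an empty buffer
 * (data == NULL, size == 0) instead of relying on what malloc(0) returns. */
struct checked_buf checked_alloc(size_t size)
{
    struct checked_buf b = { NULL, 0 };
    if (size == 0)
        return b;
    b.data = malloc(size);
    if (b.data != NULL)
        b.size = size;
    return b;
}

/* Write one byte at `offset`.
 * Returns 0 on success, -1 if the write would fall outside the buffer. */
int checked_set_byte(struct checked_buf *b, ptrdiff_t offset, unsigned char v)
{
    if (b->data == NULL || offset < 0 || (size_t)offset >= b->size)
        return -1;
    b->data[offset] = v;
    return 0;
}
```

With these helpers, `checked_set_byte` on a buffer obtained from `checked_alloc(0)` returns -1 instead of writing, and so does a write at offset -1 into a 1-byte buffer.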

This issue may be the same as issue #7 as well.

@BenWibking I applied your patch #11 but the segfault still occurs (maybe somewhere after).

@bosilca
Member

bosilca commented Feb 28, 2024

MPI_TYPE_MIX_LB_UB does not have its lb or ub set to zero. According to the OMPI datatype description, we are looking at:

Datatype 0x137fcb0[] id -1 size 27 align 8 opal_id 0 length 7 used 6
true_lb -17 true_ub 10 (true_extent 27) lb -17 ub 10 (extent 27)
nbElems 6 loops 0 flags 11C4 (committed )-c--lu-GDH-[---][INT]
contain lb ub OPAL_LB:* OPAL_UB:* OPAL_INT1:* OPAL_INT2:* OPAL_INT4:* OPAL_FLOAT4:* OPAL_FLOAT8:* OPAL_LONG:*
--C---P-D--[---][---] OPAL_INT1 count 1 disp 0xffffffffffffffff (-1) blen 1 extent 1 (size 1)
--C---P-DH-[---][---] OPAL_INT2 count 1 disp 0x0 (0) blen 1 extent 2 (size 2)
--C---P-DH-[---][---] OPAL_INT4 count 1 disp 0x2 (2) blen 1 extent 4 (size 4)
--C---P-DH-[---][---] OPAL_LONG count 1 disp 0xfffffffffffffff7 (-9) blen 1 extent 8 (size 8)
--C---P-D--[---][---] OPAL_FLOAT4 count 1 disp 0x6 (6) blen 1 extent 4 (size 4)
--C---P-D--[---][---] OPAL_FLOAT8 count 1 disp 0xffffffffffffffef (-17) blen 1 extent 8 (size 8)
-------G---[---][---] OPAL_LOOP_E prev 6 elements first elem displacement -1 size of data 27

type 11 count ints 9 count disp 8 count datatype 8
ints: 8 1 1 1 1 1 1 1 1
MPI_Aint: -17 -1 0 2 -9 6 -17 10
types: MPI_LB MPI_CHAR MPI_SHORT MPI_INT MPI_LONG MPI_FLOAT MPI_DOUBLE MPI_UB
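
As a cross-check of the dump above, here is a sketch in plain C (no MPI calls) that recomputes the bounds from the listed displacements and sizes. The `struct member` type, the `mix_members` table, and the function names are illustrative assumptions, not part of OMPI or the test suite. The last function checks whether every member stays inside a buffer when displacement 0 is placed `origin` bytes into it:

```c
#include <stddef.h>

/* One member of the datatype: byte displacement and size in bytes. */
struct member {
    ptrdiff_t disp;
    size_t size;
};

/* Transcribed from the dump: CHAR, SHORT, INT, LONG, FLOAT, DOUBLE. */
const struct member mix_members[] = {
    { -1, 1 }, { 0, 2 }, { 2, 4 }, { -9, 8 }, { 6, 4 }, { -17, 8 },
};
#define MIX_NMEMBERS (sizeof mix_members / sizeof mix_members[0])

/* Lowest byte offset touched by any member. Requires n >= 1. */
ptrdiff_t true_lb(const struct member *m, size_t n)
{
    ptrdiff_t lb = m[0].disp;
    for (size_t i = 1; i < n; i++)
        if (m[i].disp < lb)
            lb = m[i].disp;
    return lb;
}

/* One past the highest byte offset touched by any member. Requires n >= 1. */
ptrdiff_t true_ub(const struct member *m, size_t n)
{
    ptrdiff_t ub = m[0].disp + (ptrdiff_t)m[0].size;
    for (size_t i = 1; i < n; i++) {
        ptrdiff_t end = m[i].disp + (ptrdiff_t)m[i].size;
        if (end > ub)
            ub = end;
    }
    return ub;
}

/* Returns 1 if every member lies inside a buffer of `buf_size` bytes when
 * displacement 0 sits `origin` bytes after the buffer start, else 0. */
int members_fit(const struct member *m, size_t n, size_t buf_size, ptrdiff_t origin)
{
    for (size_t i = 0; i < n; i++) {
        ptrdiff_t start = origin + m[i].disp;
        if (start < 0 || (size_t)start + m[i].size > buf_size)
            return 0;
    }
    return 1;
}
```

For the table above, `true_lb` gives -17 and `true_ub` gives 10, matching the dump's extent of 27. A 27-byte buffer holds all members only if the origin is shifted by 17 bytes (`-lb`). With the origin at the buffer start, the CHAR member at displacement -1 lands one byte before the buffer. That is consistent with the address pattern in the ASan reports, although this thread does not confirm it as the cause.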
