Bad perf with memory refinement query #983

regehr · 2023-12-03T03:21:55Z

here's an interesting test case that @tanmaytirpankar found:

define void @insert_store_nonconst_index_not_known_valid_by_and(ptr %0, i8 zeroext %1, i32 %2) {
 %4 = load <16 x i8>, ptr %0, align 16
 %5 = and i32 %2, 16
 %6 = insertelement <16 x i8> %4, i8 %1, i32 %5
 store <16 x i8> %6, ptr %0, align 16
 ret void
}

it exposes two issues. first, if the insertelement index is OOB, the result is a vector full of poison. however, on the ARM side, we get something like:

define void @insert_store_nonconst_index_not_known_valid_by_and(ptr nocapture writeonly %0, i8 %1, i32 %2) local_unnamed_addr #0 {
arm_tv_entry:
  %a2_2 = and i32 %2, 16
  %a2_3 = zext nneg i32 %a2_2 to i64
  %3 = getelementptr i8, ptr %0, i64 %a2_3
  store i8 %1, ptr %3, align 1
  ret void
}

of course at this level, an OOB store is UB. so we don't have a refinement, and moreover, I don't know that we can easily fix this in the general case, as we can for other target-side UB like divide by zero and shift-past-bitwidth.

but that's not all! if we force the index to be in bounds like this:

define void @insert_store_nonconst_index_not_known_valid_by_and(ptr %0, i8 zeroext %1, i32 %2) {
 %4 = load <16 x i8>, ptr %0, align 16
 %5 = and i32 %2, 15
 %6 = insertelement <16 x i8> %4, i8 %1, i32 %5
 store <16 x i8> %6, ptr %0, align 16
 ret void
}

then of course we don't have any problems with UB, but we still can't prove refinement. here's a src-tgt pair that doesn't seem too difficult, but it runs for 99 minutes and then I get "ERROR: SMT Error: smt tactic failed to show goal to be sat/unsat memout":

define void @src(ptr %0, i8 zeroext %1, i32 %2) {
 %x4 = load <16 x i8>, ptr %0, align 1
 %z = and i32 %2, 15
 %x6 = insertelement <16 x i8> %x4, i8 %1, i32 %z
 store <16 x i8> %x6, ptr %0, align 1
 ret void
}

define void @tgt(ptr %0, i8 %1, i32 %2) {
arm_tv_entry:
   %z = and i32 %2, 15
   %3 = getelementptr i8, ptr %0, i32 %z
  store i8 %1, ptr %3, align 1
  ret void
}

regehr@ohm:~/alive2-regehr/build$ time ./alive-tv foo2.ll --disable-poison-input --disable-undef-input --smt-to=10000000

----------------------------------------
define void @src(ptr %#0, i8 zeroext %#1, i32 %#2) {
#3:
  %x4 = load <16 x i8>, ptr %#0, align 1
  %z = and i32 %#2, 15
  %x6 = insertelement <16 x i8> %x4, i8 zeroext %#1, i32 %z
  store <16 x i8> %x6, ptr %#0, align 1
  ret void
}
=>
define void @tgt(ptr %#0, i8 %#1, i32 %#2) {
arm_tv_entry:
  %z = and i32 %#2, 15
  %#3 = gep ptr %#0, 1 x i32 %z
  store i8 %#1, ptr %#3, align 1
  ret void
}

ERROR: SMT Error: smt tactic failed to show goal to be sat/unsat memout

real	99m30.728s
user	98m27.443s
sys	0m42.087s
regehr@ohm:~/alive2-regehr/build$

The text was updated successfully, but these errors were encountered:

nunoplopes · 2023-12-03T18:28:50Z

LLVM is buggy. The generated assembly is:

src:
        and     w8, w2, #0x10
        strb    w1, [x0, w8, uxtw]
        ret

It may store a byte OOB where the original code did not.

regehr · 2023-12-03T19:02:19Z

ah.... thanks Nuno, we missed that. they're not going to be happy about this since fixing it will add instructions, unless the fix is to make OOB InsertElement instructions UB.

I'll leave this issue open for the performance problem, but of course close if you want

nunoplopes · 2023-12-04T10:37:59Z

we need to improve perf for this:

define void @src(ptr %0, i8 zeroext %1, i32 %2) {
 %4 = load <16 x i8>, ptr %0, align 16
 %5 = and i32 %2, 15
 %6 = insertelement <16 x i8> %4, i8 %1, i32 %5
 store <16 x i8> %6, ptr %0, align 16
 ret void
}

define void @tgt(ptr nocapture writeonly %0, i8 %1, i32 %2) {
  %a2_2 = and i32 %2, 15
  %a2_3 = zext nneg i32 %a2_2 to i64
  %x3 = getelementptr i8, ptr %0, i64 %a2_3
  store i8 %1, ptr %x3, align 1
  ret void
}

Essentially the refinement query uses a forall-quantified idx to go through the memories. It should be unrolled in small (all?) cases.

nunoplopes changed the title ~~two issues exposed by one arm-tv test case~~ Bad perf with memory refinement query Dec 4, 2023

nunoplopes added performance Opportunities for improving performance memory Memory Model labels Dec 4, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Bad perf with memory refinement query #983

Bad perf with memory refinement query #983

regehr commented Dec 3, 2023 •

edited

nunoplopes commented Dec 3, 2023

regehr commented Dec 3, 2023

nunoplopes commented Dec 4, 2023

Bad perf with memory refinement query #983

Bad perf with memory refinement query #983

Comments

regehr commented Dec 3, 2023 • edited

nunoplopes commented Dec 3, 2023

regehr commented Dec 3, 2023

nunoplopes commented Dec 4, 2023

regehr commented Dec 3, 2023 •

edited