ptrmask(p, 0) -> null considered valid refinement #929

nikic · 2023-07-31T08:45:15Z

https://alive2.llvm.org/ce/z/UhTwMb

declare ptr @llvm.ptrmask.p0.i64(ptr, i64)
define ptr @src(ptr %p) {
  %r = call ptr @llvm.ptrmask.p0.i64(ptr %p, i64 0)
  ret ptr %r
}
define ptr @tgt(ptr %p) {
  ret ptr null
}

I would have expected this to not verify, because provenance changes from %p to null.

The text was updated successfully, but these errors were encountered:

nunoplopes · 2023-07-31T10:26:47Z

This is a complicated story, I'm afraid.. null is used a bit inconsistently across optimizations, and so I don't know the exact definition. Right now, Alive2 interprets null as a ptr whose address is 0.
I agree with you that accepting this refinement is not ideal, since if you do a gep inbounds on this thing later, it won't work.

I need to re-test what's the blast radius if we change the definition of null.
I'm leaving for vacations today, though. I'll check this once I'm back.

nunoplopes · 2023-08-19T14:54:46Z

I'm back!

So, if we change the definition of null to be a pointer to block with id=0 instead of any pointer whose address is 0x0, a few optimizations become wrong:

; just because it's not null, it could be an OOB pointer whose address is 0x0
define i1 @src(ptr nonnull %p) {
  %a = ptrtoint ptr %p to i64
  %c = icmp eq i64 %a, 0
  ret i1 %c
}

define i1 @tgt(ptr nonnull %p) {
  ret i1 false
}

; we've defined pointer comparison as comparison of addresses, so same reasoning here
; an OOB ptr can have addr 0x0
define i1 @src(ptr %p) {
  %a = load ptr, ptr %p, align 1, !nonnull !{}
  %c = icmp eq ptr %a, null
  ret i1 %c
}

define i1 @tgt(ptr %p) {
  ret i1 false
}

; same as before: p = q <=> (ptr2int p) = (ptr2int q)
define ptr @src(ptr %in) {
  %c = icmp eq ptr %in, null
  %r = select i1 %c, ptr null, ptr %in
  ret ptr %r
}

define ptr @tgt(ptr %in) {
  ret ptr %in
}

nikic · 2023-08-19T14:59:04Z

So, if we change the definition of null to be a pointer to block with id=0 instead of any pointer whose address is 0x0

I don't understand what this means. A null pointer should still have address 0x0, the question is just what provenance it has, right?

nunoplopes · 2023-08-19T15:00:21Z

Defining ptr comparison like that is useful for, e.g., optimizations that introduce run-time aliasing checks using geps that can go OOB (like the loop vectorizer).

Our thoughts back in the day to try to make LLVM consistent were to define null = int2ptr 0. That would make Nikita's optimization correct, while still allowing a gep to move the pointer back to an inbounds ptr.
But that means that null may alias with any escaped pointer. And that even ptrmask would escape input pointers.

It's a tough game to play. I would prefer if ptr comparison didn't compare addresses, but object id & offset. But then we would need some way for loop vectorization to check for aliasing of potentially OOB pointers.

TL;DR: I don't know what to do here. Need help on the LLVM side.

nunoplopes · 2023-08-19T15:01:28Z

So, if we change the definition of null to be a pointer to block with id=0 instead of any pointer whose address is 0x0

I don't understand what this means. A null pointer should still have address 0x0, the question is just what provenance it has, right?

Right. A null pointer would just have provenance of block 0 (the "null block").

nikic · 2023-08-19T15:03:44Z

So, if we change the definition of null to be a pointer to block with id=0 instead of any pointer whose address is 0x0

I don't understand what this means. A null pointer should still have address 0x0, the question is just what provenance it has, right?

Right. A null pointer would just have provenance of block 0 (the "null block").

Okay, but then I don't get why your examples are no longer correct. I guess that depends on how you define nonnull. I would expect it to be defined as ptrtoint(%p) != 0, not icmp_with_provenance ne %p, null (which is not even an operation we have). In that case these examples should remain correct by definition.

nunoplopes · 2023-08-19T16:12:54Z

So, if we change the definition of null to be a pointer to block with id=0 instead of any pointer whose address is 0x0

I don't understand what this means. A null pointer should still have address 0x0, the question is just what provenance it has, right?

Right. A null pointer would just have provenance of block 0 (the "null block").

Okay, but then I don't get why your examples are no longer correct. I guess that depends on how you define nonnull. I would expect it to be defined as ptrtoint(%p) != 0, not icmp_with_provenance ne %p, null (which is not even an operation we have). In that case these examples should remain correct by definition.

True, I just checked it.
But the last example still fails:

define ptr @src(ptr %in) {
  %c = icmp eq ptr %in, null
  %r = select i1 %c, ptr null, ptr %in
  ret ptr %r
}

define ptr @tgt(ptr %in) {
  ret ptr %in
}

nikic · 2023-08-19T16:28:55Z

True, I just checked it. But the last example still fails:

define ptr @src(ptr %in) {
  %c = icmp eq ptr %in, null
  %r = select i1 %c, ptr null, ptr %in
  ret ptr %r
}

define ptr @tgt(ptr %in) {
  ret ptr %in
}

This one is less straightforward. I would argue this transform is valid, because the provenance of %in must be greater or equal to that of null in terms of accesses it allows.

This probably depends on details of how exactly this is represented in alive. Basically the statement would be that it's okay to refine the "null block" to any other block.

nunoplopes · 2023-08-19T17:09:07Z

Actually LLVM does transformations in both directions:

(ptr)0 -> null

define ptr @f() {
  %p = inttoptr i64 0 to ptr
  ret ptr %p
}
=>
define ptr @f() {
  ret ptr null
}

(https://llvm.godbolt.org/z/3KdWd1EvP)

null -> (ptr)0
the select+icmp transformation

If we believe both transformations are correct, then null == (ptr)0.
And thus we must accept that getelemenptr i8, ptr null, i64 %foo may alias with anything. Which we don't.

I think this reasoning shows a conflict in LLVM's behavior.
I'm not sure what's the best way forward. I could see us accepting null -> (ptr)0 as a valid refinement (worst case you add provenance to the pointer, so it's ok). But I think we should remove the inttoptr 0 -> null transformation, as it potentially removes provenance.

nikic · 2023-08-19T17:18:21Z

I'm not sure what's the best way forward. I could see us accepting null -> (ptr)0 as a valid refinement (worst case you add provenance to the pointer, so it's ok). But I think we should remove the inttoptr 0 -> null transformation, as it potentially removes provenance.

I agree that this is the correct thing to do.

The way this is currently handled is that we try to not produce inttoptr(0) from non-trivial expressions. E.g. we used to fold things like inttoptr(gep p, -p) to inttoptr(0), but no longer do this to avoid it later getting folded to null. This helps avoid miscompiles in practice, but the principled thing to do would be to remove the inttoptr(0) to null fold.

Actually removing it will probably be fairly tricky though.

nikic · 2023-08-19T19:54:57Z

One of the tricky bits: If you do a memset(0), do you get inttoptr(0) or null? I guess the correct answer is inttoptr(0), but that's not the answer we actually want to hear...

nunoplopes · 2023-08-20T08:14:54Z

One of the tricky bits: If you do a memset(0), do you get inttoptr(0) or null? I guess the correct answer is inttoptr(0), but that's not the answer we actually want to hear...

True. Now I remember that was the reason why we defined null == inttoptr(0) in Alive.
So right now, Alive accepts store null <-> memset(0).

But well, how often is that you load a null pointer from memory, and that load is store-forwarded from a memset? Also, it only matters for geps. For icmp, null is equivalent, and for memory-access functions it's UB to dereference (int)0.

So it's really only for geps that we would need to keep inttoptr around.

nunoplopes added the memory Memory Model label Jul 31, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ptrmask(p, 0) -> null considered valid refinement #929

ptrmask(p, 0) -> null considered valid refinement #929

nikic commented Jul 31, 2023

nunoplopes commented Jul 31, 2023

nunoplopes commented Aug 19, 2023

nikic commented Aug 19, 2023

nunoplopes commented Aug 19, 2023

nunoplopes commented Aug 19, 2023

nikic commented Aug 19, 2023

nunoplopes commented Aug 19, 2023

nikic commented Aug 19, 2023

nunoplopes commented Aug 19, 2023

nikic commented Aug 19, 2023

nikic commented Aug 19, 2023

nunoplopes commented Aug 20, 2023

ptrmask(p, 0) -> null considered valid refinement #929

ptrmask(p, 0) -> null considered valid refinement #929

Comments

nikic commented Jul 31, 2023

nunoplopes commented Jul 31, 2023

nunoplopes commented Aug 19, 2023

nikic commented Aug 19, 2023

nunoplopes commented Aug 19, 2023

nunoplopes commented Aug 19, 2023

nikic commented Aug 19, 2023

nunoplopes commented Aug 19, 2023

nikic commented Aug 19, 2023

nunoplopes commented Aug 19, 2023

nikic commented Aug 19, 2023

nikic commented Aug 19, 2023

nunoplopes commented Aug 20, 2023