Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix bounds inference for uint -> int casts #7814

Open
wants to merge 12 commits into
base: main
Choose a base branch
from
Open

Conversation

abadams
Copy link
Member

@abadams abadams commented Aug 28, 2023

Fixes #7807
Fixes #7810

Note that this may cause previously-functioning pipelines to throw a compile-time error, in cases where they cast a uint32 to an int32 for use in an index expression and were relying on Halide to assume the result is positive. Now that the uint32 -> int32 cast is defined to wrap, the result may not be positive. An example might be the expression:

lut(min(cast<int>(some_u32_param), 256))

Previously Halide would have treated this as bounded between 0 and 256. Now it's no longer bounded below, because the uint32 could have been large enough to wrap.

If this causes too much carnage in production pipelines we may have to revert the decision to define wrapping behavior for uint32 -> int32 casts.

@steven-johnson
Copy link
Contributor

Testing this in Google, I get many many failures of the form Error: Buffer foo may be accessed in an unbounded way in dimension 0

@abadams abadams added the release_notes For changes that may warrant a note in README for official releases. label Aug 29, 2023
@abadams
Copy link
Member Author

abadams commented Aug 29, 2023

The question now is whether to roll back us defining uint -> int casts as having wrapping semantics, or to patch those pipelines. I'd like to slowly reduce the number of types of UB we have over time, so if patching seems tractable (e.g. by inserting a max with zero around the index), that would be best.

@steven-johnson
Copy link
Contributor

I'll have to figure out how many pipelines it is. (there are lots of failures, but maybe it's just a handful of shared pipelines.) What is the likely failure pattern I'm looking for?

@abadams
Copy link
Member Author

abadams commented Aug 29, 2023

Access to an input named "foo" using a computed uint32 index. This is a bit of a guess but I think the needed patch is something like:

Before:

foo(cast<int>(some_u32_expr), ...)

After:

foo(saturating_cast<int>(some_u32_expr), ...)

@steven-johnson
Copy link
Contributor

Access to an input named "foo" using a computed uint32 index. This is a bit of a guess but I think the needed patch is something like:

The first case I'm looking at doesn't have anything obviously of that form, but there are a lot of helper functions. I wonder if there is a bottleneck for trapping most/all int->uint cases that would help?

@steven-johnson
Copy link
Contributor

Running the failure case with HL_DEBUG_CODEGEN=4 reveals something possibly interesting: just before the failure, I see

simplify let t749 = (int16)1023 in...
new let t749.s = (undefined) in ... (int16)1023 ...

undefined?

@abadams
Copy link
Member Author

abadams commented Aug 29, 2023

That undefined is fine. It means the simplifier has determined the new let is not necessary to insert.

@steven-johnson
Copy link
Contributor

The immediate failure looks like this:

Input to CSE ((uint16)2048 == (uint16)2048)
After removing lets: ((uint16)2048 == (uint16)2048)
Include: ((uint16)2048 == (uint16)2048); should extract: 1
Include: (uint16)2048; should extract: 0
Include: (uint16)2048; should extract: 0
Canonical form without lets ((uint16)2048 == (uint16)2048)
0: (uint16)2048, 0
1: ((uint16)2048 == (uint16)2048), 1
With variables ((uint16)2048 == (uint16)2048)
With lets: ((uint16)2048 == (uint16)2048)
In image input region touched is:
  0: max(min(((0*16) + input.min.0) + 0, (input.extent.0 + input.min.0) + -1), input.min.0) .. max(min(((((input.extent.0 + -1)/16)*16) + input.min.0) + 15, (input.extent.0 + input.min.0) + -1), input.min.0)
  1: max(min(((0*4) + input.min.1) + 0, (input.extent.1 + input.min.1) + -1), input.min.1) .. max(min(((((input.extent.1 + -1)/4)*4) + input.min.1) + 3, (input.extent.1 + input.min.1) + -1), input.min.1)
Injecting constraints for input.0
Injecting constraints for input.1
In image original_input region touched is:
  0: (void *)neg_inf .. (void *)pos_inf
  1: (void *)neg_inf .. (void *)pos_inf
User error triggered at third_party/halide/halide/src/AddImageChecks.cpp:383

@abadams
Copy link
Member Author

abadams commented Aug 29, 2023

I pushed a temporary change that should give a more useful error

@steven-johnson
Copy link
Contributor

The failure is coming from INSIDE HALIDE! </>

Look at mirror_interior() and tell me where the uint32->int32 cast is (implicitly) creeping in:

            Expr limit = extent - 1;
            Expr coord = arg_var - min;   // Enforce zero origin.
            coord = coord % (2 * limit);  // Range is 0 to 2w-1
            coord = coord - limit;        // Range is -w, w
            coord = abs(coord);           // Range is 0, w
            coord = limit - coord;        // Range is 0, w
            coord = coord + min;          // Restore correct min

            // The boundary condition probably doesn't apply
            coord = select(arg_var < min || arg_var >= min + extent, coord,
                           clamp(likely(arg_var), min, min + extent - 1));

(Spoiler alert: abs() in Halide always returns an unsigned.)

(In hindsight, I really wish that Halide had adopted the Golang property of never doing implicit type conversions...)

@steven-johnson
Copy link
Contributor

Unfortunately, if I change the bad line to coord = saturating_cast<int>(abs(coord));, we still fail with the same result (input is touched in an infinite way), but none of the "dodgy" check warnings you inserted trigger this time (they did previously, which is how I found the mirror issue). I'm gonna guess there is more abs() nonsense going on.

@abadams
Copy link
Member Author

abadams commented Aug 29, 2023

Can repro! I'll investigate.

@abadams
Copy link
Member Author

abadams commented Aug 29, 2023

Possible fix pushed

@steven-johnson
Copy link
Contributor

OK, that fixes the unbounded access... now I get failures with Error: Signed integer overflow occurred during constant-folding. Signed integer overflow for int32 and int64 is undefined behavior in Halide. Let me investigate that.

@steven-johnson
Copy link
Contributor

Yeah, this is another one where it is gonna be tricky to find these errors in existing code without some helpers.

@abadams
Copy link
Member Author

abadams commented Aug 29, 2023

I pushed a user_warning for when bounds inference does something different to what it used to do.

@steven-johnson
Copy link
Contributor

The first one I found basically amounted to code that used explicit casting to do a 'widening' mul (this code predates the widening_mul() etc operators:

    // f and h are both funcs with type i16
    Expr r = i32(f(x, y, 0));
    Expr g = i32(f(x, y, 1));
    Expr b = i32(f(x, y, 2));
    Expr result = h(0, c) * r + h(1, c) * g + h(2, c) * b;

switching it to use widening_mul seems to heal it (not sure if result is correct), but many others to track down. Will update later.

@abadams
Copy link
Member Author

abadams commented Aug 29, 2023

I'm a little alarmed that widening_mul heals that, because any 16-bit -> i32 cast should be bounded, because i32 can represent i16 and u16. Also, widening_mul just means the same thing.

@steven-johnson
Copy link
Contributor

Also, widening_mul just means the same thing.

The evidence suggests otherwise? Not sure I can give you an easy repro case from it but maybe using that snippet and seeing how the calculated bounds differ would be useful.

@steven-johnson
Copy link
Contributor

I pushed a user_warning for when bounds inference does something different to what it used to do.

I'll apply that and retry.

@steven-johnson
Copy link
Contributor

I'm a little alarmed that widening_mul heals that, because any 16-bit -> i32 cast should be bounded, because i32 can represent i16 and u16. Also, widening_mul just means the same thing.

Update: maybe that's because the Call visitor in Bounds calculation doesn't have a case for widening_mul, so it just defaults to max-of-type for an unknown Func, which in this case is '+/- inf'.

If I add a specialization that is similar to widening_mul_right, eg

        } else if (op->is_intrinsic(Call::widening_mul)) {
            Expr mul = Mul::make(cast(op->type, op->args[0]), cast(op->type, op->args[1]));
            mul.accept(this);
        }

..then using widening_mul also fails with signed-integer-overflow.

@steven-johnson
Copy link
Contributor

Also interesting (though maybe not 100% relevant here): we do specialize for shift_left, but only for signed integers, not for unsigned... in the case of this code, we have a u32 << u32 case where LHS is in 0..65535 and RHS is in 0..16, so bounds should be 0..0xFFFF0000... but since we don't handle it, bounds is neg_inf..pos_inf

steven-johnson added a commit that referenced this pull request Aug 29, 2023
This addresses many (but not all) of the `signed integer overflow` issues we're seeing in Google due to #7814 -- a lot of the issues seems to be in code that uses intrinsics that had no handling in value bounds checking, so the bounds were naively large and overflowed.

- Most of the intrinsics from FindIntrinsics.h weren't handled; now they all are (most by lowering to other IR, though the halving_add variants were modeled directly because the bitwise ops don't mesh well)
- strict_float() is just a pass-through
- round() is a best guess (basically, if bounds exist, expand by one as a worst-case)

There are definitely others we should handle here... trunc/floor/ceil probably?
@steven-johnson
Copy link
Contributor

This is fairly old, where does it stand?

@abadams
Copy link
Member Author

abadams commented Nov 28, 2023

This one is pending figuring out what breaks inside of google

@steven-johnson
Copy link
Contributor

This one is pending figuring out what breaks inside of google

It's been a while, do we remember the gist of it

@abadams
Copy link
Member Author

abadams commented Nov 28, 2023

I think we were knocking cases down one by one, but there are more failures that we have yet to diagnose.

@steven-johnson
Copy link
Contributor

I think we were knocking cases down one by one, but there are more failures that we have yet to diagnose.

ok, I will pull it in and see where we stand

@steven-johnson
Copy link
Contributor

So the fact that abs() has always returns a uint means that this change will likely mean that ~every call to it will need attention (e.g., something like abs(y) -> max(abs(y), 0) to ensure no negative values).

@abadams
Copy link
Member Author

abadams commented Nov 29, 2023

I see, so people are doing things like my_func(abs(expr)), and that's implicitly my_func(i32(abs(expr))), and the expr could be the most-negative integer if we know nothing about it, in which case that cast now returns the most-negative integer instead of being UB.

@abadams
Copy link
Member Author

abadams commented Nov 29, 2023

Wait, that can't be the example, because if we know nothing about expr in that case the access is unbounded anyway. I guess there are cases where we know an upper bound on expr but not a lower bound?

@steven-johnson
Copy link
Contributor

I see, so people are doing things like my_func(abs(expr)), and that's implicitly my_func(i32(abs(expr))), and the expr could be the most-negative integer if we know nothing about it, in which case that cast now returns the most-negative integer instead of being UB.

yeah, this is pervasive too, it makes me question whether this is the right design for abs() -- maybe we should change it to return the same type for signed. alternately, add iabs() which does something like clamp(abs(x), 0, max-for-type) for int...

@steven-johnson
Copy link
Contributor

Wait, that can't be the example, because if we know nothing about expr in that case the access is unbounded anyway. I guess there are cases where we know an upper bound on expr but not a lower bound?

in this case the full expr is Expr mirror_y = clamp(min(abs(y), 2 * max_y + 1 - y), 0, max_y);

@abadams
Copy link
Member Author

abadams commented Nov 29, 2023

If this really becomes the show-stopper, we could declare that abs of the most negative 32/64-bit integer is UB (which it already is in main if you cast the result back to int due to the signed integer overflow), and then handle int casts of abs specially in bounds inference.

@steven-johnson
Copy link
Contributor

I don't know if it's a showstopper, but in the first failure I looked at, literally every failure warning was about abs()

@steven-johnson
Copy link
Contributor

I tried writing a shim like so:

Halide::Expr iabs(Halide::Expr i) {
  if (i.type().is_int()) {
    return cast<int32_t>(min(abs(i), cast<uint32_t>(i.type().max())));
  } else {
    return abs(i);
  }
}

but it doesn't help, e.g.

Formerly bounded cast is no longer bounded:
 Cast: int32(min((uint32)abs((blur_x.s0.x.guarded/2) + -1), (uint32)2147483647))
 Bounds of arg: (uint32)0 min(max(uint32(((4 - min(input.extent.0, 1))/2)), uint32(((min(max(input.extent.0, 1), min(max(input.extent.0, 1) - min(input.extent.0, 1), 7) + min(max(input.extent.0, 1), min(input.extent.0, 1) + (((max(input.extent.0, 1) - min(input.extent.0, 1))/8)*8))) + -3)/2))), (uint32)2147483647)
 Bounds of output: (void *)neg_inf (void *)pos_inf

not sure if this is just a false positive in the sniffing code or not? but I still fail with a buffer being accessed in an unbounded way.

@steven-johnson
Copy link
Contributor

ContainsDodgyIntCast may not be smart enough, it doesn't check to see if the value in question has been constrained to a legal range. Maybe something like changing can_represent() -> can_prove(op->value >= op->type.min() && op->value <= op->type.max())?

steven-johnson added a commit that referenced this pull request Dec 1, 2023
* Handle many more intrinsics in Bounds.cpp

This addresses many (but not all) of the `signed integer overflow` issues we're seeing in Google due to #7814 -- a lot of the issues seems to be in code that uses intrinsics that had no handling in value bounds checking, so the bounds were naively large and overflowed.

- Most of the intrinsics from FindIntrinsics.h weren't handled; now they all are (most by lowering to other IR, though the halving_add variants were modeled directly because the bitwise ops don't mesh well)
- strict_float() is just a pass-through
- round() is a best guess (basically, if bounds exist, expand by one as a worst-case)

There are definitely others we should handle here... trunc/floor/ceil probably?

* Fix round() and strict_float() handling

* Update Bounds.cpp

* Fixes?

* trigger buildbots

* Revert saturating_cast handling

* Update Bounds.cpp

---------

Co-authored-by: Andrew Adams <andrew.b.adams@gmail.com>
ardier pushed a commit to ardier/Halide-mutation that referenced this pull request Mar 3, 2024
* Handle many more intrinsics in Bounds.cpp

This addresses many (but not all) of the `signed integer overflow` issues we're seeing in Google due to halide#7814 -- a lot of the issues seems to be in code that uses intrinsics that had no handling in value bounds checking, so the bounds were naively large and overflowed.

- Most of the intrinsics from FindIntrinsics.h weren't handled; now they all are (most by lowering to other IR, though the halving_add variants were modeled directly because the bitwise ops don't mesh well)
- strict_float() is just a pass-through
- round() is a best guess (basically, if bounds exist, expand by one as a worst-case)

There are definitely others we should handle here... trunc/floor/ceil probably?

* Fix round() and strict_float() handling

* Update Bounds.cpp

* Fixes?

* trigger buildbots

* Revert saturating_cast handling

* Update Bounds.cpp

---------

Co-authored-by: Andrew Adams <andrew.b.adams@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release_notes For changes that may warrant a note in README for official releases.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Fuzzer failure in fuzz_bounds Bounds inference handles casts from uint32 to int32 incorrectly
2 participants