NEP50 #8323
base: main
Conversation
cc: @seberg for vis
Compare: 5cea763 to b5dced7
OK, this starts making sense. Locally I'm down to a handful of failures in JIT compilation and fusion, and a first sanity review would be great. It also might be a good time to think about how to switch this on in CI. Also, locally I'm having issues with float16 failures and several OOM errors, with or without weak promotion; not sure what the reason is. A breakdown of test results in
Thank you for tackling this, @ev-br! As for the OOMs, I'd suggest excluding slow-labeled tests (
Thanks so much for starting this, @ev-br! Sorry about the slow follow-up on our discussions.
This is nicer and really quite minimal, and I think we can make it even more minimal!
As said below, I think passing the `weak` tuple is probably the pragmatic solution right now and may well be the best solution in the long run also.
A few notes (some of which are inline):
- It might be nice to have at least a single per-ufunc/kernel "promotion" step, i.e. a custom `guess_loop`. That would solve many issues, because you could call `np.ufunc.resolve_dtypes()`. I will note that this function is currently slow and is also only available since NumPy 1.24 or so.
- I have suggested some simplifications, which I think should make things even shorter (I may be missing reasons why not, though).
- In NumPy I had to add some "very special" paths, unfortunately. I think it makes sense to skip that for this PR. However, it would probably make sense to at least add a special path to make `uint_array > -1` work.
  (I.e. if it is tricky to do in general, make it work for the comparison ufuncs. NumPy downstream testing code ran into it often enough, and it is somewhat surprising to fail...)
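The `uint_array > -1` special path mentioned above can be sketched in pure Python: for comparison ufuncs, an out-of-range Python int can be answered by value instead of failing the cast. All names here (`UINT_RANGES`, `greater_uint_pyint`) are illustrative stand-ins, not CuPy's actual API:

```python
# Hypothetical sketch of a value-based comparison path for unsigned arrays
# compared against a Python int that does not fit the array's dtype.
UINT_RANGES = {'uint8': (0, 255), 'uint64': (0, 2**64 - 1)}  # illustrative subset

def greater_uint_pyint(values, dtype, scalar):
    lo, hi = UINT_RANGES[dtype]
    if scalar < lo:
        return [True] * len(values)    # every unsigned value exceeds the scalar
    if scalar >= hi:
        return [False] * len(values)   # no value can exceed the dtype's max
    return [v > scalar for v in values]  # scalar fits: compare normally
```

With this, `uint_array > -1` resolves to all-`True` without ever casting `-1` to an unsigned type.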
```diff
@@ -114,7 +114,7 @@ class TestRandintDtype(unittest.TestCase):
     def test_dtype(self, dtype):
         size = (1000,)
         low = numpy.iinfo(dtype).min
-        high = numpy.iinfo(dtype).max + 1
+        high = numpy.iinfo(dtype).max
```
Hmmm, this changes the test. Maybe we need at least a new xfail here, since that should work?
Indeed. Have to xfail for numpy 1.26 + remove short ufunc loops here as well:
7217c8d
```diff
             for a in in_args])
         in_types = tuple([a.dtype.type for a in in_args])
+        op = cache.get((in_types, weaks), ())
```
Makes sense. I suspect one could also use `in_types[weak_pos] = int` or so here, but I think using the `weak` tuple works well for starters.
(I might want to push for a cleanup after/with my other PR, but this is more urgent.)
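The cache keyed on `(in_types, weaks)` discussed above can be sketched minimally; `select_loop` and `resolve` are hypothetical names, and the point is only that the key includes the weak flags so that the same concrete dtypes can still select different loops:

```python
# Minimal sketch of a loop cache keyed by both the concrete input types and
# the per-argument weak flags, so int8(1) + 1 (weak Python scalar) and
# int8(1) + int8(1) can resolve to different loops.
_loop_cache = {}

def select_loop(in_types, weaks, resolve):
    key = (in_types, weaks)
    op = _loop_cache.get(key)
    if op is None:
        op = _loop_cache[key] = resolve(in_types, weaks)  # slow path, cached
    return op
```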
```python
        cdef _Ops ops_
        if dtype is None:
            use_raw_value = _check_should_use_min_scalar(in_args)
```
Not for me to decide. If we wish to support both 1.x and 2.x (for the most part), it could make sense to keep the `_check_should_use_min_scalar` here.
As a very subtle point, NumPy still (usually) uses the same check, as in `if all(weak): weak = [False] * len(weak)`.
(Which is arguably annoying, and only has an effect for weird integer inputs, so I think it is OK to ignore it.)
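The all-weak fallback described in the subtle point above is small enough to write out; `normalize_weaks` is a hypothetical helper name:

```python
def normalize_weaks(weaks):
    # If every operand is a weak Python scalar, there is no array or NumPy
    # scalar to defer to, so promotion falls back to the non-weak path.
    if all(weaks):
        return [False] * len(weaks)
    return list(weaks)
```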
Not for me to decide either. This PR has so far labored under the assumption that CuPy 14 only supports NumPy 2.0, per #8306 (comment).
BTW, this implies that cupyx.scipy requires scipy 1.13.
cupy/_core/_kernel.pyx (Outdated)

```diff
+            if not ok:
+                raise OverflowError(f"Python integer {arg} out of "
+                                    f"bounds for {out_type}.")
```
This looks like a lot of code that really ends up doing the same as `out_type(int(arg))` or `out_type(arg.item())`?
I suppose one could check safe-cast first, although I am not sure it would be faster.
Not sure what you mean, mind expanding? This code is meant to check `iinfo.min <= arg <= iinfo.max`, where `arg` can be signed or unsigned. Cython is trying to be helpful, so explicit casts are needed to avoid overflows.
What I mean is, write this whole thing as:

```python
for in_type, arg, weak in zip(in_types, in_args, weaks):
    if weak is not int:  # assume we change what is stored in `weaks`
        continue
    in_type(int(arg))  # check if the Python integer fits
```

(Or something similar, of course; I am not sure the zip is great if we micro-optimize.)
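The bounds check being discussed can be mimicked in a self-contained pure-Python sketch; `INT_BOUNDS` and `check_pyint_fits` are hypothetical stand-ins (real code would use `numpy.iinfo` or the `in_type(int(arg))` idiom suggested above):

```python
INT_BOUNDS = {'int8': (-128, 127), 'uint8': (0, 255)}  # illustrative subset

def check_pyint_fits(arg, in_type):
    # Equivalent in spirit to in_type(int(arg)): raise if the Python integer
    # does not fit the (signed or unsigned) target integer type.
    lo, hi = INT_BOUNDS[in_type]
    if not lo <= arg <= hi:
        raise OverflowError(f"Python integer {arg} out of bounds for {in_type}.")
```

Doing the range test on unbounded Python ints sidesteps the signed/unsigned overflow pitfalls that the Cython-level casts were working around.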
Now that I look again, I think it also needs to loop over `in_types` and not rely on `out_type` being identical.
cupy/_core/_kernel.pyx (Outdated)

```diff
             if not can_cast(it[0], ot) and not can_cast(it[1], ot):
                 break
-            elif not can_cast(it, ot):
+            elif not is_weak and not can_cast(it, ot):
```
I know it is a bit slow, but it should be cached, so I would suggest replacing the full logic above with:

```python
try:
    safe = np.result_type(weak(0), ot) == ot
except TypeError:
    safe = False
```

Might not be the prettiest ever, but I think it is preferable to a custom implementation of the category logic, and more flexible for future additions.
I am suggesting another trick: let's make `weaks = [False, int, float, complex]`, i.e. store either `False` or the type, so that we can use the type information directly from there?
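The `np.result_type(weak(0), ot) == ot` check above can be illustrated without NumPy by a stand-in category table; `CATEGORY`, `ORDER`, and `weak_casts_safely` are hypothetical names for this sketch only:

```python
# Pure-Python stand-in for "does a weak scalar of this category promote
# safely into the loop's output type ot" (the real check would use
# np.result_type(weak(0), ot) == ot, with TypeError meaning "not safe").
CATEGORY = {'int8': int, 'uint8': int, 'int64': int,
            'float32': float, 'float64': float, 'complex128': complex}
ORDER = {int: 0, float: 1, complex: 2}  # int < float < complex

def weak_casts_safely(weak, ot):
    try:
        return ORDER[weak] <= ORDER[CATEGORY[ot]]
    except KeyError:
        return False  # mirrors the `except TypeError: safe = False` fallback
```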
Currently, `in_types` entries are numpy scalar types already, after the call to

Line 180 in 622cdf4: `cdef inline _preprocess_arg(int dev_id, arg, bint use_c_scalar):`
Line 1313 in 622cdf4: `in_args, weaks = _preprocess_args(dev_id, in_args, False)`

which is the main reason to have an additional `weaks` tuple: to be able to distinguish `int8(1) + 1` from `int8(1) + int8(1)`.
So your suggestion is to keep Python types as Python types all the way through, instead of converting them to numpy types early on?
That could be nice, but probably comes with its downsides. I was suggesting to make the `weaks` tuple contain one of `False`/`int`/`float`/`complex` rather than just `True`/`False`. That gives you the information about which original type you were dealing with, without using `issubdtype` or so.
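The suggestion above, recording the original Python type instead of a bare flag, fits in a few lines; `scalar_weak_type` is a hypothetical helper, not CuPy's `_preprocess_arg`:

```python
def scalar_weak_type(arg):
    # Return int/float/complex for weak Python scalars, False otherwise.
    # Using type(arg) exactly (not isinstance) keeps bool and numpy scalar
    # subclasses out of the weak path.
    t = type(arg)
    return t if t in (int, float, complex) else False
```

A later promotion step can then consult the stored type directly, e.g. `weak_casts_safely(weaks[i], ot)`-style checks, without re-inspecting the argument.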
cupy/_core/_routines_math.pyx (Outdated)

```python
    ('bb->d', 'BB->d', 'hh->d', 'HH->d', 'ii->d', 'II->d', 'll->d', 'LL->d',
     'qq->d', 'QQ->d', 'ee->e', 'ff->f', 'dd->d', 'FF->F', 'DD->D'),
    'out0 = (out0_type)in0 / (out0_type)in1',
    ('qq->d', 'QQ->d',
```
Right, this has a slight miss for mixed signs, but that seems OK.
In principle it would be nice to solve this with a custom promotion step for the ufunc. That could be very nice, because for (almost!) all NumPy ufuncs we should be able to at least try to use `dtypes = ufunc.resolve_dtypes()`.
Compare: f08268b to 60a80c3
Progress:
Status:
From my perspective, I think the logic here should be a big enough step forward that it would be great to push it forward/merge soon.
The main issue (as @ev-br said) now seems to be ensuring that tests pass for developers and CI. What we need for that may be:
- Force testing with NumPy 1.26 and `NPY_PROMOTION_STATE="weak"`. We can do that in `conftest.py`.
- Fixing CI! Unfortunately, I am not sure what all is needed for that. I would think it could be split out into a separate PR.

Otherwise, I think it should at most be small fixups (a few new xfails, and maybe making sure all xfails/odd or temporary paths are easy to grep for).
There are some more complicated issues; however, I think it is probably better to defer them:
- Many current xfails actually work in NumPy 2 (should we use a skipif?)
- NumPy 2 has some special logic to e.g. make `uint_arr > -1` work. (I am not sure how vital this is, but to fix it we will have to add a distinct layer of logic, and that seems better in a new PR with an issue.)

The goal here is currently to create a CuPy v14 that matches NumPy 2, but fails to match NumPy 1.x w.r.t. promotion.
That means users will have to stay with CuPy v13 if they have NumPy 1.x (and need it to match).
I think this aligns with the current plans of the CuPy team, but I am not sure.
I will note that I am sure we could implement both versions side-by-side without too much hassle, and I would be happy to do this if it seems helpful/desirable (even if it is just helpful for getting CI going).
With a slew of fixups/skips/xfails in the cupyx namespace, this is ready from my side.
This (tries to) build artifacts based on Evgeni's PR to CuPy which should have most fixes needed to run with NumPy 2 and thus allow testing downstream projects with NumPy 2 if they also need CuPy. This includes most/all necessary Python fixes as well as fixes to make promotion align with NumPy (NEP 50). See also: * cupy/cupy#8323 * cupy/cupy#8314 * cupy/cupy#8306
These tests fail with "arrays of different dtypes are returned" on NumPy 1.26.4 with weak promotion (export NPY_PROMOTION_STATE=weak), but pass under NumPy '2.1.0.dev0+git20240419.da95f8e'. Therefore, this commit can be reverted when NumPy 2.0 becomes the minimum supported version.
The NumPy < 2 idiom `int(np.array(seed).astype(np.uint64, casting='safe'))` does not work in NumPy 2.0.
Test that either both NumPy and CuPy raise OverflowErrors, or both do not.
Otherwise, `arange(3, dtype=cupy.uint8).round(-2)` raises: `-2` cannot be cast to the array unsigned dtype.
SciPy does not emit a warning anymore? Retry with NumPy 2.0 proper.
Cannot have short int/uint loops: `>>> arange(3, dtype='int8') < 200` should convert both operands to int64/uint64.
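The commit note above can be illustrated: with a short int8 loop, the out-of-range scalar would be wrapped modulo 256 and give the wrong comparison result, which is why a wide (int64) loop is required. `wrap_int8` is a hypothetical helper modeling two's-complement wrapping:

```python
def wrap_int8(x):
    # what casting an out-of-range Python int into int8 would do:
    # two's-complement wrap into [-128, 127]
    return (x + 128) % 256 - 128

values = [0, 1, 2]                                  # arange(3, dtype='int8')
short_loop = [v < wrap_int8(200) for v in values]   # 200 wraps to -56: wrong
wide_loop = [v < 200 for v in values]               # int64 comparison: right
```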
Reuse the type promotion logic from ufuncs.
Co-authored-by: Sebastian Berg <sebastian@sipsolutions.net>
Split off #8314 to keep the diff sizes under control.
Adapt the codebase to NEP 50 (https://numpy.org/neps/nep-0050-scalar-promotion.html). Unlike gh-8314, this PR can be run under NumPy 1.x using
The main change is that under NEP 50, we have to keep track of "weak" scalars:
Implementation-wise, CuPy seems to convert python scalars to numpy scalars rather early in the game; thus we keep track of which arguments of ufuncs are python scalars, and take this into account when selecting ufunc loops.
So far am only doing it in ufuncs, will need to see if ElementwiseKernels / fusion loops / reductions are affected.
The goal is to make tests pass locally first with numpy 1.26.x & weak promotion rules, so that one can think of switching this on the CI. So far it's not even close: need to weed away lots of assumptions about the promotion rules in code and tests.