numpy.isclose vs math.isclose #10161
Too late to change anything imho.
|
This is one of the most widely used functions in all of numpy (through |
I confess that the current implementation looks like a bug. Note that using |
At the very least, this needs a warning in the docstring that it mismatches the builtin |
How about adding this?
Incidentally, that may be an idea for the random number versioning... |
For clarification, the import would not actually import isclose but would enable the new semantics. It would be scoped to a module. The idea is that we can allow people to use the correct function and, at the same time, we don't need to break any existing code. |
I don't think making this not import is possible. Just pick a longer name, like Note that |
Actually, the |
The iterator exceptions were not a syntax change. We just need a small C module to inspect the namespace of the calling module. We can even cache the modules so that only import time is slowed, and only trivially. |
You're right - and This is actually a nice way to avoid the problems in #9444, as an alternative to use |
Let's CC @ChrisBarker-NOAA as the creator of This is a pretty minor issue IMO. Generally |
I asked on StackOverflow about using TLDR: it's possible, but not easy or clean. |
The stack introspection is not just for this, but is proposed as a general policy for changing the return values of functions. The question is: in principle, should this be changed? I do think that if the answer is yes, the deprecation period needs to be at least a few years given the widespread use. Python never included a past module but that's one option to try to improve API stability, if there is demand. |
The other method would be to just add this as an option |
I agree that this is a minor problem when you don't attach a meaning to the tolerances. However, sometimes you would like to say that the error is within, for example, 1% (rtol = 0.01), meaning that some values can pass with ~2% error:

```python
atol = 1e-8  # default
rtol = 0.01  # 1%
b = 1e-6
a = 1.0199e-6  # ~2% larger compared to b
abs(a - b) <= (atol + rtol * abs(b))
# True
```

Implementing the idea of @charris would make this particular case very slightly worse, as it relaxes the comparison even more, but it would still be worth it, as it gets rid of the symmetry problem and is indeed backward compatible. IMO it would be better if Numpy used the Math function, but I understand that the change may be too disruptive and possibly not important for most users. Making the option to change between the |
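For comparison, `math.isclose` with the same 1% relative tolerance rejects this case, because its default `abs_tol` is 0.0 and its relative and absolute tolerances do not add together (a sketch for illustration, not taken from the thread):

```python
import math

b = 1e-6
a = 1.0199e-6  # ~2% larger than b

# math.isclose uses max(rel_tol * max(|a|, |b|), abs_tol) as the bound,
# so a ~2% difference fails a 1% relative tolerance
print(math.isclose(a, b, rel_tol=0.01))  # False

# numpy-style bound: atol + rtol * |b| = 1e-8 + 1e-8 = 2e-8, which 1.99e-8 passes
print(abs(a - b) <= 1e-8 + 0.01 * abs(b))  # True
```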
@njsmith: thanks for bringing me in. A bit of history: when I proposed isclose for the stdlib, we certainly looked at numpy as prior art. If it were just me, I might have used a compatible approach, for the sake of, well, compatibility :-). But the rest of the community thought it was more important to do what was "right" for Python, so a long discussion ensued... I tried to capture most of the points in the PEP, if you want to go look. Here is the reference to numpy: https://www.python.org/dev/peps/pep-0485/#numpy-isclose You can see that the same points were made as in this discussion. In the end, three key points came out:
In the end, I think we came up with the "best" solution for math.isclose. But is it "better" enough to break backward compatibility? I don't think so. Unfortunately, much of numpy (and python) had many features added because they were useful, but without the extensive discussion these things get today, so we have a lot of sub-optimal designs. This is expected (and necessary) for a young library, and we just have to live with it now. @njsmith is right -- I think very few uses of np.isclose() have the tolerances set by rigorous FP error analysis, but rather by trial and error, with np.isclose() itself. However, I think the bigger problem is the default
ouch! and yes, it's the
So it probably is a Good Idea to have a path forward. I like the keyword argument idea -- seems a lot more straightforward than trying to mess with future and the like. And we could then decide if we wanted to start issuing deprecation warnings and ultimately change the default many versions downstream... |
I'd like to propose @bashtage's suggestion: """ I think it's a better option than a new function name, and could lead the way to future deprecation and change of the default (or not). I think in addition to the (strong) symmetric test, Given that, maybe we need a better name for the parameter than |
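To make the proposed keyword concrete, here is a minimal sketch of what it could look like; the name `symmetric`, the function name `isclose_sketch`, and the defaults are illustrative assumptions, not NumPy API or a committed design:

```python
import numpy as np

def isclose_sketch(a, b, rtol=1e-5, atol=1e-8, symmetric=False):
    # Hypothetical sketch of the flag discussed above, not a real NumPy function.
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if symmetric:
        # math.isclose-style "strong" test: symmetric in a and b,
        # and the relative and absolute tolerances do not add together
        return np.abs(a - b) <= np.maximum(rtol * np.maximum(np.abs(a), np.abs(b)), atol)
    # current NumPy behavior: asymmetric in b, tolerances combined
    return np.abs(a - b) <= atol + rtol * np.abs(b)
```

With `symmetric=True` the result no longer depends on argument order, which is the property the strong test buys.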
Oh -- and it might be good to look to make sure that |
The easiest way to do that is probably to put the deprecation warning in and then remove it before merging. We could just go ahead and put the deprecation warning in that says to change existing code to symmetric=False; there's no reason to say that the default needs to be changed anytime soon even if the warning is there.
|
Do we need to add it to |
@xoviat @eric-wieser there cannot be any backwards incompatible changes to |
I'm afraid that this still sounds like a lot of pain for infinitesimal
gain, to me.
Having atol be nonzero by default is not an accident. It's needed to get
sensible default behavior for any test where the expected result includes
exact zeros.
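The zero-expected-value case is easy to demonstrate; with `atol=0` (or with `math.isclose`'s default `abs_tol=0.0`), no relative tolerance can make anything "close" to an exact zero:

```python
import math
import numpy as np

# NumPy's default atol=1e-8 lets tiny numerical noise pass against an exact zero
print(np.isclose(0.0, 1e-10))            # True
# With atol=0 the bound collapses to rtol * |b|, which is ~0 here
print(np.isclose(0.0, 1e-10, atol=0.0))  # False
# math.isclose defaults to abs_tol=0.0 and behaves the same way
print(math.isclose(0.0, 1e-10))          # False
```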
|
No changes to any assert_allclose are required, as it would be updated internally to use the old behavior.
|
IMO silent changes are nasty! The deprecation warning is only so that
people eventually don't need to type the flag in an interactive prompt to
get the new behavior, nothing more
|
Agreed. However, there should be zero changes. Deprecation warnings in widely used functions also force users (at least the ones that actually do maintenance) to make changes. |
What about fixing the non-symmetry issue like @charris suggested? This would also free up some documentation space, that could be filled with a warning about the bad things that may happen when |
We can't fix that issue without letting people know about the changed
behavior. The only way that I know how to do that is with a warning.
What would really be nice is an automated refactoring tool to inject flags
into old code. Then 'maintenance' would just be running a script on your
code; 5 minutes of work.
|
Sorry, this can be fixed, just with a new flag.
|
That suggestion (#10161 (comment)) seems feasible.
We're not fixing it. This is the thing about cost/benefit of any change that is just being discussed in the semantic versioning issue. This is definitely a case where we will not make any change or issue any warning from a widely used testing function. Note that the current behavior is not a bug (although arguably a suboptimal choice). |
To be clear, I was proposing a change to isclose but not assert_allclose.
From looking at sources, that change will be a third as disruptive.
However, the benefit is still probably too small. I don't think anyone
objects to adding a flag though, correct?
|
I think this is exactly why we have a separate namespace for the NEP 47 compatibility layer. Once we offer the alternatives there, we can slowly try to move those behaviors into the main NumPy namespace after an appropriate deprecation cycle. |
Well, we have Ralf's NEP 23: https://numpy.org/neps/nep-0023-backwards-compatibility.html I think the case of If we were going to do this, the way to do it would be with new function name or namespace. This lets users migrate on their own time without needing to change code twice. |
It seems there are really two issues in the revival of this thread: what to pick for the API standard, and whether to adjust numpy's implementation. To me, the former seems fairly easy: pick the better stdlib implementation (to me, actually better mostly because of a default Changing numpy's implementation is much trickier - I might wish to ignore the breakage myself, but really that's unfair to many users and I think NEP 23 is pretty clear that the process should be long and torturous. But the route of a different namespace for the array API seems a very good compromise - indeed, great to have a way to slowly move to a smaller, better defined API! |
On Wed, Jun 2, 2021 at 10:45 AM Marten van Kerkwijk wrote:
It seems there are really two issues in the revival of this thread: what
to pick for the API standard, and whether to adjust numpy's implementation.
To me, the former seems fairly easy: pick the better stdlib implementation
(to me, actually better mostly because of a default atol=0).
Changing numpy's implementation is much trickier - I might wish to ignore
the breakage myself, but really that's unfair to many users and I think NEP
23 is pretty clear that the process should be long and torturous.
yup - it should be.
But I think this is a real wart in numpy right now.
The asymmetric test is not that big a deal, but these are big deals, that
probably are causing hidden broken tests in folks' code right now:
* numpy.isclose (and allclose) and numpy.testing.assert_allclose have different defaults -- that's a serious source of confusion, and maybe errors.
* The nonzero default for atol in isclose can lead to serious errors in tests.
* Combining the rtol and atol can be a real problem with values of very different magnitude -- again, actual bugs can result.
So it would be really nice to improve this -- following NEP 23 of course.
But the route of a different namespace for the array API seems a very good
compromise - indeed, great to have a way to slowly move to a smaller,
better defined API!
I haven't been following that discussion, but yes, if we are moving to a
new namespace with a new API, that would be a fine way to make this
transition.
-CHB
|
My two cents. One should also not forget that math.isclose compares two scalars, so it is natural to expect that math.isclose(a, b) == math.isclose(b, a). But np.isclose(a, b) compares two arrays elementwise, so it is preferable to me that the elementwise tolerances are computed using a single array, namely b. Edge cases in which np.isclose(a, b) != np.isclose(b, a) are not a big issue; on the contrary, if b is a reference value, I would like to avoid the elementwise tolerances being relaxed based on the values of a. By the way, the current docs are very well written, so I would not assume that a symmetric |
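The asymmetry under discussion is easy to reproduce once the tolerances are large; the values below are chosen only for illustration:

```python
import math
import numpy as np

# np.isclose scales the relative tolerance by |b| only, so swapping
# the arguments can change the answer when tolerances are large
print(np.isclose(95.0, 100.0, rtol=0.05, atol=0.0))  # True:  5 <= 0.05 * 100
print(np.isclose(100.0, 95.0, rtol=0.05, atol=0.0))  # False: 5 >  0.05 * 95

# math.isclose uses max(|a|, |b|) in its bound, so it is symmetric
print(math.isclose(95.0, 100.0, rel_tol=0.05))  # True
print(math.isclose(100.0, 95.0, rel_tol=0.05))  # True
```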
On Mon, Jun 7, 2021 at 1:37 AM Stefano Miccoli wrote:
Thinking about it more, the argument for the numpy version is that it
checks closeness to a wanted value, whereas the python version has no
preference between the two values. One can argue for both, isclose vs
areclose :) The first might be better for testing, the second for arbitrary
pairs of numbers.
One should also not forget that math.isclose compares two scalars, so it
is natural to expect that math.isclose(a,b) == math.isclose(b,a). But
np.isclose(a,b) compares elementwise two arrays, so it is preferable for
me that the elementwise tolerances are computed using a single array,
namely b. On the contrary it would be quite disorienting that a portion
of the elementwise tolerances would be computed using a and the remaining
using b for the sake of symmetry only.
I don't think of it that way - you are not using different values for the
sake of symmetry -- you are comparing two arrays to see if their values are
close to each-other, and thus might also expect symmetry:
np.alltrue(isclose(arr1, arr2) == isclose(arr2, arr1))
I'm sure there would be surprises if that didn't hold.
And with asymmetric tests, it's not obvious to me that I want to use "b",
rather than "a" anyway -- without carefully reading the docs.
At least numpy.testing.assert_allclose() used "expected" and "actual" --
which makes it clear which is which.
Anyway, a few points:
1) I don't think the vectorization makes a difference here --- either you
are thinking "are these two numbers close to each other?" (that is,
kind-of-equal) or "is this number close to this other number?" -- they are
two very slightly different questions, and I think isclose is used more for
the former than the latter.
2) In practice it makes very little difference, because:
- if the tolerance is small enough, you literally get the same answer.
- many (most?) people arrive at the tolerances through experimentation
anyway.
Edge cases in which np.isclose(a,b) !=np.isclose(b,a) are not a big
issue; on the contrary, if b is a reference value, I would like to avoid
that the elementwise tolerances could be relaxed basing on the values of a
.
Again, only for large tolerances, but still it depends on how you think of
it. I think that if you are using a large tolerance to check if a value is
within, say X% of a known value, you probably shouldn't use `isclose()` --
though I agree that an asymmetric test applied appropriately would indeed
work for that.
By the way the current docs are very well written, ...
Actually, I would disagree with that ;-). In fact, the summary says:
"""
Returns a boolean array of where `a` and `b` are equal within the
given tolerance. If both `a` and `b` are scalars, returns a single
boolean value
"""
The use of the phrase "equal within the given tolerance" sure implies that
my use above of "kind of equal" was the intent and I'd think most folks
would expect a symmetric result.
If you read down you get to:
"""
Unlike the built-in `math.isclose`, the above equation is not symmetric
in `a` and `b` -- it assumes `b` is the reference value -- so that
`isclose(a, b)` might be different from `isclose(b, a)`.
"""
which makes it very clear -- but how many people read the entire docstring?
And, of course, that extra bit was added well after the initial
`numpy.isclose()` docs -- after PEP485 was implemented in Python 3.5
In fact, if we don't make any changes, it might be good to update the
summary in the docstring to something like:
"""
Returns a boolean array where the result is True for the values of a that
are within a relative tolerance of those in b.
"""
|
`Numpy` uses a small absolute tolerance as the default for checking closeness (probably to make checks against zero possible). See: https://numpy.org/doc/stable/reference/generated/numpy.isclose.html?highlight=isclose#numpy.isclose However, this can be problematic when values being compared are very small (as big relative differences will still fall under the tolerance). The `isclose` in `math` actually uses the safer zero absolute tolerance as default: https://docs.python.org/3/library/math.html#math.isclose Interesting discussion here: numpy/numpy#10161
Very informative thread. Was consensus ever reached on handling |
The proposal from @pmeier that was the last comment on this topic seems very reasonable. This point is a bit moot unless we're actually going to be able to make a chance via introducing a new function though, which is where this topic is stuck on right now. The benefits don't seem large enough that that is a compelling prospect - or at least no one is putting in the large amount of effort required for that. |
I think the following point by @ChrisBarker-NOAA isn't given remotely enough attention - that the default atol assumes all input values are on the order of unity:

np.allclose(1e-9, 2e-9) == True

I do extensive numeric testing and discrete optimizations in signal processing applications, and I trusted NumPy to get this right. Filters typically decay rapidly in time and frequency domains, and that's with them being unit-normed (L1/L2, not value-by-value) - the difference between At bare minimum, this should be mentioned in `allclose`'s docs. However, I see numpy.isclose does have a warning concerning the unity order. Besides that, it's just a slightly different docstring, yet the underlying formulae and default tolerances are exactly the same? |
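The pitfall for small-magnitude data is easy to reproduce (values chosen for illustration):

```python
import math
import numpy as np

# For values far below 1, the default atol=1e-8 swamps the relative check:
# 1e-9 and 2e-9 differ by 100%, yet compare as "close"
print(np.allclose(1e-9, 2e-9))   # True  (1e-9 <= 1e-8 + 1e-5 * 2e-9)
print(math.isclose(1e-9, 2e-9))  # False (default abs_tol is 0.0)

# Setting atol=0 restores a purely relative comparison
print(np.allclose(1e-9, 2e-9, atol=0.0))  # False
```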
No one disagrees that this is bad. It's not about attention here, it's about doing the work needed to introduce a suitable replacement (and dealing with the high level of bikeshedding that a new name is inevitably going to attract) - this is a lot more effort than you'd expect, and not much fun.
Definitely. Would you be willing to open a PR to improve this? |
Yes. Can you clarify |
Yep, that looks like all that's needed there. The implementation of |
What's the issue with I'm currently looking into |
See numpy#10161 (latest comments, ~8/13/2023)
1. Add `isclose`'s warning to `allclose`
2. Revise `isclose`'s "Notes" on `atol` to emphasize its poorness and better explain its purpose
3. Add revised `isclose` note to `allclose`
I'm actually not at all happy with NumPy's default The current heuristic I'm thinking of is It's too late to change
be changed to
but I'll do that in a separate PR.
|
Nope, |
IIUC, you want to have a
My suggestion would be something along the lines of

```python
def isclose(a, b, rtol=None, atol=None, ..., dtype_specific_tolerances=False):
    if dtype_specific_tolerances and (rtol is None or atol is None):
        dtype = np.promote_types(a.dtype, b.dtype)
        rtol, atol = get_auto_tolerances(dtype)
    else:
        if rtol is None:
            rtol = 1e-5
        if atol is None:
            atol = 1e-8
    ...
```

with |
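One way the hypothetical `get_auto_tolerances` helper could be filled in is from the dtype's machine epsilon via `np.finfo`; the scaling factors below are made-up illustrations, not a proposal:

```python
import numpy as np

def get_auto_tolerances(dtype):
    # Hypothetical helper (not NumPy API): derive per-dtype tolerances
    # from the dtype's precision; the factor 100 and atol=0.0 are
    # illustrative choices only.
    eps = float(np.finfo(dtype).eps)
    return 100.0 * eps, 0.0  # (rtol, atol)

print(get_auto_tolerances(np.float32))  # rtol ~1.2e-5 for float32
print(get_auto_tolerances(np.float64))  # rtol ~2.2e-14 for float64
```

This would make float32 comparisons use a correspondingly looser relative tolerance than float64, rather than one fixed default for all dtypes.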
I don't follow. I use NumPy a lot and I've never seen it mean anything else. What do you have in mind?
Right, that's what I had in mind once I'd actually PR. However, I disagree with |
Usually
I'm sorry, but I don't follow. Could you come up with an example where you think this would be a problem? The idea here is to go with the stricter tolerances by default in case the dtypes mismatch, to avoid silent errors. Meaning, if you put in float32 and float64, by default it will be compared as if we had two float64. If the user didn't think about it, this operation might fail, but then they have instant feedback to change it: either convert the float64 to float32 as well, or set the tolerances manually. The other way around, we could get silent passes, which can be much worse in a testing setting. |
Ok, agreed on both, thanks for explaining. |
numpy.isclose (https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.isclose.html):
math.isclose (https://docs.python.org/3/library/math.html#math.isclose):
Note that Numpy's equation is not symmetric and correlates the `atol` and `rtol` parameters, both of which are bad things (IMO).
Here is a situation where Numpy "incorrectly" flags two numbers as equal:
Here is another one, this time a symmetry problem:
The Python math version looks to be bulletproof; should Numpy start using this? Are there any benefits of using the Numpy version?
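Both failure modes are easy to demonstrate under the two documented formulas (NumPy: `|a - b| <= atol + rtol * |b|`; math: `|a - b| <= max(rel_tol * max(|a|, |b|), abs_tol)`); the values below are chosen for illustration:

```python
import math
import numpy as np

# "Incorrectly" equal: a 10x difference passes because the default atol dominates
print(np.isclose(1e-8, 1e-9))    # True
print(math.isclose(1e-8, 1e-9))  # False

# Symmetry problem: swapping the arguments changes NumPy's answer
print(np.isclose(1.0, 1.1, rtol=0.1, atol=0.0))  # True:  |a-b| <= 0.1 * 1.1
print(np.isclose(1.1, 1.0, rtol=0.1, atol=0.0))  # False: |a-b| >  0.1 * 1.0
```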