Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FIX euclidean_distances float32 numerical instabilities #13554
FIX euclidean_distances float32 numerical instabilities #13554
Changes from 1 commit
d38d8a0
01767d6
83cada9
ecb3d2c
6b140db
558b988
de9d217
395be7a
9c79570
4599379
7425821
a2504af
fee1258
fe74973
7f5f257
46fb590
d8a2341
e0a0dc5
4b6e707
dd634db
0e916a5
78cf2c6
1a96e1e
46c4560
dbbd855
77b456f
6dace0c
37ef0c3
1d03e13
1d5dca9
bf1af7d
b4fb670
2de3969
800ecae
5eff972
5ebcef0
532f110
37ad253
984719c
d0b7441
14a3f16
1fd5b71
7558d50
b4d0527
adceb7d
127cc41
37b099b
bdea46f
d91682a
d7815e3
8a966f4
96ff3b0
c21be57
c3e72f2
78cb1b3
0e4f561
383b132
3801caf
5dc1c46
b928396
b957edd
c0fb225
95d39ca
f11f647
306d202
78c9cb5
66fa659
f229708
6b29d38
876908e
04fcbce
cc2c186
02e864e
740c0fb
247f0e7
62b5a85
3b54222
63cd7d4
e8be8aa
78acc98
2a2caff
5a1f05a
e5a8e0b
0ca7c96
7005ab4
8e3cc5e
18dc3a9
32851c1
f818d86
32ea647
9cf6606
8d4bffc
5135252
e15e391
9a5b3ac
77ac3df
184cd2c
fbcfc52
e5e2848
32c9f99
0f346f5
fce0713
117529e
9c205c1
985037a
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes the assumption that
sizeof(dtype) == 8
without making it appear in the computation, I think?Maybe we could add
X.dtype.itemsize / 8
even if that is equal to 1, to make it easier to follow what is going on.Also not that it matters too much, but
X.nbytes == x_density * n_samples_X * 4 / 8
?X.data, X.indptr
in 32 bit, but again it's far from obvious for a casual reader.Actually, maybe,
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function is only used for float32 X and Y. I've explained it in its docstring.
When you change the type of the csr matrix, only a copy of
data
is made.indices
andindptr
are still int, so I think this formula is correctThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure?
That's not critical, my point here is that it might be beter (not necessarily in this PR) to use
ndarray.nbytes
than trying to re-compute that from scratch. If we get this calculation wrong we won't likely know, since no test will fail and we will only reason in wrong sizes of memory cache.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add
copy=False
, in the case whend = distances[y_slice, x_slice].T
it is alreadyfloat32
, and astype make a copy by default.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this function only computes euclidean distances on float64 and downcast it to float32 at the end so actually there will always be a copy. Actually I find it more clear to not add the
copy=False
parameter to emphasize that