Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MRG] FEA Gower distance #16834
base: main
Are you sure you want to change the base?
[MRG] FEA Gower distance #16834
Changes from 225 commits
e5fdbbb
dcf96f4
41a2748
d3221a7
da71fba
a63c43f
e50d9d9
47b20a9
12b773b
a32f8e7
3480bf2
7be14ba
3d1f2bc
181a750
b8da4c9
e31f72b
5096b76
1230b6f
16b756f
db6303b
7cb7ce9
9679345
066d9fa
348bf40
1b6f8b6
89b8884
ab8a61d
ef90d8e
dd1fdcd
71ce0c5
705fec9
9e5a2ac
1ed4550
a3a3135
ecb50be
57693b1
9fd98c7
ed2ce90
6708e0d
2ca1fa4
fb21d0f
8cc70af
a7654e4
52bb273
fcc9519
1df1fea
0339b55
19f1c57
dcf3a37
2b1a697
2cb2802
84dfcf1
d798f06
4889385
1cd6979
206cd26
43b77ef
6d847d4
9379e2c
4bf77e7
992b5cb
dbc6f55
da825de
3090915
4d10175
460484f
8b7f236
c699f8d
ddf9022
b3ad764
b99aacc
8f1bcd3
cda1b54
6b10f24
5209834
cc0184f
ceb4b44
172d21f
f3b1544
8ed3ecb
0c4d489
6c1054e
ca12d35
9c09d9b
ed74af0
f40273f
7dd2a9b
c5a4472
3a9f576
d3b10fe
fcb4763
6f2d98d
5c6c30d
745de05
6fa6e88
077b3cb
ded653a
eb1ee32
782eb3d
4c03f5c
3ca56d5
29d82d5
16d339d
6ea57ac
16b9377
1474df8
49a5ac2
c92d47d
67491ce
e123d36
a993bbe
5e4cf76
66650fa
23966ff
ae7f556
d257bba
52ad60f
336c183
c4959fa
faa404f
4a2d89e
5b84803
cc58403
f69fd04
098bef9
bab9ca0
bba8828
26779a0
87e2f63
ff6366b
0931f81
fa44c39
4a46ae1
38d99d5
a93efa5
512428d
6a403d5
29cd45e
a811d57
4d6d584
dbd4af5
091a7fa
117cca0
34e78ae
0a802c3
6df57c2
e610965
1cddfdf
850caa6
545e496
6b438cf
b9d2188
df73f9e
bc08577
a73852e
127bc7b
8cd9ca3
e8c6624
a339e48
27b7fd9
8e37937
d11d8e7
c375fee
462c6f3
8f4e9de
58770f0
b3270a8
188e0ca
eb1ab6b
eab56c4
5f3421e
9317415
d16f833
b23fc65
da6b46d
d84be70
e67579d
e5167e0
7de895b
c88cf0f
08e692a
19e4f0b
7d480b2
14d0d8b
3b3bb54
82707d2
e187e01
7370840
a86ba38
c0f3ee2
d1a116f
8ddfb1b
b37f750
88f835d
77d925f
72bc1dc
984a6a0
cf861bd
1510744
988028a
8454f97
f1d840d
a8f2a65
37359f0
63c179e
8786f5d
c1f3599
fa281c3
c8e840d
4859b81
6ace7da
3232262
89fae67
2ff6e0a
ab8a1f8
d8c2fb5
3d6cd99
08177af
58f4b81
0312f1e
80e71a0
b0e3b11
c0502b9
adb7854
7287b1a
f31ca7b
330f57a
aa8758d
0e8feb8
9635f76
3708a67
9262412
d8b445e
3d433fc
7b6278c
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here I'm basically converting the data to a numpy array only if it's not a pandas dataframe or an array already. Feels like it should be a
check_array(X, ..., accept_pandas=True)
call.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why avoid calling
check_array
if X is already an array?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair, but still need to avoid calling it if it's a pandas DF.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we care that this is (I think) always a copying operation even if col_idx is empty?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't that be an issue with
_safe_indexing
?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PEP257: this should be a one-line summary
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does "Equivalent to" apply to the case where
min_values=None
? Use the words "if None"There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is it a function of itself?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is using this row-wise function call worthwhile relative to something more vectorised?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's kinda easy to do more vectorized when X and Y are of the same size, otherwise I'm not sure if it's worth the complexity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, this implementation is significantly faster than the cosine distances for instance. So I don't think we should worry about the speed too much?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or
_non_nans(x, y) - np.sum(x == y)
?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doesn't seem to help the time at least when I test it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder whether it's worth running
locals().update(_precompute_metric_params(X, Y, 'gower', **locals())
to reduce code duplication... and move to a more elegant solution eventually?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the
else
case should never be executed.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not clear why this code is commented out.