ENH Array API support for LabelEncoder #27381

OmarManzoor · 2023-09-15T08:02:04Z

Reference Issues/PRs

Towards #26024
Related to #27369

What does this implement/fix? Explain your changes.

Adds Array API support for LabelEncoder including all the inner functions that it uses.

Any other comments?

CC: @betatim @ogrisel

github-actions · 2023-09-15T08:04:07Z

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 7500c2f.

ogrisel

Hi, I don't have time to finalize my review now, but here is some initial feedback:

sklearn/utils/_array_api.py

sklearn/utils/tests/test_array_api.py

ogrisel

Some more feedback.

sklearn/utils/_array_api.py

sklearn/utils/_encode.py

sklearn/utils/_array_api.py

sklearn/utils/tests/test_array_api.py

sklearn/utils/estimator_checks.py

ogrisel · 2024-03-28T13:46:26Z

@OmarManzoor sorry I had not seen you had taken my first pass of review into account.

This PR now needs a bunch of conflict resolution because of concurrent changes merged in main. Are you interested in following up, or are you busy with other things and would rather someone else take over?

OmarManzoor · 2024-03-28T16:28:58Z

> @OmarManzoor sorry I had not seen you had taken my first pass of review into account.
>
> This PR now needs a bunch of conflict resolution because of concurrent changes merged in main. Are you interested in following up, or are you busy with other things and would rather someone else take over?

I would not mind working on it but I might need some time since I would have to revisit what I had done here and what further needs to be done after conflict resolution. 😊

ogrisel

Some more feedback below:

sklearn/utils/_encode.py

sklearn/utils/estimator_checks.py

sklearn/utils/tests/test_array_api.py

sklearn/utils/_array_api.py

sklearn/utils/estimator_checks.py

sklearn/utils/tests/test_array_api.py

sklearn/utils/_array_api.py

OmarManzoor · 2024-04-18T11:51:27Z

@betatim I think the tests are failing because of array-api-strict. Should we handle array-api-strict too?

OmarManzoor · 2024-05-03T12:36:39Z

@ogrisel It seems that the tests are failing because of array-api-strict. It doesn't have anything called argsort which we are using in the isin function. From your comment above it seems we don't need to include array-api-strict in this but then how do we ensure that the tests don't fail?
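To make the dependency on sorting primitives concrete, here is a minimal NumPy sketch of a sort-based membership test. It is not the code from this PR: the function name `isin_via_sort` is hypothetical, and a sort-plus-binary-search approach is swapped in purely for illustration.

```python
import numpy as np


def isin_via_sort(values, candidates):
    """Return a boolean mask: True where an entry of `values` is in `candidates`.

    Hypothetical helper for illustration only, not scikit-learn's code.
    """
    values = np.asarray(values)
    sorted_candidates = np.sort(np.asarray(candidates))
    if sorted_candidates.shape[0] == 0:
        # Nothing can match an empty candidate set.
        return np.zeros(values.shape, dtype=bool)
    # Position at which each value would be inserted to keep the order.
    positions = np.searchsorted(sorted_candidates, values)
    # Values larger than every candidate land one past the end; clamp them.
    positions = np.clip(positions, 0, sorted_candidates.shape[0] - 1)
    # A value is a member only if the candidate at that position equals it.
    return sorted_candidates[positions] == values


mask = isin_via_sort([1, 4, 2, 7], [2, 1, 9])
```

Every step here depends on a sorting or searching function, so a namespace that lacks one of them would need a fallback for that step.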

ogrisel

I pushed a few commits to fix some of the failures (including failures related to devices that are only possible to trigger with torch and non-CPU devices). But we still need to work with libraries such as array-api-strict that do not yet implement xp.searchsorted, see below:

sklearn/utils/_encode.py
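For context, a rough sketch of why `searchsorted` matters for label encoding: once the unique classes are sorted, the integer code of each label is simply its insertion position. The snippet also shows one possible fallback for a namespace that does not provide `searchsorted`. Both `searchsorted_compat` and the `SimpleNamespace` stand-in are hypothetical names made up for this illustration; the fallback assumes inputs that NumPy can convert directly (CPU arrays) and ignores device handling.

```python
import types

import numpy as np


def searchsorted_compat(xp, sorted_values, queries):
    """Hypothetical helper: use xp.searchsorted if present, else go via NumPy."""
    if hasattr(xp, "searchsorted"):
        return xp.searchsorted(sorted_values, queries)
    # Assumption: both inputs are convertible with np.asarray (CPU data).
    indices = np.searchsorted(np.asarray(sorted_values), np.asarray(queries))
    return xp.asarray(indices)


labels = np.asarray([3, 1, 2, 1, 3])
classes = np.unique(labels)  # sorted unique labels

# With NumPy as the namespace, the native searchsorted is used.
codes = searchsorted_compat(np, classes, labels)

# Stand-in namespace without searchsorted, to exercise the fallback path.
minimal_xp = types.SimpleNamespace(asarray=np.asarray)
fallback_codes = searchsorted_compat(minimal_xp, classes, labels)

# Decoding is plain indexing into the sorted classes.
decoded = classes[codes]
```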

ogrisel · 2024-05-03T16:27:04Z

> Should we handle array-api-strict too?

Yes we should. It's the easiest and fastest way to detect non-compliance problems in our code.

ogrisel

There is also another problem with CuPy that I cannot debug because I temporarily lost my access to my CUDA host, but the short error message was:

FAILED sklearn/preprocessing/tests/test_label.py::test_label_encoder_array_api_compliance[y0-cupy-None-None] - TypeError: unhashable type: 'ndarray'

Unfortunately, I no longer have access to the full traceback. I will need to wait for the machine to be free again to re-run that test.

In the meantime:

doc/whats_new/v1.4.rst

sklearn/utils/_array_api.py

sklearn/utils/_encode.py

ogrisel

I tested this PR on all supported namespaces and device combinations and all tests pass.

ogrisel · 2024-05-07T09:51:27Z

@betatim I think this is ready for another round of review on your end.

sklearn/preprocessing/_label.py

OmarManzoor · 2024-05-09T13:19:26Z

Hi @betatim, does this look okay now?

betatim

Looking good!

Another round of comments. Sorry for not including them last time.

betatim · 2024-05-16T08:45:28Z

sklearn/preprocessing/tests/test_label.py

+        xp_label_fit = xp_label.fit(xp_y)
+        xp_transformed = xp_label_fit.transform(xp_y)
+        xp_inv_transformed = xp_label_fit.inverse_transform(xp_transformed)


I couldn't work out why the xp_label_fit is needed, so suggesting to remove it (same for np_label_fit later on)

Suggested change

-        xp_label_fit = xp_label.fit(xp_y)
-        xp_transformed = xp_label_fit.transform(xp_y)
-        xp_inv_transformed = xp_label_fit.inverse_transform(xp_transformed)
+        xp_label = xp_label.fit(xp_y)
+        xp_transformed = xp_label.transform(xp_y)
+        xp_inv_transformed = xp_label.inverse_transform(xp_transformed)

xp_label is defined above as:
xp_label = LabelEncoder()

We are using it below to test fit_transform:
xp_label.fit_transform(xp_y)

If we assign the result of fit back to xp_label, it would override the original estimator with the fitted one.

Okay, I think xp_label is already modified in place when we run fit.

@betatim Could you kindly check if this looks okay now?

Indeed it is used again, but as you found out, est.fit(X, y) modifies (and returns) est. Reusing it later works because est.fit() and est.fit_transform() reset the state of the estimator before fitting: after calling fit() again, the state is as if the first fit() call never happened (there are some exceptions with warm starting, but not here).
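A small illustration of that behaviour, using plain NumPy inputs and assuming scikit-learn is installed. The variable names are made up for the example; only `LabelEncoder`, `fit`, `fit_transform` and `classes_` come from scikit-learn.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

y_first = np.asarray([2, 0, 1, 2])
y_second = np.asarray([10, 30, 10])

encoder = LabelEncoder()

# fit() modifies the estimator in place and returns that same object.
fitted = encoder.fit(y_first)
same_object = fitted is encoder
classes_after_first_fit = encoder.classes_.tolist()

# fit_transform() refits from scratch; nothing learned from y_first remains.
codes = encoder.fit_transform(y_second)
classes_after_refit = encoder.classes_.tolist()
```

Because `fitted` and `encoder` are the same object, keeping a separate name for the return value of `fit` adds nothing in the test.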

sklearn/preprocessing/tests/test_label.py

sklearn/utils/tests/test_array_api.py

betatim · 2024-05-16T13:47:39Z

Thanks a lot!

ENH Array API support for LabelEncoder

4869c0d

github-actions bot added module:preprocessing module:utils labels Sep 15, 2023

Add changelog

7fbd458

OmarManzoor marked this pull request as ready for review September 15, 2023 10:08

Add tests for array api functions

ec6ccc6

ogrisel reviewed Sep 21, 2023

sklearn/utils/_array_api.py (outdated, resolved)

sklearn/utils/tests/test_array_api.py (outdated, resolved)

ogrisel reviewed Sep 22, 2023

OmarManzoor added 2 commits September 23, 2023 11:57

Merge branch 'main' into label_encoder_array_api

a9d94ea

Updates: PR suggestions

43b039d

betatim mentioned this pull request Oct 24, 2023

Make more of the "tools" of scikit-learn Array API compatible #26024

Open

ogrisel reviewed Mar 28, 2024

OmarManzoor added 2 commits April 4, 2024 17:35

Merge branch 'main' into label_encoder_array_api

6198558

Fix dtype_name parameter

cfdabeb

betatim reviewed Apr 10, 2024

sklearn/utils/_array_api.py (resolved)

Omar Salman added 2 commits April 12, 2024 14:46

Merge branch 'main' into label_encoder_array_api

cfcabd2

Updates as suggested in review

23ee510

Omar Salman and others added 3 commits May 3, 2024 11:30

Merge branch 'main' into label_encoder_array_api

fa0e27c

Revert changes is estimator_checks

6177475

Improve the tests and handle device in _in1d

a21a490

ogrisel added 2 commits May 3, 2024 17:57

Fix missing device specification and explicit conversion to numpy

b09b57b

Fix _isin to work with Array API inputs

0544c32

ogrisel reviewed May 3, 2024

sklearn/utils/_encode.py (outdated, resolved)

ogrisel added the Array API label May 3, 2024

ogrisel and others added 4 commits May 3, 2024 18:27

Merge branch 'main' into label_encoder_array_api

7cbbc20

Fix the errors, make searchsorted a helper function

a34138b

Merge branch 'main' into label_encoder_array_api

58c5aa0

Add array_api_support tag

beb036a

ogrisel reviewed May 6, 2024

OmarManzoor and others added 4 commits May 7, 2024 11:18

Updates: according to some pr suggestions

34c2d92

Merge branch 'main' into label_encoder_array_api

bdb2d7e

Use xp.isdtype(values.dtype, "numeric") directly

db32acf

Update changelog

a593478

ogrisel approved these changes May 7, 2024

Update docstring for inverse transform

22fa611

ogrisel mentioned this pull request May 7, 2024

[WIP] Add array-api support to metrics.confusion_matrix #28867

Draft

1 task

betatim reviewed May 7, 2024

sklearn/preprocessing/_label.py (resolved)

OmarManzoor added 3 commits May 7, 2024 21:16

Change array-like to array

f814441

Merge branch 'main' into label_encoder_array_api

b5350ea

Update the changelog definition to make it consistent

8ce860d

OmarManzoor added 2 commits May 10, 2024 22:55

Revert and update parameter and return type names

fae25aa

Merge branch 'main' into label_encoder_array_api

e1bca48

betatim reviewed May 16, 2024

OmarManzoor added 3 commits May 16, 2024 14:04

Merge remote-tracking branch 'upstream/main' into label_encoder_array…

30f026b

…_api

Updates: Address further PR suggestions

dbf233a

Minor adjustment

7500c2f

betatim approved these changes May 16, 2024

betatim merged commit acd2d90 into scikit-learn:main May 16, 2024
30 checks passed

OmarManzoor deleted the label_encoder_array_api branch May 16, 2024 13:47

jeremiedbb mentioned this pull request May 20, 2024

Release 1.5.0 #29054

Merged

14 tasks
