Automatically move y to the same device and namespace as X #28668

betatim opened this issue Mar 20, 2024 · 4 comments
betatim commented Mar 20, 2024

(From #27800 (comment) by @ogrisel)

The proposal/idea is to allow y to be on a different device (and namespace?) than X when using Array API inputs. Currently we require/assume that they are on the same device and namespace. However, pipeline steps can't modify y, which means it is not possible to move from CPU to GPU as one of the steps of a pipeline: the whole pipeline has to stay on one device. The examples and code below motivate allowing X and y to be on different devices (and namespaces).


Suppose we have:

>>> import torch
>>> from sklearn import set_config
>>> from sklearn.datasets import make_regression
>>> from sklearn.linear_model import Ridge
>>> set_config(array_api_dispatch=True)
>>> X, y = make_regression(n_samples=int(1e5), n_features=int(1e3), random_state=0)
>>> X_torch_cuda = torch.tensor(X).to("cuda")
>>> y_torch_cuda = torch.tensor(y).to("cuda")

I did a quick benchmark with %time on a host with a 32-core CPU and an A100 GPU: we get a bit more than a 10x speed-up (which is in the range of what I would have expected):

>>> %time Ridge(solver="svd").fit(X, y)
CPU times: user 1min 29s, sys: 1min 4s, total: 2min 34s
Wall time: 6.18 s
Ridge(solver='svd')
>>> %time Ridge(solver="svd").fit(X_torch_cuda, y_torch_cuda)
CPU times: user 398 ms, sys: 2.74 ms, total: 401 ms
Wall time: 402 ms
Ridge(solver='svd')

I also tried the following:

>>> Ridge(solver="svd").fit(X_torch_cuda, y)
Traceback (most recent call last):
  Cell In[36], line 1
    Ridge(solver="svd").fit(X_torch_cuda, y)
  File ~/code/scikit-learn/sklearn/base.py:1194 in wrapper
    return fit_method(estimator, *args, **kwargs)
  File ~/code/scikit-learn/sklearn/linear_model/_ridge.py:1197 in fit
    device_ = device(*input_arrays)
  File ~/code/scikit-learn/sklearn/utils/_array_api.py:104 in device
    raise ValueError("Input arrays use different devices.")
ValueError: Input arrays use different devices.

I think it might be reasonable to expect this pattern to fail in general, and to explicitly ask the user to provide inputs with consistently allocated data buffers.

However, we might want to be more lenient for the particular case of y (and sample_weight) and change Ridge.fit to automatically move y to the same namespace and device as X:

y = xp.asarray(y, device=device(X))
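
For illustration, here is a minimal sketch of what this could look like inside fit, reusing the private get_namespace and device helpers from sklearn.utils._array_api. The helper name _move_to_device_of is hypothetical, not an existing scikit-learn function:

from sklearn.utils._array_api import device, get_namespace

def _move_to_device_of(array, X):
    # Hypothetical helper: re-allocate `array` (e.g. a NumPy array or a
    # Python list) in the namespace and on the device of the reference
    # input X. None is passed through so it also works for sample_weight.
    if array is None:
        return None
    xp, _ = get_namespace(X)
    return xp.asarray(array, device=device(X))

# Inside Ridge.fit (sketch):
# y = _move_to_device_of(y, X)
# sample_weight = _move_to_device_of(sample_weight, X)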

The reason would be to improve the usability of the Array API for pipelines such as the following:

X_pandas_df, y_pandas_series = fetch_some_pandas_data()

pipeline = make_pipeline(
    some_column_transformer(),  # works on CPU on the input dataframe
    FunctionTransformer(func=lambda X: torch.tensor(X).to(torch.float32).to("cuda")),
    Ridge(solver="svd"),
)
pipeline.fit(X_pandas_df, y_pandas_series)

Pipeline steps can only transform X, not y (or sample_weight). So the user would instead have to call:

pipeline.fit(X_pandas_df, torch.tensor(y_pandas_series).to("cuda"))

which feels a bit weird/cumbersome to me.

This might not be a big deal though, and I don't want to delay this PR to get a consensus on this particular point: I think it's fine the way it is for now, but we might want to come back to the UX of pipelines with Array API steps later. I will open a dedicated issue.


If we take action here, there are a few estimators that need double checking. For example, LinearDiscriminantAnalysis currently raises low-level PyTorch exceptions when X and y are on different devices.


betatim commented Mar 20, 2024

After thinking about this for a bit, I like the idea. It solves a problem we haven't found a good solution for in a while (how to move from one device to another as part of a pipeline). Allowing y to "follow" X when needed seems like a sensible solution.

So right now I'm +1.


ogrisel commented Mar 20, 2024

+1 in general, and even more so because it would make it possible to train classifier pipelines on a GPU with str/object-typed class labels passed to the pipeline, as we usually allow.


ogrisel commented Mar 20, 2024

Note, on the implementation side: to move NumPy arrays and Python lists to a specific array namespace/device combo, xp.asarray(source, device=device) should work (as we currently do in ensure_common_namespace_device).
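
For example, a minimal illustration of moving a NumPy array into the torch namespace this way (the device string is an assumption and depends on the available hardware):

import numpy as np
import array_api_compat.torch as xp_torch

y_np = np.asarray([0.0, 1.0, 2.0])
# Converting a NumPy array (or a Python list) works because torch can
# consume objects exposing the buffer / __array__ protocols.
y_torch = xp_torch.asarray(y_np, device="cpu")  # or device="cuda" if available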

But if the thing to move is already in another namespace/device combo, that will most likely fail in general, for instance with torch:

>>> import array_api_strict as xp_strict
>>> import array_api_compat.torch as xp_torch
>>> reference = xp_strict.ones(3)
>>> reference.device
CPU_DEVICE
>>> xp_strict.asarray(xp_torch.ones(3, device="mps"), device=reference.device)
Traceback (most recent call last):
  Cell In[49], line 1
    xp_strict.asarray(xp_torch.ones(3, device="mps"), device=reference.device)
  File ~/miniforge3/envs/dev/lib/python3.11/site-packages/array_api_strict/_creation_functions.py:91 in asarray
    res = np.array(obj, dtype=_np_dtype, copy=copy)
  File ~/miniforge3/envs/dev/lib/python3.11/site-packages/torch/_tensor.py:1030 in __array__
    return self.numpy()
TypeError: can't convert mps:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.

In this case, we should instead use the new improvements made to the dlpack and Array API specs:

Starting Python array API standard v2023, a copy can be explicitly requested (or disabled) through the new copy argument of from_dlpack(). When a copy is made, the producer must set the DLPACK_FLAG_BITMASK_IS_COPIED bit flag. It is also possible to request cross-device copies through the new device argument, though the v2023 standard only mandates the support of kDLCPU.

https://github.com/dmlc/dlpack/blob/main/docs/source/python_spec.rst

But note that there is no guarantee that cross-device transfers work in general, so we might always have to move through a CPU device if needed.
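
For illustration only, the intended call could look like the sketch below, where xp and device(X) refer to X's namespace and device as in the earlier snippets; it will not run with array-api-compat or array-api-strict today (see the caveat below):

# Sketch only: requires the v2023.12 `from_dlpack` signature with the new
# `device` and `copy` keyword arguments.
y = xp.from_dlpack(y, device=device(X), copy=True)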

Furthermore, this is not yet implemented in array-api-compat/array-api-strict:

data-apis/array-api-compat#86 (comment)

In the meantime, we will have to rely on our _convert_to_numpy hack before calling back to xp.asarray if we need to do transfers between non-NumPy namespaces.
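
Concretely, that interim workaround could look like the following sketch, using the private _convert_to_numpy, device and get_namespace helpers from sklearn.utils._array_api (the wrapper name _move_via_numpy is made up for illustration):

from sklearn.utils._array_api import _convert_to_numpy, device, get_namespace

def _move_via_numpy(y, X):
    # Interim workaround: round-trip y through a NumPy array on the host,
    # then re-allocate it in X's namespace on X's device.
    xp_X, _ = get_namespace(X)
    xp_y, _ = get_namespace(y)
    y_np = _convert_to_numpy(y, xp=xp_y)
    return xp_X.asarray(y_np, device=device(X))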

For now, we can decide to implement the "auto-follow" behavior only when X is in an Array API namespace and y and sample_weight are NumPy arrays or other non-Array-API containers, as sketched below.
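
A hedged sketch of what that restriction could look like (the helper name and the exact set of accepted container types are assumptions for illustration):

import numpy as np

def _is_plain_container(y):
    # Hypothetical check: only NumPy arrays, Python lists and tuples are
    # considered safe to auto-move; anything already living in another
    # Array API namespace is left alone.
    return isinstance(y, (np.ndarray, list, tuple))

# In fit (sketch): only auto-follow when y is a plain container.
# if _is_plain_container(y):
#     y = xp.asarray(y, device=device(X))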

lorentzenchr commented

A priori, I'm +1 for only allowing X and y on the same device.

I'm also +1 for allowing y on a different device, with scikit-learn moving it to the device of X, but only if it is clearly documented. We could even consider starting with a warning (that we can remove later).
