Add support for modin to improve apply operation performance #115

Open
Beforerr opened this issue Sep 20, 2023 · 4 comments

@Beforerr

I think for the most part we can just replace

import pandas as pd

with

# Conditional import
try:
    import modin.pandas as pd
    from modin.pandas import DataFrame, Series
except ImportError:
    import pandas as pd
    from pandas import DataFrame, Series

@shaypal5
Collaborator

shaypal5 commented Sep 21, 2023

Ummm. I don't think that's something that should be done by default, with no way to control it with configuration.

Are we to assume anyone that installed modin prefers to have it replace pandas anytime, anywhere? I don't know. Maybe.

Do you have any reference to other similar libraries built on top of pandas and making the same change?

@Beforerr
Author

One library I know of is swifter (see jmcarpenter2/swifter/issues/93). But you are definitely right: we should have a way to control it with configuration.

@Beforerr
Author

Beforerr commented Sep 21, 2023

Or we could do it another way, by not raising TypeError in _transform when the result is a modin object, and hopefully everything should then work smoothly.

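import pandas as pd
import pdpipe as pdp

# Accept modin types only when modin is installed; otherwise fall back
# to the pandas types alone, so that `mpd` is never referenced undefined.
SERIES_TYPES = (pd.Series,)
DATAFRAME_TYPES = (pd.DataFrame,)
try:
    import modin.pandas as mpd
    SERIES_TYPES += (mpd.Series,)
    DATAFRAME_TYPES += (mpd.DataFrame,)
except ImportError:
    pass

class ApplyToRows(pdp.ApplyToRows):
    ...
    def _transform(self, X, verbose):
        ...
        if isinstance(new_cols, SERIES_TYPES):
            ...
        if isinstance(new_cols, DATAFRAME_TYPES):
            ...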

@shaypal5
Collaborator

I still think this change should be opt-in, not opt-out.

And then maybe it could be applied everywhere.

Look at the way other libraries do it: with global flags that are read once and can be set on import, or through configuration files and environment variables.
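
To make the opt-in idea concrete, here is a minimal sketch of an environment-variable flag that is read once at import time. The variable name PDPIPE_USE_MODIN and the helpers backend_name and load_backend are hypothetical, made up for this illustration; nothing like them exists in the library today. It assumes modin should be used only when the user asks for it explicitly, with a fallback to pandas if modin is not installed.

```python
import importlib
import os

# Hypothetical variable name, chosen for this sketch only.
ENV_FLAG = "PDPIPE_USE_MODIN"
_TRUTHY = {"1", "true", "yes", "on"}


def backend_name(environ=None):
    """Return the module name to import: modin only when explicitly requested."""
    environ = os.environ if environ is None else environ
    value = environ.get(ENV_FLAG, "").strip().lower()
    return "modin.pandas" if value in _TRUTHY else "pandas"


def load_backend(environ=None):
    """Import the selected backend, falling back to pandas if modin is missing."""
    name = backend_name(environ)
    try:
        return importlib.import_module(name)
    except ImportError:
        if name == "pandas":
            raise  # pandas itself is required; nothing to fall back to
        return importlib.import_module("pandas")


# Read once at import time, in place of `import pandas as pd`:
# pd = load_backend()
```

With something like this, pandas stays the default and a user opts in by setting the variable before importing the library. A configuration file or a setter function could feed the same check.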
