Support for Pyspark dataframe #1103

Open
rishabh-dream11 opened this issue Apr 11, 2024 · 3 comments

Comments

@rishabh-dream11

🚀 The feature

PySpark is widely used in the community for ETL work on large datasets.
Adding support for it would increase adoption of the product.

Motivation, pitch

My org uses PySpark as its only framework for ETL. EDA is done by visualising various cuts of the same PySpark dataframe.

Alternatives

No response

Additional context

No response

@gventuri
Collaborator

gventuri commented Apr 13, 2024

This would be an interesting addition. I'm not sure how easy it would be to add PySpark support in the current setup, but it's definitely worth exploring. If I understand correctly, you would like to use PySpark as an engine? Or do you just want to be able to provide a Spark dataframe as input?

@rishabh-dream11
Author

A PySpark engine, and it has to support Spark dataframes as input.

@rishabh-dream11
Author

@gventuri Is there any progress/discussion on this issue? Will this be considered for future releases?


2 participants