[Question] ArrayRDD to Pyspark Dataframe? #82

Open
osimpson opened this issue Dec 4, 2017 · 1 comment
Comments

@osimpson

osimpson commented Dec 4, 2017

Hi - thanks so much for this package!

I came to this repo because I need to run a scikit-learn predictive model on Spark. It is easy to map the model with ArrayRDDs. However, my postprocessing assumes a PySpark DataFrame. Is there a way to convert an ArrayRDD to a DataFrame?

I appreciate any help, thanks!

@sajjadGG

sajjadGG commented Jan 9, 2022

If the data isn't big, I think you can collect it, convert it to a list, and then create your DataFrame from that list (you can call collect or tolist on the ArrayRDD). However, I believe there are more efficient options, since collecting pulls everything onto the driver.
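
One possibly more efficient option, as a rough sketch rather than a tested recipe: keep the data distributed and flatten each block into rows before building the DataFrame. This assumes you can get at a plain PySpark RDD whose elements are NumPy blocks (how to obtain that from an ArrayRDD depends on the package version, so treat that part as an assumption). The names `block_to_rows` and `blocks_to_dataframe` are made up for illustration and are not part of any library.

```python
def block_to_rows(block):
    """Turn one block (1-D or 2-D array-like) into a list of plain-Python row tuples.

    NumPy arrays expose tolist(), which also converts NumPy scalars to
    native Python ints/floats so Spark can infer column types from them.
    """
    data = block.tolist() if hasattr(block, "tolist") else list(block)
    rows = []
    for row in data:
        if isinstance(row, (list, tuple)):
            rows.append(tuple(row))   # 2-D block: one tuple per row
        else:
            rows.append((row,))       # 1-D block: single-column rows
    return rows


def blocks_to_dataframe(spark, block_rdd, columns):
    """Build a DataFrame from an RDD of blocks without collecting to the driver.

    spark     -- an existing SparkSession
    block_rdd -- a plain PySpark RDD whose elements are array blocks (assumption)
    columns   -- list of column names, one per value in each row
    """
    row_rdd = block_rdd.flatMap(block_to_rows)
    return spark.createDataFrame(row_rdd, schema=columns)


# Hypothetical usage, assuming `spark` is a SparkSession and `predictions`
# is an RDD of 1-D NumPy blocks produced by the model:
#   df = blocks_to_dataframe(spark, predictions, ["prediction"])
```

The conversion to native Python values matters because Spark's schema inference generally does not recognise NumPy scalar types. For small data, the collect-based approach above is simpler and probably good enough.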
