
Stop using Parallel for SparkFeatureUnion #69

Open · wants to merge 1 commit into master
Conversation

@taynaud (Collaborator) commented Aug 30, 2016

See https://issues.apache.org/jira/browse/SPARK-12717
The n_jobs parameter is still kept for the converted to_scikit() object.

I think this explains the flaky test on my previous PR.
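
For reference, here is a minimal sketch of the kind of change the title describes (not the actual SparkFeatureUnion code): a FeatureUnion-style fit_transform that simply loops over its transformer_list instead of dispatching through joblib's thread-based Parallel, which is what trips the thread-safety issue in SPARK-12717. The transformer_list layout and the result handling are simplified placeholders.

```python
# Sketch only, not the sparkit-learn implementation.
class SequentialFeatureUnion(object):
    def __init__(self, transformer_list):
        # transformer_list: [(name, transformer), ...], as in scikit-learn
        self.transformer_list = transformer_list

    def fit_transform(self, Z, **fit_params):
        # Sequential on the driver: each transformer still distributes its
        # own work across the cluster, so dropping n_jobs here loses little.
        results = [
            trans.fit_transform(Z, **fit_params)
            for name, trans in self.transformer_list
        ]
        return results  # real code would column-stack / union the blocks
```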

@fulibacsi (Contributor)

Is this issue still present in Spark 2.0.0?

@taynaud (Collaborator, Author) commented Sep 5, 2016

I do not know; the issue appears randomly and I have not reproduced it on my cluster. I have added Spark 2.0 to the CI in #71, but since the failure is random, I do not know whether that will be enough to conclude.

I also think this driver-side parallelization is not very useful for a Spark computation.

@kszucs (Contributor) commented Nov 1, 2016

Without threading, the pipeline steps are executed sequentially. I think n_jobs makes sense: multiple DAGs are submitted and executed in parallel, so the overall level of parallelization can be increased via n_jobs.
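
A minimal sketch of that argument (not project code): independent transformers submitted from driver-side threads so their Spark jobs can overlap; the transformers and the data Z are assumed to exist.

```python
from multiprocessing.pool import ThreadPool

def fit_transform_all(transformers, Z, n_jobs=2):
    # Each fit_transform call triggers its own Spark job; with several
    # driver threads the jobs can run concurrently, which is the extra
    # parallelism that n_jobs would buy on top of Spark's own.
    pool = ThreadPool(n_jobs)
    try:
        return pool.map(lambda trans: trans.fit_transform(Z), transformers)
    finally:
        pool.close()
```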

Shouldn't we drop support for Spark versions before 2.0.0?

@taynaud (Collaborator, Author) commented Dec 20, 2016

According to the Apache JIRA, this is still an issue in PySpark 2.0.2.
