
Spark 2.X.X support? #234

Open
SemyonSinchenko opened this issue Jan 29, 2022 · 2 comments
Labels
Type: Question ❔ Question about implementation or some technical aspect

Comments

@SemyonSinchenko

Question

Is there support of the 2.X.X versions of Apache Spark?

Further Information

I see a pyspark 3.2.0 dependency in pyproject.toml. But real enterprise and on-premise clusters typically run version 2.X.X. Is any Spark version other than 3.2.0 supported?


System Information

  • OS: RHEL
  • OS Version: 8
  • Language Version: 3.7
  • Package Manager Version: PIP

Additional Context

It would be good to see a list of supported Spark/Beam versions, but I couldn't find one. Maybe there is one? In that case, could you please send me a link? Thank you!

@SemyonSinchenko SemyonSinchenko added the Type: Question ❔ Question about implementation or some technical aspect label Jan 29, 2022
@dvadym
Collaborator

dvadym commented Jan 29, 2022

We haven't tested on 2.X yet, though I think it should be easy to add 2.X support (it might even work with 2.X out of the box). That's because PipelineDP needs only some basic RDD APIs (other Spark APIs such as DataFrames aren't supported yet), like map, reduceByKey, join, etc. You can see all the Spark APIs used in the SparkRDDBackend class. If you have any feedback on using Spark, please let me know. Also, if you test it with Spark 2.*, please let me know the results.
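To make the "only basic RDD APIs" point concrete, here is a plain-Python sketch of the key-value semantics behind two of the operations mentioned above (the helper names are hypothetical, not PipelineDP code). map, reduceByKey, and join have been part of Spark's core RDD API since the 1.x line, which is why 2.X compatibility is plausible:

```python
def reduce_by_key(pairs, fn):
    """Mimics RDD.reduceByKey: merge all values sharing a key with fn."""
    result = {}
    for key, value in pairs:
        result[key] = fn(result[key], value) if key in result else value
    return sorted(result.items())

def join(left, right):
    """Mimics RDD.join: inner join on key, yielding (key, (lv, rv))."""
    right_by_key = {}
    for key, value in right:
        right_by_key.setdefault(key, []).append(value)
    return [(k, (lv, rv)) for k, lv in left for rv in right_by_key.get(k, [])]

counts = reduce_by_key([("a", 1), ("b", 2), ("a", 3)], lambda x, y: x + y)
print(counts)  # [('a', 4), ('b', 2)]

joined = join([("a", 4), ("b", 2)], [("a", "x")])
print(joined)  # [('a', (4, 'x'))]
```

In pyspark these would be `rdd.reduceByKey(fn)` and `rdd.join(other)`, with the same shapes of input and output pairs.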

In the next release, we will remove the restriction to 3.2.0.

@SemyonSinchenko
Author

Thanks a lot for such a fast answer. I'll post a comment here with the results of my tests on Spark 2.3.0.
