Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

3.4 AgnosticEncoder support - Spark Connect #701

Open
chris-twiner opened this issue Apr 11, 2023 · 1 comment
Open

3.4 AgnosticEncoder support - Spark Connect #701

chris-twiner opened this issue Apr 11, 2023 · 1 comment

Comments

@chris-twiner
Copy link
Contributor

Spark Connect adds another interaction approach, splitting driver and executor. The remote app using spark does so over Arrow instead of InternalRow / Expressions per SPARK-41690. It's definitely not a must for 3.4 support (as shown by #698) but it would require a large chunk of TypedEncoder , TypedDataset etc. to be either largely dupicated or have some clever implicit strategy pushed through and left to only the TypedEncoder.scala to be largely duped.

Place holder issue.

@chris-twiner
Copy link
Contributor Author

Based on the source code the ConvertToArrow (called by connects' SparkSession) and ExpressionEncoder only support the built in AgnosticEncoders (logic present in ScalaReflection) - it's a locked in system, no way to inject behaviour without classpath hackery.

So custom types, typed datasets (different api ) and injections - i.e. all the cool stuff - don't seem to be possible with Spark Connect as it stands in 3.4.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants