Usage of "private" Spark APIs #300

Open
ceedubs opened this issue May 31, 2018 · 5 comments

ceedubs commented May 31, 2018

Frameless depends on portions of Spark for which there is no binary compatibility commitment. For example, Frameless uses StaticInvoke, which is part of the org.apache.spark.sql.catalyst.expressions.objects package. If you look at the (bountiful) MiMa exclusions in Spark, the entire org.apache.spark.sql.catalyst package is excluded from binary compatibility checks.

I don't really consider this a bug in Frameless, but I wanted to at least raise it as a concern, as it recently bit us at work.

backstory for those who care

At work we use Databricks runtime 3.5. Databricks claims that this runtime uses Spark 2.2. However, we ran into a bewildering issue with a binary incompatibility between Frameless and the runtime Spark version (related to org.apache.spark.sql.catalyst.expressions.objects.StaticInvoke). After quite a bit of investigation, we realized that the Databricks runtime doesn't actually include Spark 2.2 proper, but a private fork of it that has some incompatible changes. It has a backported change from Spark 2.3 that is incompatible with Spark 2.2 (and the version of Frameless that is built against Spark 2.2). We can work around this particular issue by moving to Spark 2.3 and the Databricks 4.0 runtime, but it's tough to know what other incompatibilities could be lurking in the private forks, and I could envision other people running into similar issues (especially if they can't move to Spark 2.3).
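One way to investigate a mismatch like this is to inspect, at runtime, the shape of the class that is actually on the classpath. The following is a minimal sketch using only JVM reflection; `BinaryCompatProbe` and `describeConstructors` are hypothetical names that exist in neither Frameless nor Spark, and the sketch makes no assumption about what the StaticInvoke constructor looks like in any given Spark build.

```scala
// Hypothetical helper: not part of Frameless or Spark.
object BinaryCompatProbe {

  /** Lists the parameter type names of every public constructor of `className`.
    * Returns Left if the class cannot be found on the classpath.
    * The class is loaded without being initialized.
    */
  def describeConstructors(className: String): Either[ClassNotFoundException, List[List[String]]] =
    try {
      val cls = Class.forName(className, false, getClass.getClassLoader)
      Right(cls.getConstructors.toList.map(_.getParameterTypes.toList.map(_.getName)))
    } catch {
      case e: ClassNotFoundException => Left(e)
    }

  def main(args: Array[String]): Unit = {
    // The class named in this issue; it is only found if Spark is on the classpath.
    val name = "org.apache.spark.sql.catalyst.expressions.objects.StaticInvoke"
    describeConstructors(name) match {
      case Right(ctors) => ctors.foreach(c => println(c.mkString(s"$name(", ", ", ")")))
      case Left(err)    => println(s"class not found: ${err.getMessage}")
    }
  }
}
```

Running this once against stock Spark and once inside the vendor runtime, then comparing the printed signatures, would show whether the two builds disagree on this class.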

imarios commented Jun 1, 2018

Thanks @ceedubs, I've always worried a bit about this. Databricks and other forks' runtimes will be an issue, and that's at least something we have to document.

I have not extensively analyzed whether we absolutely need to do this, but I am afraid that some of our encoding work requires APIs that core Spark does not expose as public.

@OlivierBlanvillain

I had no idea about these MiMa exclusions; it's really unfortunate... Anything actionable on our side?

ceedubs commented Jun 4, 2018

@OlivierBlanvillain If there were ways to reduce dependencies on sections of code under these exclusions, that would be great. Short of that, the actionable item might be to just warn about this in the README. I'd be willing to contribute this documentation, but it might be a little while before I get to it.
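Alongside a README warning, a library or application could make such failures less bewildering by translating JVM linkage errors into a message that points at the likely cause. This is only a sketch of that idea under stated assumptions: `CompatHint` and `withBinaryCompatHint` are hypothetical names, not an existing Frameless or Spark API, and the wording of the hint is illustrative.

```scala
// Hypothetical wrapper: not part of Frameless or Spark.
object CompatHint {

  /** Runs `body`; if the JVM reports a linkage problem (for example
    * NoSuchMethodError, which is a subclass of LinkageError), rethrows it
    * wrapped in an exception that names the component and the likely cause.
    */
  def withBinaryCompatHint[A](component: String)(body: => A): A =
    try body
    catch {
      case e: LinkageError =>
        throw new IllegalStateException(
          s"$component hit a binary incompatibility (${e.getClass.getSimpleName}). " +
            "The Spark runtime on the classpath may differ from the version this code " +
            "was compiled against, for example a vendor fork with backported changes.",
          e
        )
    }
}
```

A call site that touches Spark internals could then be wrapped as `CompatHint.withBinaryCompatHint("encoder derivation") { ... }`. The original error stays attached as the cause, so no diagnostic detail is lost.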

longcao commented Jun 6, 2018

Another anecdote: Technically speaking, I believe EMR Spark is also a fork as they've backported changes before (but much less often), so it's entirely possible that some bincompat issue can happen there as well.

imarios commented Jun 8, 2018

@ceedubs @longcao these are great "warnings" to add. I will create a PR with the edits. If @ceedubs gets to this faster, even better! :)

chris-twiner added a commit to chris-twiner/frameless that referenced this issue Mar 8, 2024