[BEAM-10430] jackson needed to run under EMR to avoid class not found exceptions #14953
@@ -34,6 +34,11 @@ applyJavaNature(

description = "Apache Beam :: Runners :: Flink $flink_version"

dependencies {
  implementation group: 'com.fasterxml.jackson.module', name: 'jackson-module-jaxb-annotations', version: '2.12.3'
Hello, thanks for contributing this fix! Which version of Flink (EMR) were you able to run with this fix? (Just out of curiosity.)
Can you put this in the main dependencies block?
beam/runners/flink/flink_runner.gradle
Line 176 in 92386d7
compile library.java.jackson_databind
And can you use the default library definitions (and add the jsr310 one there)?
jackson_jaxb_annotations : "com.fasterxml.jackson.module:jackson-module-jaxb-annotations:$jackson_version",
I tested this on EMR 6.3.0 and 5.33
Now that I think more about this, if the runner does not use these dependencies at all, we probably should not add them. I wonder if these dependencies are missing on the EMR side (and we should document this) or if they are somehow misconfigured because the classpath priorities are unaligned :S
EMR does include jackson 2.9.10. I don't know if it is a version issue or a classpath issue, but the only way I could get it to work was to include the jackson jars in the uber jar.
What is the next step on this PR? |
I somehow forgot about this one. I still do not understand why the jackson dependencies that come from Also the requested update to use the default Beam version of jackson is missing, but that's minor, but good to align. |
It looks to me like this is not a Beam issue. It is probably in either YARN or EMR (or a combination of the two). I think we should not add the dependencies.
@anguillanneuf left some interesting comments on #15151.
I'd say we should investigate this to find the actual cause. FlinkRunner itself is not (as far as I was able to verify) declaring or importing the JAXB annotations. Also, it works outside EMR / Dataproc. Is it possible that this really relates to the examples only? Can the issue be there?
Running into this on a non-example project on EMR as well (Beam 2.30, EMRv5.33/6.3). @iemejia do you know where the jackson 2.9 deps come from? I bundled the exact versions across the board, but if EMR is injecting a different one, that could be the problem. |
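For anyone trying to answer the "where do the jackson deps come from" question on a cluster, here is a minimal sketch that prints which jar a class was actually loaded from. The class name `WhereFrom` and the helper `locationOf` are made up for illustration; the two Jackson class names are the ones mentioned in this thread. It looks the classes up by name, so it compiles without Jackson on the compile classpath.

```java
import java.security.CodeSource;

public class WhereFrom {
    /** Returns the jar or directory a class was loaded from, or a short note when that is unknown. */
    static String locationOf(String className, ClassLoader loader) {
        try {
            // initialize=false: we only want to locate the class, not run its static initializers.
            Class<?> cls = Class.forName(className, false, loader);
            CodeSource source = cls.getProtectionDomain().getCodeSource();
            if (source == null || source.getLocation() == null) {
                // Classes that ship with the JDK usually report no code source.
                return "(no code source)";
            }
            return source.getLocation().toString();
        } catch (ClassNotFoundException | LinkageError e) {
            return "(not loadable: " + e + ")";
        }
    }

    public static void main(String[] args) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        String[] names = {
            "com.fasterxml.jackson.databind.ObjectMapper",
            "com.fasterxml.jackson.module.jaxb.JaxbAnnotationModule",
        };
        for (String name : names) {
            System.out.println(name + " <- " + locationOf(name, loader));
        }
    }
}
```

Run with the same classpath as the failing job, this should show whether `ObjectMapper` comes from the uber jar or from a jar that the platform puts on the classpath.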
Beam's self-generated word count example fails on Dataproc==2.0, Beam==2.31.0, Flink==1.12 without this dependency. But it works with Dataproc==1.5, Beam==2.29, and Flink==1.9. |
@anguillanneuf If you have access to the failing environment(s), could you try to narrow down the version that is failing? Maybe using the corner versions of Beam, Flink and Dataproc? |
I meant to try to eliminate the versions that work and that do not work. That would help a lot. |
Ah, ok. What is the main difference between Dataproc 1.5 and 2.0? I am fairly sure that this is neither a Beam issue nor a Flink issue.
I can optionally install a Flink component when creating a Dataproc cluster, just like other Dataproc optional components. But I don't have many details on Dataproc 1.5 vs. 2.0. You may be onto something. I tried the following while using Beam Flink compatibility to guide myself. The issues seem to concentrate in Dataproc 2.0.
From the stack trace,
it looks like Beam's beam/sdks/java/core/src/main/java/org/apache/beam/sdk/options/PipelineOptionsFactory.java Line 25 in 243128a
It's not that simple unfortunately. While Beam depends on some Jackson artifacts, it does not and should not depend on the artifact |
Is it possibly the hadoop jar? Looks like it pulls in a shaded jackson
module, but that could very well be it.
On Tue, Jul 27, 2021 at 6:13 PM Kyle Weaver <***@***.***> wrote:
From the stack trace,
at com.fasterxml.jackson.databind.ObjectMapper.findModules(ObjectMapper.java:1054)
at org.apache.beam.sdk.options.PipelineOptionsFactory.<clinit>(PipelineOptionsFactory.java:471)
it looks like Beam's PipelineOptionsFactory needs this dependency.
https://github.com/apache/beam/blob/243128a8fc52798e1b58b0cf1a271d95ee7aa241/sdks/java/core/src/main/java/org/apache/beam/sdk/options/PipelineOptionsFactory.java#L25
It's not that simple unfortunately. While Beam depends on some Jackson
artifacts, it does not and should not depend on the artifact
jackson-module-jaxb-annotations (containing package
com.fasterxml.jackson.module.jaxb). The problem is that somehow
com.fasterxml.jackson.module.jaxb.JaxbAnnotationModule is erroneously
being registered as a service provider
<https://docs.oracle.com/javase/8/docs/api/java/util/ServiceLoader.html?is-external=true>
for com.fasterxml.jackson.databind.Module (which is part of Jackson core,
which is a real Beam dependency). So our best guess so far is that the
JaxbAnnotationModule service is being registered by some dependency which
is common to Dataproc and EMR.
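To make the service provider mechanism above concrete, here is a small self-contained sketch of the general failure mode: a `META-INF/services` file that names a class which cannot be loaded. It is an illustration of how `ServiceLoader` behaves, not a confirmed diagnosis of the EMR / Dataproc setup. `java.lang.Runnable` stands in for `com.fasterxml.jackson.databind.Module`, and the provider name `example.missing.Provider` and the class `DanglingProviderDemo` are made up.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ServiceConfigurationError;
import java.util.ServiceLoader;

public class DanglingProviderDemo {
    /** Registers a provider class that does not exist and returns what ServiceLoader reports. */
    static String loadDanglingProvider() throws IOException {
        // Build a throwaway classpath root containing only a service registration file.
        Path root = Files.createTempDirectory("dangling-provider");
        Path services = Files.createDirectories(root.resolve("META-INF").resolve("services"));
        // Runnable is a stand-in service type; the provider class name is hypothetical.
        Files.write(
            services.resolve("java.lang.Runnable"),
            "example.missing.Provider\n".getBytes(StandardCharsets.UTF_8));

        try (URLClassLoader loader = new URLClassLoader(new URL[] {root.toUri().toURL()}, null)) {
            // Iterating forces ServiceLoader to resolve every registered provider.
            for (Runnable provider : ServiceLoader.load(Runnable.class, loader)) {
                provider.run();
            }
            return "no error";
        } catch (ServiceConfigurationError e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(loadDanglingProvider());
    }
}
```

The point is that the registration file alone is enough to break any caller that iterates the providers, which is what `ObjectMapper.findModules` does in the stack trace quoted above.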
Would Beam be open to switching to |
This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@beam.apache.org list. Thank you for your contributions. |
This pull request has been closed due to lack of activity. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time. |
Hi folks, I am currently working on enabling a feature that relies on a 2.0 Dataproc image (BEAM-13973). I am looking to enable Interactive Beam to have the capability of creating a Dataproc cluster and sending a Flink job to it. For such a job to run successfully though, the dependencies listed in this PR are necessary. For this feature, I am using a 2.0 image because the 1.5 Dataproc images all use Flink 1.9.3, and it appears that Flink 1.9 has been deprecated for nearly a year now. Would there be any other potential workarounds that we can add into Beam to have Flink work on Dataproc? Would it be suitable to add these dependencies for now and label them with a ticket addressing this behavior with Dataproc and EMR? Thanks in advance! |
@victorplusc can you please inspect the classpath of the job being submitted to Dataproc and see which dependency brings the |
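One way to do this kind of inspection, sketched under the assumption that the jars on the cluster node sit in a directory you can read: walk the directory and report every jar that ships a service registration file for `com.fasterxml.jackson.databind.Module`. The class `JarServiceFinder` and its helpers are made up for illustration, and the directory is a placeholder to be passed as an argument.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.jar.JarFile;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class JarServiceFinder {
    // Service file that ServiceLoader reads to discover Jackson modules.
    static final String ENTRY = "META-INF/services/com.fasterxml.jackson.databind.Module";

    /** True if the jar contains the given entry; unreadable archives count as false. */
    static boolean declaresService(Path jar, String entryName) {
        try (JarFile jarFile = new JarFile(jar.toFile())) {
            return jarFile.getEntry(entryName) != null;
        } catch (IOException e) {
            return false;
        }
    }

    /** Walks dir recursively and returns every .jar file that contains entryName. */
    static List<Path> jarsDeclaring(Path dir, String entryName) throws IOException {
        try (Stream<Path> paths = Files.walk(dir)) {
            return paths
                .filter(Files::isRegularFile)
                .filter(p -> p.getFileName().toString().endsWith(".jar"))
                .filter(p -> declaresService(p, entryName))
                .sorted()
                .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Placeholder default; pass the real lib directory as the first argument.
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        jarsDeclaring(dir, ENTRY).forEach(System.out::println);
    }
}
```

Each jar it prints is a candidate for the one that registers the module; opening the service file inside it shows which provider classes it names.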
Hi @je-ik, after some investigation through SSHing into my cluster nodes, it appears that these dependencies are being introduced by: |
Just to follow up - I did a test by building my Flink shadowJar without the |
I'm not sure if I understand correctly. I'll recap my understanding - the dependency that brings JAXB is |
I believe so. Though for this, the versioning does not seem to have much of an effect here, I was able to successfully execute a job on Dataproc with both versions. As this dependency is necessary for me to fully enable an automatic process to send Flink pipelines to Dataproc, without needing users to locally build the shadowJar with it included, would it be possible to include only the +Dagang Wei (@functicons), who helped me investigate the dependencies on the Dataproc side. |
jackson-module-jaxb-annotations is deprecated. https://github.com/FasterXML/jackson-module-jaxb-annotations
Looks like Dataproc is now on Jackson 2.10. https://cloud.google.com/dataproc/docs/release-notes#November_09_2020 So jackson-module-jaxb-annotations shouldn't be listed as a service provider at all. Dataproc (or whichever of its dependencies is responsible) should remove it. |
Alright, can we: I fully understand the need to run the examples using a recent runner. If we cannot simply fix Dataproc (and EMR), then this might be the way to go. It seems that adding the jar with the annotations should not break anything.
@je-ik, I did some further investigation, and it seems that just deleting the jackson-module-jaxb-annotations jar is enough to successfully run Flink jobs on Dataproc without the dependencies being locally built. Since the dependency is deprecated, let's first see if this is something we can resolve just on the Dataproc side of things. For now, I made a workaround for my use-case by setting the image version used by Dataproc to be the default image, which gets updated as new images come out. If a future image contains the fix, we don't have to do anything. |
Fix for BEAM-10430