[BEAM-10430] jackson needed to run under EMR to avoid class not found exceptions #14953

Closed
wants to merge 2 commits into from

Conversation

tkram01

@tkram01 tkram01 commented Jun 5, 2021

Fix for BEAM-10430

@@ -34,6 +34,11 @@ applyJavaNature(

description = "Apache Beam :: Runners :: Flink $flink_version"

dependencies {
implementation group: 'com.fasterxml.jackson.module', name: 'jackson-module-jaxb-annotations', version: '2.12.3'
Member


Hello, thanks for contributing this fix! Which version of Flink (EMR) were you able to run with this fix? (Just out of curiosity.)

Can you put this in the main dependencies block:

compile library.java.jackson_databind

And can you use the default library definitions (and add the jsr310 one there)?

jackson_jaxb_annotations : "com.fasterxml.jackson.module:jackson-module-jaxb-annotations:$jackson_version",

Author


I tested this on EMR 6.3.0 and 5.33.

@iemejia iemejia changed the title from "jackson needed to run under EMR to avoid class not found exceptions" to "[BEAM-10430] jackson needed to run under EMR to avoid class not found exceptions" Jun 7, 2021
@iemejia
Member

iemejia commented Jun 7, 2021

Now that I think more about this, if the runner does not use these dependencies at all, we probably should not add them. I wonder whether these dependencies are missing on the EMR side (in which case we should document this) or whether they are somehow misconfigured because the classpath priorities are misaligned :S

@tkram01
Author

tkram01 commented Jun 7, 2021

EMR does include jackson 2.9.10. I don't know if it is a version issue or a classpath issue but the only way I could get it to work was to include the jackson jars in the uber jar.

@aaltay
Member

aaltay commented Jul 1, 2021

What is the next step on this PR?

@iemejia
Member

iemejia commented Jul 8, 2021

I somehow forgot about this one. I still do not understand why the jackson dependencies that come from beam-java-sdk-core are not resolved here, and why they should be defined explicitly in the runner even if it is not using them. Maybe @je-ik or @dmvk have an intuition on this; could it be because of some weird classloading detail on Flink?

Also, the requested update to use the default Beam version of jackson is still missing. That's minor, but it would be good to align.

@je-ik
Contributor

je-ik commented Jul 9, 2021

It looks to me like this is not a Beam issue. It is probably in either YARN or EMR (or a combination of the two). I think we should not add the dependencies.

@ibzib
Contributor

ibzib commented Jul 9, 2021

@anguillanneuf left some interesting comments on #15151.

  1. The exception also happens on Dataproc, so it's not just EMR.
  2. The Spark runner includes the same dependencies, likely for the same reason.

@je-ik
Contributor

je-ik commented Jul 9, 2021

> @anguillanneuf left some interesting comments on #15151.
>
> 1. The exception also happens on Dataproc, so it's not just EMR.
> 2. The Spark runner includes the same dependencies, likely for the same reason.

I'd say we should investigate this to find the actual cause. FlinkRunner itself is not (as far as I was able to verify) declaring or importing the JAXB annotations. Also, it works outside of EMR and Dataproc. Is it possible that this really relates to the examples only? Can the issue be there?

@zh4ngx

zh4ngx commented Jul 21, 2021

Running into this on a non-example project on EMR as well (Beam 2.30, EMRv5.33/6.3). @iemejia do you know where the jackson 2.9 deps come from? I bundled the exact versions across the board, but if EMR is injecting a different one, that could be the problem.
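
One way to check where a given Jackson class actually comes from is to ask the JVM that runs the job. The following is a minimal sketch and not part of this PR: the class `ClassOrigin` and its `describe` method are hypothetical names, it uses only the JDK, and it assumes it is executed inside the same JVM and classloader context as the failing pipeline (for example, called early in the pipeline's `main`). The reported version relies on the jar's manifest and may be unknown.

```java
import java.security.CodeSource;

final class ClassOrigin {

    /** Describes which jar or directory a class was loaded from, plus its manifest version if present. */
    static String describe(String className) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClassOrigin.class.getClassLoader();
        }
        try {
            // initialize=false: only locate the class, do not run its static initializers.
            Class<?> c = Class.forName(className, false, loader);
            CodeSource src = c.getProtectionDomain().getCodeSource();
            String where = (src == null || src.getLocation() == null)
                ? "no code source (JDK class)"
                : src.getLocation().toString();
            Package p = c.getPackage();
            String version = (p == null) ? null : p.getImplementationVersion();
            return className + " <- " + where
                + " (version: " + (version == null ? "unknown" : version) + ")";
        } catch (ClassNotFoundException | LinkageError e) {
            return className + " <- not loadable: " + e;
        }
    }

    public static void main(String[] args) {
        // Class names taken from this thread; pass others as arguments if needed.
        String[] names = args.length > 0 ? args : new String[] {
            "com.fasterxml.jackson.databind.ObjectMapper",
            "com.fasterxml.jackson.module.jaxb.JaxbAnnotationModule"
        };
        for (String name : names) {
            System.out.println(describe(name));
        }
    }
}
```

If `ObjectMapper` resolves to a jar under a cluster-provided directory rather than to the uber jar, that would suggest the cluster's classpath takes priority over the bundled versions.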

@anguillanneuf
Contributor

Beam's self-generated word count example fails on Dataproc==2.0, Beam==2.31.0, Flink==1.12 without this dependency. But it works with Dataproc==1.5, Beam==2.29, and Flink==1.9.

@je-ik
Contributor

je-ik commented Jul 27, 2021

@anguillanneuf If you have access to the failing environment(s), could you try to narrow down the version that is failing? Maybe using the corner versions of Beam, Flink and Dataproc?

@anguillanneuf
Contributor

anguillanneuf commented Jul 27, 2021

@je-ik Which component's version? Beam? I have never heard of a "corner version"; what is that?


Dataproc has fixed Flink versions.
Dataproc 2.0 maps to Flink 1.12.
Dataproc 1.5 maps to Flink 1.9.

@je-ik
Contributor

je-ik commented Jul 27, 2021

I meant trying to narrow down which versions work and which do not, by varying one component at a time:
a) keep the versions of Beam and Flink fixed and try Dataproc 1.5 and 2.0
b) keep the versions of Dataproc and Beam fixed and try Flink 1.9 and 1.12
c) keep the versions of Flink and Dataproc fixed and try Beam 2.29 and 2.31

That would help a lot.

@je-ik
Contributor

je-ik commented Jul 27, 2021

Ah, ok. What is the main difference between Dataproc 1.5 and 2.0? I am fairly sure that this is neither a Beam issue nor a Flink issue.

@anguillanneuf
Contributor

I can optionally install a Flink component when creating a Dataproc cluster, just like other Dataproc optional components. But I don't have many details on Dataproc 1.5 vs. 2.0.

You may be onto something. I tried the following, using the Beam Flink compatibility as a guide. The issues seem to be concentrated in Dataproc 2.0.

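| Dataproc | Beam | Flink | I can try | Worked |
|----------|------|-------|-----------|--------|
| 2.0 | 2.31 | 1.12 | Yes | No - missing dep |
| 2.0 | 2.30 | 1.12 | Yes | No - missing dep |
| 1.5 | 2.29 | 1.9 | Yes | Yes |
| 1.5 | 2.28 | 1.9 | Yes | Yes |
| 1.5 | 2.27 | 1.9 | Yes | Yes |
| 1.5 | 2.26 | 1.9 | Yes | Yes |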
@anguillanneuf
Contributor

From the stack trace,

at com.fasterxml.jackson.databind.ObjectMapper.findModules(ObjectMapper.java:1054)

at org.apache.beam.sdk.options.PipelineOptionsFactory.<clinit>(PipelineOptionsFactory.java:471)

it looks like Beam's PipelineOptionsFactory needs this dependency.

@ibzib
Contributor

ibzib commented Jul 28, 2021

> From the stack trace,
>
> at com.fasterxml.jackson.databind.ObjectMapper.findModules(ObjectMapper.java:1054)
>
> at org.apache.beam.sdk.options.PipelineOptionsFactory.<clinit>(PipelineOptionsFactory.java:471)
>
> it looks like Beam's PipelineOptionsFactory needs this dependency.

It's not that simple, unfortunately. While Beam depends on some Jackson artifacts, it does not and should not depend on the artifact jackson-module-jaxb-annotations (containing package com.fasterxml.jackson.module.jaxb). The problem is that somehow com.fasterxml.jackson.module.jaxb.JaxbAnnotationModule is erroneously being registered as a service provider for com.fasterxml.jackson.databind.Module (which is part of jackson-databind, which is a real Beam dependency). So our best guess so far is that the JaxbAnnotationModule service is being registered by some dependency that is common to Dataproc and EMR.
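
The failure mode described above can be reproduced in isolation with the JDK's `java.util.ServiceLoader`, which is the mechanism Jackson's module discovery is understood to rely on. The following is a minimal sketch under stated assumptions, not Beam or Jackson code: it registers a provider for `java.lang.Runnable` as a stand-in for `com.fasterxml.jackson.databind.Module`, and the provider class `com.example.hypothetical.MissingModule` is a made-up name that deliberately does not exist on the classpath.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ServiceConfigurationError;
import java.util.ServiceLoader;

final class MissingProviderDemo {

    /** Returns the error message produced when a registered provider class cannot be loaded. */
    static String loadWithDanglingRegistration() {
        try {
            // A throwaway classpath root that contains only a service registration file.
            Path root = Files.createTempDirectory("dangling-spi");
            Path services = Files.createDirectories(root.resolve("META-INF").resolve("services"));
            // Register a provider whose class is NOT available to the class loader.
            Files.write(
                services.resolve("java.lang.Runnable"),
                "com.example.hypothetical.MissingModule\n".getBytes(StandardCharsets.UTF_8));

            try (URLClassLoader loader = new URLClassLoader(
                    new URL[] {root.toUri().toURL()},
                    MissingProviderDemo.class.getClassLoader())) {
                // Iterating forces ServiceLoader to load each registered provider class.
                for (Runnable provider : ServiceLoader.load(Runnable.class, loader)) {
                    provider.run();
                }
                return "no error";
            } catch (ServiceConfigurationError e) {
                return e.getMessage();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(loadWithDanglingRegistration());
    }
}
```

The point of the sketch is that the error comes from the registration file alone: the code doing the lookup never references the missing class, which would match a runner that neither declares nor imports the JAXB module.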

@zh4ngx

zh4ngx commented Jul 28, 2021 via email

@anguillanneuf
Contributor

anguillanneuf commented Sep 1, 2021

Would Beam be open to switching to gson from jackson?
Here's a googleapis example where we made the change: googleapis/java-pubsublite-spark#25

@github-actions
Contributor

github-actions bot commented Feb 5, 2022

This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@beam.apache.org list. Thank you for your contributions.

@github-actions github-actions bot added the stale label Feb 5, 2022
@github-actions
Contributor

This pull request has been closed due to lack of activity. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.

@github-actions github-actions bot closed this Feb 12, 2022
@victorplusc
Contributor

victorplusc commented Mar 8, 2022

Hi folks,

I am currently working on enabling a feature that relies on a 2.0 Dataproc image (BEAM-13973). I am looking to enable Interactive Beam to have the capability of creating a Dataproc cluster and sending a Flink job to it. For such a job to run successfully though, the dependencies listed in this PR are necessary. For this feature, I am using a 2.0 image because the 1.5 Dataproc images all use Flink 1.9.3, and it appears that Flink 1.9 has been deprecated for nearly a year now.

Would there be any other potential workarounds that we can add into Beam to have Flink work on Dataproc? Would it be suitable to add these dependencies for now and label them with a ticket addressing this behavior with Dataproc and EMR?

Thanks in advance!

@aaltay
Member

aaltay commented Mar 9, 2022

/cc @KevinGG @ibzib

@je-ik
Contributor

je-ik commented Mar 9, 2022

@victorplusc can you please inspect the classpath of the job being submitted to Dataproc and see which dependency brings the com.fasterxml.jackson.module.jaxb.JaxbAnnotationModule service provider into META-INF/services?
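
A scan like the one requested here could be sketched as follows. This is an illustrative helper, not an existing Beam or Dataproc tool: the class name `ServiceRegistrationScanner` and its `scan` method are hypothetical, it uses only the JDK, and it assumes the jars of interest sit under a directory that can be read on the cluster node. It reports every jar that contains a `META-INF/services` registration for the given service, together with the provider class names listed there.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class ServiceRegistrationScanner {

    /** Maps each jar under dir that registers providers for service to the provider names it lists. */
    static Map<String, List<String>> scan(Path dir, String service) {
        Map<String, List<String>> hits = new LinkedHashMap<>();
        try {
            List<Path> jars;
            try (Stream<Path> walk = Files.walk(dir)) {
                jars = walk.filter(p -> p.toString().endsWith(".jar"))
                    .sorted()
                    .collect(Collectors.toList());
            }
            for (Path jar : jars) {
                try (JarFile jf = new JarFile(jar.toFile())) {
                    JarEntry entry = jf.getJarEntry("META-INF/services/" + service);
                    if (entry == null) {
                        continue; // this jar does not register anything for the service
                    }
                    List<String> providers = new ArrayList<>();
                    try (BufferedReader reader = new BufferedReader(
                            new InputStreamReader(jf.getInputStream(entry), StandardCharsets.UTF_8))) {
                        String line;
                        while ((line = reader.readLine()) != null) {
                            // Registration files allow '#' comments and blank lines.
                            int hash = line.indexOf('#');
                            if (hash >= 0) {
                                line = line.substring(0, hash);
                            }
                            line = line.trim();
                            if (!line.isEmpty()) {
                                providers.add(line);
                            }
                        }
                    }
                    hits.put(jar.toString(), providers);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return hits;
    }

    public static void main(String[] args) {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        String service = args.length > 1 ? args[1] : "com.fasterxml.jackson.databind.Module";
        scan(dir, service).forEach((jar, providers) -> System.out.println(jar + " -> " + providers));
    }
}
```

Running it with a library directory as the first argument and `com.fasterxml.jackson.databind.Module` as the second would list the jars that register Jackson modules in that directory tree.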

@victorplusc
Contributor

victorplusc commented Mar 16, 2022

Hi @je-ik, after some investigation through SSHing into my cluster nodes, it appears that these dependencies are being introduced by: /usr/lib/hadoop-yarn/lib/jackson-module-jaxb-annotations-2.10.5.jar. I was not able to find a dependency using the missing datatype, so I tried downloading the other dependency under /usr/lib/hadoop-yarn/jackson-datatype-jsr310-2.12.3.jar on each of the nodes. I also copied over the existing dependency into /usr/lib/hadoop-yarn/jackson-module-jaxb-annotations-2.10.5.jar for all nodes, then started a new Yarn session, but that did not seem to resolve the issue.

@victorplusc
Contributor

victorplusc commented Mar 18, 2022

Just to follow up - I did a test by building my Flink shadowJar without the implementation group: 'com.fasterxml.jackson.datatype', name: 'jackson-datatype-jsr310', version: '2.12.3' dependency and I was able to send a Flink pipeline successfully to Dataproc. If the dependency issue related to the jackson-module-jaxb-annotations-2.10.5.jar can be resolved, then 2.0 Dataproc images shouldn't have any other issues when it comes to running Flink.

@je-ik
Contributor

je-ik commented Mar 21, 2022

I'm not sure I understand correctly, so let me recap: the dependency that brings in JAXB is hadoop-yarn, correct? If that is a dependency of Dataproc, then it seems to me that the missing dependency should be added there. Maybe it is a version clash? It looks like versions 2.10.5 and 2.12.3 are involved in this.

@victorplusc
Contributor

I believe so. The version does not seem to matter much here, though: I was able to successfully execute a job on Dataproc with both versions.

As this dependency is necessary for me to fully enable an automatic process to send Flink pipelines to Dataproc, without needing users to locally build the shadowJar with it included, would it be possible to include only the jackson-module-jaxb-annotations dependency and have a Jira ticket with a to-do to remove it after it has been resolved on the Dataproc side? This way, we can guarantee that this dependency issue does not show up in a future version of Beam. Doing so will also make it possible for users to follow use-cases such as the content covered in the Dataproc Flink component documentation using a version of Flink that has not been deprecated on the Beam side (the working example uses a Dataproc 1.5 image and Flink 1.9, but we no longer support that Flink version). Additionally, it does not appear that the jackson-datatype-jsr310 dependency is needed for me to run Flink pipelines on Dataproc, so only adding the jaxb-annotations should suffice.

+Dagang Wei (@functicons), who helped me investigate the dependencies on the Dataproc side.

@ibzib
Contributor

ibzib commented Mar 21, 2022

jackson-module-jaxb-annotations is deprecated. https://github.com/FasterXML/jackson-module-jaxb-annotations

> NOTE: This module has become part of Jackson Base Modules repo. as of Jackson 2.9
>
> This repo still exists to allow release of patch versions of older versions; it will be hidden (made private) in near future.

Looks like Dataproc is now on Jackson 2.10. https://cloud.google.com/dataproc/docs/release-notes#November_09_2020

So jackson-module-jaxb-annotations shouldn't be listed as a service provider at all. Dataproc (or whichever of its dependencies is responsible) should remove it.

@je-ik
Contributor

je-ik commented Mar 22, 2022

Alright, can we:
a) create a tracking issue to remove the conflicting dependency from Dataproc (probably from hadoop-yarn somehow)?
b) add the jaxb annotations with a tracking Jira in Beam to remove them once the upstream Dataproc issue is resolved?

I fully understand the need to run the examples using a recent runner. If we cannot simply fix Dataproc (and EMR), then this might be the way to go. It seems like adding a jar with annotations should not break anything.

@victorplusc
Contributor

@je-ik, I did some further investigation, and it seems that just deleting the jackson-module-jaxb-annotations jar is enough to successfully run Flink jobs on Dataproc without the dependencies being locally built. Since the dependency is deprecated, let's first see if this is something we can resolve just on the Dataproc side of things.

For now, I made a workaround for my use-case by setting the image version used by Dataproc to be the default image, which gets updated as new images come out. If a future image contains the fix, we don't have to do anything.
