s3a URL not working #2096

Open
dekinsitro opened this issue Dec 5, 2018 · 5 comments

@dekinsitro

I am trying to follow the documentation to allow ADAM to read a BAM file from S3.
According to https://adam.readthedocs.io/en/latest/deploying/aws/#input-and-output-data-on-hdfs-and-s3 I should run a command like this:
adam-submit --packages com.amazonaws:aws-java-sdk-pom:1.11.463,net.fnothaft:jsr203-s3a:0.0.1 -- transformAlignments s3a://1000genomes/phase1/data/NA12878/exome_alignment/NA12878.mapped.illumina.mosaik.CEU.exome.20110411.bam /mnt/test.adam

When I run that command, I get an error with many unresolved dependency jars:

:: problems summary ::
:::: WARNINGS
[NOT FOUND ] org.apache.commons#commons-math3;3.1.1!commons-math3.jar (0ms)

....
:::: WARNINGS
[NOT FOUND ] org.apache.commons#commons-math3;3.1.1!commons-math3.jar (0ms)

    ==== local-m2-cache: tried
      file:/home/ubuntu/.m2/repository/org/apache/commons/commons-math3/3.1.1/commons-math3-3.1.1.jar
[NOT FOUND  ] commons-collections#commons-collections;3.2.1!commons-collections.jar (0ms)

It's not clear to me (I don't work with Java much) what is going on, but my guess is that the tool that should download the package dependencies never runs, and that only the local Maven cache is being searched.
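
One way to test that guess is to work out, for any artifact named in the error output, where it was looked up locally and where the jar should live on Maven Central. The sketch below assumes the standard Maven repository layout (consistent with the local-m2-cache path in the log above); `coord_to_paths` is a hypothetical helper name, not part of ADAM or Spark.

```shell
# Hypothetical helper: turn an Ivy-style coordinate from the error output
# (org#module;revision) into the local Maven cache path and the
# corresponding Maven Central URL. Assumes the standard repository layout.
coord_to_paths() {
  local coord="$1"
  local org="${coord%%#*}"     # text before '#'
  local rest="${coord#*#}"     # text after '#'
  local module="${rest%%;*}"   # text before ';'
  local rev="${rest#*;}"       # text after ';'
  local org_path
  org_path="$(printf '%s' "$org" | tr '.' '/')"   # dots become directories
  local rel="${org_path}/${module}/${rev}/${module}-${rev}.jar"
  printf '%s\n' "${HOME}/.m2/repository/${rel}"
  printf '%s\n' "https://repo1.maven.org/maven2/${rel}"
}
```

For example, `coord_to_paths 'org.apache.commons#commons-math3;3.1.1'` prints the cache path and then the URL. Fetching that URL from the same VM (with a tool such as curl) would show whether Maven Central is reachable at all.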

@heuermh
Member

heuermh commented Dec 5, 2018

Hello @dekinsitro, thank you for submitting this issue.

The docs suggest including org.apache.hadoop:hadoop-aws:2.7.4, so you may want to try

adam-submit \
  --packages com.amazonaws:aws-java-sdk-pom:1.11.463,org.apache.hadoop:hadoop-aws:2.7.4,net.fnothaft:jsr203-s3a:0.0.1 \
  -- \
  transformAlignments \
  s3a://1000genomes/phase1/data/NA12878/exome_alignment/NA12878.mapped.illumina.mosaik.CEU.exome.20110411.bam \
  /mnt/test.adam

Are you running Spark on AWS, perhaps via EMR?

@dekinsitro
Author

I'm running on a simple Ubuntu 18.04 EC2 VM, not EMR. Spark/EMR on AWS already includes the necessary s3 connector jars.

Using your command changes the error, but it is still roughly the same problem:
adam-submit \
  --packages com.amazonaws:aws-java-sdk-pom:1.11.463,org.apache.hadoop:hadoop-aws:2.7.4,net.fnothaft:jsr203-s3a:0.0.1 \
  -- \
  transformAlignments \
  s3a://1000genomes/phase1/data/NA12878/exome_alignment/NA12878.mapped.illumina.mosaik.CEU.exome.20110411.bam \
  /mnt/test.adam

produces:

::::::::::::::::::::::::::::::::::::::::::::::
::              FAILED DOWNLOADS            ::
:: ^ see resolution messages for details  ^ ::
::::::::::::::::::::::::::::::::::::::::::::::
:: com.google.code.findbugs#jsr305;3.0.0!jsr305.jar
:: org.apache.commons#commons-math3;3.1.1!commons-math3.jar
:: com.sun.jersey#jersey-json;1.9!jersey-json.jar(bundle)
:: org.codehaus.jettison#jettison;1.1!jettison.jar(bundle)
:: com.sun.xml.bind#jaxb-impl;2.2.3-1!jaxb-impl.jar
:: org.codehaus.jackson#jackson-jaxrs;1.9.13!jackson-jaxrs.jar
:: org.codehaus.jackson#jackson-xc;1.9.13!jackson-xc.jar
:: com.sun.jersey#jersey-server;1.9!jersey-server.jar(bundle)
:: org.tukaani#xz;1.0!xz.jar
:: jline#jline;0.9.94!jline.jar
::::::::::::::::::::::::::::::::::::::::::::::

I don't see any indication that it even attempts to download the packages; it only looks for them in the cache.

@heuermh
Member

heuermh commented Dec 5, 2018

Right, things can be a little bit different depending on the Spark installation.

For example, on Cloudera CDH I only need the jsr203-s3a package:

$ export AWS_SECRET_ACCESS_KEY=...
$ export AWS_ACCESS_KEY_ID=...
$ adam-submit --packages net.fnothaft:jsr203-s3a:0.0.1 ...
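
A minimal pre-flight sketch for the exports shown above: it only checks that both variables are present in the environment before a job is launched. The function name `require_aws_env` is hypothetical, and the check says nothing about whether the keys are valid.

```shell
# Hypothetical pre-flight check: succeed only if both AWS credential
# variables are exported in the current environment.
require_aws_env() {
  local missing=0 name
  for name in AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY; do
    # printenv exits non-zero when the variable is not in the environment
    if ! printenv "$name" >/dev/null; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

It could be chained in front of the submit command, for example `require_aws_env && adam-submit --packages net.fnothaft:jsr203-s3a:0.0.1 ...`, so that a forgotten export fails fast instead of surfacing later as an S3 access error.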

I don't know why your version of Spark isn't trying to download the necessary dependencies; perhaps there is a network or Ivy settings issue?

Another option would be to pull the dependencies into your local Ivy cache using ivy directly:

$ ivy -dependency com.google.code.findbugs jsr305 3.0.0
:: loading settings :: url = jar:file:/usr/local/Cellar/ivy/2.4.0/libexec/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
:: resolving dependencies :: com.google.code.findbugs#jsr305-caller;working
	confs: [default]
	found com.google.code.findbugs#jsr305;3.0.0 in public
downloading https://repo1.maven.org/maven2/com/google/code/findbugs/jsr305/3.0.0/jsr305-3.0.0.jar ...
......... (19kB)
.. (0kB)
	[SUCCESSFUL ] com.google.code.findbugs#jsr305;3.0.0!jsr305.jar (73ms)
downloading https://repo1.maven.org/maven2/com/google/code/findbugs/jsr305/3.0.0/jsr305-3.0.0-sources.jar ...
........ (16kB)
.. (0kB)
	[SUCCESSFUL ] com.google.code.findbugs#jsr305;3.0.0!jsr305.jar(source) (59ms)
downloading https://repo1.maven.org/maven2/com/google/code/findbugs/jsr305/3.0.0/jsr305-3.0.0-javadoc.jar ...
...................... (173kB)
.. (0kB)
	[SUCCESSFUL ] com.google.code.findbugs#jsr305;3.0.0!jsr305.jar(javadoc) (88ms)
:: resolution report :: resolve 909ms :: artifacts dl 224ms
	---------------------------------------------------------------------
	|                  |            modules            ||   artifacts   |
	|       conf       | number| search|dwnlded|evicted|| number|dwnlded|
	---------------------------------------------------------------------
	|      default     |   1   |   1   |   1   |   0   ||   3   |   3   |
	---------------------------------------------------------------------
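
With ten failed artifacts, typing each `ivy -dependency` call by hand gets tedious. The sketch below reads the "FAILED DOWNLOADS" lines from standard input and prints one command per artifact, in the same `ivy -dependency <org> <module> <revision>` form used above. `failed_to_ivy_cmds` is a hypothetical name. Whether Spark then picks up what ivy downloads depends on both using the same Ivy cache directory, which is an assumption here and has not been verified.

```shell
# Hypothetical helper: read Ivy "FAILED DOWNLOADS" lines on stdin, e.g.
#   :: org.tukaani#xz;1.0!xz.jar
# and print one `ivy -dependency <org> <module> <revision>` command per
# artifact. Banner and separator lines do not match and are skipped.
failed_to_ivy_cmds() {
  sed -n 's/^[[:space:]]*:: \([^#[:space:]]*\)#\([^;]*\);\([^!]*\)!.*$/ivy -dependency \1 \2 \3/p'
}
```

The output is a dry run: save the failed-downloads block to a file, run `failed_to_ivy_cmds < that-file`, review the printed commands, and pipe them to `sh` only once they look right.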

I'll try hopping on an Ubuntu EC2 instance tomorrow to see if I can replicate your issue.

@dekinsitro
Author

Interesting suggestion. Please do try to reproduce this problem on a modern (Ubuntu 18.04) VM if possible. I'm basically running either "conda install -c conda-forge adam" or "pip install bdgenomics.adam", then trying to run a basic transformAlignments on an S3-sourced file.

@heuermh
Member

heuermh commented May 23, 2019

Sorry for dropping this for a while. I'll try to replicate this later this week with the new 0.27.0 release.
