I'm failing to use this package #2225

Open
Hoeze opened this issue Oct 10, 2019 · 11 comments

Comments

@Hoeze

Hoeze commented Oct 10, 2019

Hi, I'm trying to install and use this package, but I keep failing.
What I did so far:

  1. set up Anaconda with PySpark 2.4.4
  2. pip install bdgenomics.adam
  3. pyadam
['/opt/anaconda/envs/adam/bin/..', '/opt/anaconda/envs/adam/lib/python3.7/site-packages/bdgenomics/adam']
['/opt/anaconda/envs/adam/bin/..', '/opt/anaconda/envs/adam/lib/python3.7/site-packages/bdgenomics/adam']
ls: cannot access /opt/anaconda/envs/adam/lib/python3.7/site-packages/bdgenomics/adam/adam-python/dist: No such file or directory
Failed to find ADAM egg in /opt/anaconda/envs/adam/lib/python3.7/site-packages/bdgenomics/adam/adam-python/dist.
You need to build ADAM before running this program.

When I try to use the Python API I get the following result:

from pyspark.sql import SparkSession
spark = SparkSession.builder.appName('abc').getOrCreate()
from bdgenomics.adam.adamContext import ADAMContext
ac = ADAMContext(spark)

Traceback (most recent call last):
  File "/opt/anaconda/envs/adam/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 3326, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-5-ade8e609ddf7>", line 5, in <module>
    ac = ADAMContext(spark)
  File "/opt/anaconda/envs/adam/lib/python3.7/site-packages/bdgenomics/adam/adamContext.py", line 57, in __init__
    c = self._jvm.org.bdgenomics.adam.rdd.ADAMContext.ADAMContextFromSession(ss._jsparkSession)
TypeError: 'JavaPackage' object is not callable
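
For context, `'JavaPackage' object is not callable` is the error py4j raises when a dotted JVM name does not resolve to a loaded class, which usually points to the ADAM jars not being on the Spark driver's classpath. A minimal sketch of a pre-flight check, assuming the class path shown in the traceback above; the helper name `adam_classes_visible` is hypothetical and not part of bdgenomics.adam:

```python
def adam_classes_visible(jvm):
    """Return True if the ADAMContext name resolves to a real JVM class.

    py4j returns a placeholder object of type JavaPackage for any dotted
    name it cannot resolve, so we check the type name of what comes back.
    """
    # Class path taken from the traceback above.
    target = jvm.org.bdgenomics.adam.rdd.ADAMContext
    return type(target).__name__ != "JavaPackage"


# Usage, assuming an existing SparkSession named `spark`
# (`_jvm` is a private PySpark attribute and may change between versions):
#
#     if not adam_classes_visible(spark.sparkContext._jvm):
#         print("ADAM classes not found on the JVM classpath")
```

If the check fails, one option is to pass the ADAM assembly jar to Spark explicitly, for example through the `spark.jars` configuration property or the `--jars` option of `pyspark`; where that jar lives depends on how ADAM was installed.
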
@heuermh
Member

heuermh commented Oct 10, 2019

Hello @Hoeze! Thank you for submitting this issue.

I assume this is with the most recent release of bdgenomics.adam, version 0.29.0?

@akmorrow13 Could you also take a look?

@Hoeze
Author

Hoeze commented Oct 10, 2019

Wow, thanks for the quick reaction 👍
Yes, it's version 0.29
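
The installed version can also be confirmed from Python rather than from memory. A minimal sketch, assuming Python 3.8 or later, where `importlib.metadata` is in the standard library (a Python 3.7 environment like the one in the report would need the `importlib_metadata` backport instead); the helper name is hypothetical:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints the version pip recorded for the package, or None if it is absent.
print(installed_version("bdgenomics.adam"))
```
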

@heuermh
Member

heuermh commented Oct 10, 2019

I still need to look into this further, which may be difficult because campus is without power at the moment. In the meantime, one of the following might work:

Try the most recent development version on PyPI, 0.30.0a0

$ pip install bdgenomics.adam==0.30.0a0

https://pypi.org/project/bdgenomics.adam/0.30.0a0/

Try installing ADAM via Bioconda

$ conda install adam

https://bioconda.github.io/recipes/adam/README.html

I believe that a link is not created for pyadam though, so you may need to go looking for where it was installed.

@Hoeze
Author

Hoeze commented Oct 10, 2019

I started by installing with conda install adam, but this resulted in ModuleNotFoundError: bdgenomics.adam not found.
That's why I removed it again and installed the pip version.

EDIT:
Installing both conda install adam and pip install bdgenomics.adam==0.30.0a0
at the same time still results in a non-working pyadam.

@heuermh
Member

heuermh commented Oct 11, 2019

Sorry, I personally don't have much experience packaging Python, and to be honest I get quite confused with regard to conda, pip, virtualenv, and such.

I created the conda recipe to install the ADAM command line tools adam-submit and adam-shell from the Maven release distribution tarball, and that works fine. There is still an open issue regarding symlinks due to how conda moves things around when installing (#1973), which requires a patch (https://github.com/bioconda/bioconda-recipes/blob/master/recipes/adam/adam-submit.patch).

I am also not sure pyadam works correctly (#1973) after this commit to support installation via pip (eba275b).

I've been thinking it might be worth splitting the R and Python libraries into separate repositories, so that each can have its own language-specific build, release process, and packaging. As it is now, our release script doesn't work all the way through; I have to run the JVM and R parts of it piecemeal and ask @akmorrow13 to build and deploy the Python library (https://github.com/bigdatagenomics/adam/blob/master/scripts/release/release.sh#L60).

It might also be worth creating a separate conda recipe that depends on the PyPI package rather than the Maven release distribution tarball, although I don't know what this should be called (python-adam, bdgenomics.adam?), see

https://bioconda.github.io/contributor/guidelines.html#python

@heuermh
Member

heuermh commented Oct 11, 2019

After a bit of experimenting, removing the egg-related lines from pyadam may work with the pip-installed version:

$ diff pyadam pyadam2
25d24
< ADAM_EGG=$(${SOURCE_DIR}/find-adam-egg.sh)
36d34
<     --py-files ${ADAM_EGG} \

$ ./pyadam2
Using PYSPARK=/usr/local/bin/pyspark
Python 2.7.16 (default, Sep  2 2019, 11:59:44)
[GCC 4.2.1 Compatible Apple LLVM 10.0.1 (clang-1001.0.46.4)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
2019-10-11 11:25:11 WARN  NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.4.4
      /_/

Using Python version 2.7.16 (default, Sep  2 2019 11:59:44)
SparkSession available as 'spark'.
>>>
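
The same edit can be scripted instead of made by hand. A minimal sketch, assuming (as in the diff above) that the only egg-related lines in pyadam are the ones mentioning `ADAM_EGG`; the function name and the file names in the usage comment are illustrative:

```python
def strip_egg_lines(script_text):
    """Return the script text with every line that mentions ADAM_EGG removed."""
    lines = script_text.splitlines(keepends=True)
    return "".join(line for line in lines if "ADAM_EGG" not in line)


# Usage: write a patched copy next to the original script.
#
#     with open("pyadam") as src:
#         patched = strip_egg_lines(src.read())
#     with open("pyadam2", "w") as dst:
#         dst.write(patched)
```

The copy still needs to be made executable (for example with `chmod +x pyadam2`) before it can be run as shown above.
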

@heuermh
Member

heuermh commented Jan 6, 2020

Hello @akmorrow13, might you be able to weigh in on this issue?

@akmorrow13
Contributor

@heuermh I am unsure how, or whether, the pip installation deals with the jars; that was set up before my time. I am not sure pip installs the jars at all. I can update the conda recipe to work more like Mango's, as the conda recipe currently does not install Python modules at all. However, I will have to think more about pip.
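
One way to answer whether pip installs the jars is to look inside the installed package directory. A minimal sketch, assuming the package lives at the site-packages path shown in the traceback at the top of this issue; the helper name is hypothetical:

```python
import os


def list_bundled_jars(package_dir):
    """Return a sorted list of all .jar files found under package_dir."""
    jars = []
    for root, _dirs, files in os.walk(package_dir):
        for name in files:
            if name.endswith(".jar"):
                jars.append(os.path.join(root, name))
    return sorted(jars)


# Path copied from the traceback earlier in this issue; adjust for your environment.
package_dir = "/opt/anaconda/envs/adam/lib/python3.7/site-packages/bdgenomics/adam"
for jar in list_bundled_jars(package_dir):
    print(jar)
```

An empty result would suggest that the pip package ships only the Python modules, in which case the jars would have to be supplied to Spark separately.
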

@heuermh
Member

heuermh commented Jan 6, 2020

Thanks for the quick reply! Before updating the Conda recipe, I would like to try to resolve the problem(s) with the pyadam script. I think #2041 might be a similar issue. Is there any reason to keep this script around? Is there any reason to continue supporting pip?

@alartin

alartin commented Nov 7, 2022

Hi all,
The ADAM 1.0 release has the same issue. I have to manually comment out the two lines in the pyadam script that find and set the egg file. Could you please explain the consequences of commenting out these two lines? If everything works without them, why were they included in previous releases, and what was their purpose? Thanks!

@heuermh
Member

heuermh commented Nov 9, 2022

Hello @alartin, unfortunately I am not much more informed on Python packaging than I was in 2019 😉

If you have a workaround that works for you, keep going with it! I typically use the ADAM Python libraries from a Jupyter or Quarto notebook and thus don't use the pyadam shell script.
