KeyError while running tests #7

Open
micdonato opened this issue Apr 7, 2021 · 14 comments

Comments

@micdonato

Hi, I am trying to install haystack on our server, and I am running into an error when running the tests.
The tests complete successfully, but at the end I get this:

```
INFO  @ Wed, 07 Apr 2021 16:10:54:
	 Analyzing MA0724.1 from:/home/user/haystack_test_output/HAYSTACK_PIPELINE_RESULTS/HAYSTACK_MOTIFS/HAYSTACK_MOTIFS_on_K562/genes_lists/MA0724.1_motif_region_in_target.tss.bed
/home/user/.conda/envs/hotspots/lib/python2.7/site-packages/haystack/generate_tf_activity_plane.py:189:FutureWarning: read_table is deprecated, use read_csv instead, passing sep='\t'.
  mapped_genes = map(str.upper, list(pd.read_table(motif_gene_filename,keep_default_na=False,na_values='null').dropna()['Symbol'].values.astype(str)))
Traceback (most recent call last):
  File "/home/users/.conda/envs/hotspots/bin/haystack_tf_activity_plane", line 10, in <module>
    sys.exit(main())
  File "/home/user/.conda/envs/hotspots/lib/python2.7/site-packages/haystack/generate_tf_activity_plane.py", line 193, in main
    ds_values = zscore_series(gene_ranking.ix[mapped_genes, :].mean())
  File "/home/user/.conda/envs/hotspots/lib/python2.7/site-packages/pandas/core/indexing.py", line 120, in __getitem__
    return self._getitem_tuple(key)
  File "/home/user/.conda/envs/hotspots/lib/python2.7/site-packages/pandas/core/indexing.py", line 888, in _getitem_tuple
    retval = getattr(retval, self.name)._getitem_axis(key, axis=i)
  File "/home/user/.conda/envs/hotspots/lib/python2.7/site-packages/pandas/core/indexing.py", line 1088, in _getitem_axis
    return self._getitem_iterable(key, axis=axis)
  File "/home/user/.conda/envs/hotspots/lib/python2.7/site-packages/pandas/core/indexing.py", line 1205, in _getitem_iterable
    raise_missing=False)
  File "/home/user/.conda/envs/hotspots/lib/python2.7/site-packages/pandas/core/indexing.py", line 1161, in _get_listlike_indexer
    raise_missing=raise_missing)
  File "/home/user/.conda/envs/hotspots/lib/python2.7/site-packages/pandas/core/indexing.py", line 1252, in _validate_read_indexer
    raise KeyError("{} not in index".format(not_found))
KeyError: "['BAGE5', 'GRIK1-AS2'] not in index"
INFO  @ Wed, 07 Apr 2021 16:10:54:
	 Test completed successfully
```

Should I be worried?

@rfarouni
Collaborator

A pandas update is causing this. I am not sure whether it is cause for worry, but to be on the safe side, I would pin pandas (and potentially other packages) to the versions used here: https://github.com/pinellolab/haystack_bio/blob/master/Dockerfile#L35. Alternatively, you can use the Docker container.

Rick,
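For anyone who would rather patch around the error than downgrade: the failing line (`gene_ranking.ix[mapped_genes, :].mean()`, line 193 of `generate_tf_activity_plane.py` in the traceback) asks pandas for gene symbols that are not in the index (`BAGE5`, `GRIK1-AS2`). Depending on the pandas version, list-based selection with missing labels either fills those rows with NaN or raises `KeyError`, and `.ix` itself was removed in pandas 1.0. A minimal sketch, assuming `gene_ranking` is a DataFrame indexed by upper-case gene symbols; the helper `mean_over_present_genes` and the toy data are hypothetical, not haystack's own code:

```python
import logging

import pandas as pd

logging.basicConfig(level=logging.WARNING)
log = logging.getLogger("tf_activity_sketch")


def mean_over_present_genes(gene_ranking, mapped_genes):
    """Column means of `gene_ranking` over the genes that exist in its index.

    Hypothetical helper: behaves like `gene_ranking.ix[mapped_genes, :].mean()`
    but drops unknown gene symbols instead of raising KeyError.
    """
    # Keep only symbols present in the index, preserving their original order
    present = [g for g in mapped_genes if g in gene_ranking.index]
    missing = sorted(set(mapped_genes) - set(present))
    if missing:
        log.warning("Skipping %d gene(s) not in the ranking: %s", len(missing), missing)
    # Every label passed to .loc now exists, so no KeyError can occur here
    return gene_ranking.loc[present, :].mean()


# Made-up toy data: three genes measured in two cell types
gene_ranking = pd.DataFrame(
    {"K562": [1.0, 2.0, 3.0], "HepG2": [4.0, 5.0, 6.0]},
    index=["GATA1", "TAL1", "MYC"],
)
mapped_genes = ["GATA1", "MYC", "BAGE5", "GRIK1-AS2"]

print(mean_over_present_genes(gene_ranking, mapped_genes))  # K562 2.0, HepG2 5.0
```

If the older pandas behaviour was to return NaN rows for the missing symbols, `.mean()` skips NaN by default, so dropping them should give the same averages. Treat that as an assumption and compare against a run with pinned pandas before relying on it.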

@micdonato
Author

Thanks, that makes sense! I think I will use the Docker container, but I was also considering building a Singularity container, and pinning pandas will help with that.

@rfarouni
Collaborator

I have no experience building Singularity containers, but I think they would be a great solution for people running the pipeline on HPC clusters. Maybe @lucapinello knows more about these types of containers; I'll ask him.

@micdonato
Author

They work roughly the same way as Docker containers; it's just a matter of creating the right recipe to build them. Usually I install a package locally to check that I can build everything before going the container route.

My reasons for using Singularity are mostly Docker's root/user permission issue and filesystem isolation, but there are other differences as well.

Thanks!

@lucapinello
Contributor

lucapinello commented Apr 12, 2021 via email

@micdonato
Author

Hi all, and thanks!

That is what I tried at first. Unfortunately, it seems that Singularity fails to actually build the image, as packages that should be installed are missing:

The command:

```
singularity run docker://pinellolab/haystack_bio haystack_pipeline data/data_h3k27ac_6cells/samples_names.txt hg19 --blacklist hg19
```

The result:

```
INFO:    Using cached SIF image
Traceback (most recent call last):
  File "/usr/local/bin/haystack_pipeline", line 6, in <module>
    from pkg_resources import load_entry_point
  File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 2927, in <module>
    @_call_aside
  File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 2913, in _call_aside
    f(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 2940, in _initialize_master_working_set
    working_set = WorkingSet._build_master()
  File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 635, in _build_master
    ws.require(__requires__)
  File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 943, in require
    needed = self.resolve(parse_requirements(requirements))
  File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 829, in resolve
    raise DistributionNotFound(req, requirers)
pkg_resources.DistributionNotFound: The 'scipy>=1.0.0' distribution was not found and is required by haystack-bio
```

That is why I wanted to try rebuilding the Singularity image from scratch.
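When a container reports `DistributionNotFound` for a package that should be in the image, it can help to see which interpreter is running, what is on `sys.path`, and where each dependency is actually imported from. The following is a diagnostic sketch, not a fix: it is Python 2/3 compatible, the module list is an assumption taken from the pins later in this thread, and it could be run inside the container with something like `singularity exec docker://pinellolab/haystack_bio python <script>` (the script name is a placeholder). Note that `pkg_resources` checks installed distribution metadata rather than imports, so this narrows the problem down but does not reproduce that check exactly.

```python
from __future__ import print_function

import importlib
import sys

# Assumed subset of haystack-bio's dependencies (from the pins in this thread)
MODULES = ["numpy", "scipy", "pandas", "matplotlib"]


def locate(module_name):
    """Return the file a module is imported from, or why the import failed."""
    try:
        module = importlib.import_module(module_name)
    except Exception as exc:  # broad on purpose: broken envs raise odd errors
        return "NOT IMPORTABLE (%s: %s)" % (type(exc).__name__, exc)
    return getattr(module, "__file__", None) or "<built-in>"


def report(modules):
    """Map each module name to where (or whether) it imports."""
    return dict((name, locate(name)) for name in modules)


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    for entry in sys.path:
        print("  path:", entry)
    for name, where in sorted(report(MODULES).items()):
        print("%-12s %s" % (name, where))
```

Paths pointing outside the image (for example into a bind-mounted home directory) would be worth a closer look.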

@lucapinello
Contributor

I can reproduce your error, and it seems there is no simple way to use the Docker image directly with Singularity. You may want to explore docker2singularity, a tool to convert the Docker image.

I have tested the Docker image on my machine, and it still works as expected, but I understand that it may not be a viable option for you.

You can try downgrading pandas in the conda environment you created previously and, if necessary, the other packages as well:

```
numpy==1.13.3 \
scipy==1.0.0 \
matplotlib==2.1.0 \
pandas==0.21.0 \
&& pip install \
bx-python==0.7.3 \
Jinja2==2.9.6 \
tqdm==4.19.4 \
weblogo==3.5.0 \
```

@rfarouni, do you have the bandwidth in the next few days to pin pandas in the bioconda package and resubmit it, so we can fix this for other users installing the package through bioconda? Of course, this will require creating a separate conda env just for haystack.
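To confirm that an environment actually matches the pins above before rerunning the tests, a small script can compare installed versions against them. A sketch under these assumptions: it prefers `importlib.metadata` (Python 3.8+) and falls back to `pkg_resources` on Python 2.7, which is what the haystack environment uses; `PINS` copies the versions listed above, and `installed_version` / `check_pins` are hypothetical helpers.

```python
from __future__ import print_function

try:  # Python 3.8+
    from importlib.metadata import PackageNotFoundError
    from importlib.metadata import version as dist_version
except ImportError:  # Python 2.7 (the haystack env): fall back to setuptools
    from pkg_resources import DistributionNotFound as PackageNotFoundError
    from pkg_resources import get_distribution

    def dist_version(name):
        """Installed version string of distribution `name`."""
        return get_distribution(name).version

# Versions copied from the pin list above
PINS = {
    "numpy": "1.13.3",
    "scipy": "1.0.0",
    "matplotlib": "2.1.0",
    "pandas": "0.21.0",
    "bx-python": "0.7.3",
    "Jinja2": "2.9.6",
    "tqdm": "4.19.4",
    "weblogo": "3.5.0",
}


def installed_version(name):
    """Return the installed version of `name`, or None if it is not installed."""
    try:
        return dist_version(name)
    except PackageNotFoundError:
        return None


def check_pins(pins, get_version=installed_version):
    """Return {name: (wanted, found)} for each package whose version differs.

    Versions are compared as exact strings, so e.g. 0.21.1 is reported
    as a mismatch against a 0.21.0 pin.
    """
    mismatches = {}
    for name, wanted in pins.items():
        found = get_version(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    problems = check_pins(PINS)
    for name in sorted(problems):
        wanted, found = problems[name]
        print("%s: pinned %s, found %s" % (name, wanted, found or "not installed"))
    if not problems:
        print("All pinned versions match.")
```

The exact-string comparison is deliberately strict; if a patch release such as pandas 0.21.1 turns out to be fine, the comparison could be relaxed to major.minor.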

@rfarouni
Collaborator

@lucapinello, I will look into this as soon as I can.

@lucapinello
Contributor

lucapinello commented Apr 14, 2021 via email

@rfarouni
Collaborator

I found that the easiest way to deal with this error is to run `conda install pandas==0.21` after running `conda install haystack_bio`. The test runs fine after that.

```
	 The expression values of the gene TEST1 are not present. Skipping it. 

WARNING @ Thu, 22 Apr 2021 18:10:50:
	 The expression values of the gene SCIP are not present. Skipping it. 

INFO  @ Thu, 22 Apr 2021 18:10:50:
	 Gene:POU3F1 TF z-score:0.73 Targets z-score:1.58  Correlation:0.48 

WARNING @ Thu, 22 Apr 2021 18:10:50:
	 The expression values of the gene TST-1 are not present. Skipping it. 

WARNING @ Thu, 22 Apr 2021 18:10:50:
	 The expression values of the gene OCT6 are not present. Skipping it. 

WARNING @ Thu, 22 Apr 2021 18:10:50:
	 The expression values of the gene OTF-6 are not present. Skipping it. 

WARNING @ Thu, 22 Apr 2021 18:10:50:
	 The expression values of the gene OTF6 are not present. Skipping it. 

WARNING @ Thu, 22 Apr 2021 18:10:50:
	 The expression values of the gene OCT-6 are not present. Skipping it. 

WARNING @ Thu, 22 Apr 2021 18:10:50:
	 The expression values of the gene TST1 are not present. Skipping it. 

INFO  @ Thu, 22 Apr 2021 18:10:50:
	 All done! Ciao! 

INFO  @ Thu, 22 Apr 2021 18:10:50:
	 Test completed successfully
```

@lucapinello
Contributor

lucapinello commented Apr 22, 2021 via email

@lucapinello
Contributor

lucapinello commented Apr 22, 2021 via email

@rfarouni
Collaborator

This seems to work as well

@lucapinello
Contributor

lucapinello commented Apr 22, 2021 via email

3 participants