Problem during creation of the virtual environment #3

DiegoBrambilla · 2020-05-09T19:29:35Z

Dear MEG team,
Cheers, this is Diego Brambilla, research fellow at MEG (yes, we are a neighbouring group!)
I would like to thank you for the hard work you have put into developing both MEGARes 2.0 and amrplusplus_v2.
I would like to report a problem I ran into when running your pipeline on MARCONI, an HPC environment that uses SLURM as its job scheduler.

If I run main_AmrPlusPlus_v2_withRGI.nf or main_AmrPlusPlus_v2_withRGI_Kraken.nf, the pipeline fails.
Namely, the RGI-based processes fail, but their errors are ignored:

[d9/0b2330] NOTE: Process `RunDedupRGI (agogna)` terminated with an error exit status (1) -- Error is ignored
[ee/50b55e] NOTE: Process `RunRGI (agogna)` terminated with an error exit status (1) -- Error is ignored

On a closer look, the .command.err file produced by each RGI process reports this error message:

Traceback (most recent call last):
  File "/usr/local/envs/AmrPlusPlus_env/bin/rgi", line 4, in <module>
    MainBase()
  File "/usr/local/envs/AmrPlusPlus_env/lib/python3.6/site-packages/app/MainBase.py", line 81, in __init__
    getattr(self, args.command)()
  File "/usr/local/envs/AmrPlusPlus_env/lib/python3.6/site-packages/app/MainBase.py", line 86, in main
    self.main_run(args)
  File "/usr/local/envs/AmrPlusPlus_env/lib/python3.6/site-packages/app/MainBase.py", line 120, in main_run
    rgi_obj.run()
  File "/usr/local/envs/AmrPlusPlus_env/lib/python3.6/site-packages/app/RGI.py", line 184, in run
    self.create_databases()
  File "/usr/local/envs/AmrPlusPlus_env/lib/python3.6/site-packages/app/RGI.py", line 178, in create_databases
    db_obj.build_databases()
  File "/usr/local/envs/AmrPlusPlus_env/lib/python3.6/site-packages/app/Database.py", line 22, in build_databases
    self.write_fasta_from_json()
  File "/usr/local/envs/AmrPlusPlus_env/lib/python3.6/site-packages/app/Database.py", line 68, in write_fasta_from_json
    with open(os.path.join(self.db, "proteindb.fsa"), 'w') as fout:
OSError: [Errno 30] Read-only file system: '/usr/local/envs/AmrPlusPlus_env/lib/python3.6/site-packages/app/_db/proteindb.fsa'

The final line makes me believe that these errors occur while nxf_container_env() is being built.
This is in line with issue #1, namely this comment from @AroArz, which rightly highlights that:

problem comes from creating the virtual environment with RGI

Issue #1 has been closed, even though the evidence I provide here suggests that the RGI-related problem still stands.
Please consider re-opening issue #1 if you see fit.

Interestingly, [Errno 30] Read-only file system is an error that can occur while creating new environments, as you can see in this issue.
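One way to narrow this down is to test, from inside the container, whether the directory named in the traceback can be written to at all. The following is a minimal sketch, assuming it is run with the container's own Python (for example through `singularity exec`); the helper name `is_writable_dir` is hypothetical and not part of RGI or amrplusplus_v2.

```python
import os
import tempfile


def is_writable_dir(path):
    """Return True if a file can actually be created inside `path`."""
    if not os.path.isdir(path):
        return False
    try:
        # Attempt a real write: permission bits alone do not reveal
        # a read-only mount, but creating a file does.
        with tempfile.TemporaryFile(dir=path):
            pass
        return True
    except OSError:
        # Covers Errno 30 (read-only file system) and permission errors.
        return False


if __name__ == "__main__":
    # Directory taken from the traceback; it only exists inside the container.
    db_dir = "/usr/local/envs/AmrPlusPlus_env/lib/python3.6/site-packages/app/_db"
    print(db_dir, "writable:", is_writable_dir(db_dir))
```

If this prints `writable: False` for the `_db` directory, the failure is reproduced without running RGI, which would point at the container file system rather than at the input data.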

In addition, I report that the process FilteredKrakenResults fails because python cannot load numpy:

Command error:
  Traceback (most recent call last):
    File "/marconi_scratch/userexternal/afranzet/amrplusplus_v2/amrplusplus_v2/bin/kraken2_long_to_wide.py", line 5, in <module>
      import numpy as np
  ImportError: No module named 'numpy'

This should be related to the Python version installed in the environment created by Singularity, and not to the Python on my host system.
Problems with loading Python modules have been reported before with Python 2 (like this one), but amrplusplus_v2 uses Python 3.
Again, my guess is that this problem is linked to how the environment is set up: the previous problem also concerns the availability of files under /usr/local/envs/AmrPlusPlus_env/lib/python3.6/site-packages/app/
If you come to a different conclusion, you may prefer to open a separate issue for this matter.
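To check which interpreter actually runs the script, a short diagnostic can print the interpreter path and test whether numpy is importable from it. This is a minimal sketch, assuming it is executed the same way the pipeline runs kraken2_long_to_wide.py; the helper name `module_available` is hypothetical. As a side note, Python 3.6 and later report a missing module as `ModuleNotFoundError`, so the plain `ImportError` with a quoted module name above suggests an older Python 3 interpreter than the 3.6 one in AmrPlusPlus_env was used, though this is only an inference from the message format.

```python
import importlib.util
import sys


def module_available(name):
    """Return True if `name` can be imported by the running interpreter."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # Shows which Python binary is in use and whether it can see numpy.
    print("interpreter:", sys.executable)
    print("version:", sys.version.split()[0])
    print("numpy importable:", module_available("numpy"))
```

Comparing the printed interpreter path with /usr/local/envs/AmrPlusPlus_env/bin would show whether the script is picking up the environment's Python or a different one.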

If you wish to fully reproduce my issue, you can retrieve the input data by running the attached script fastqdump.txt (you need to install sratools beforehand).

I cloned and used commit dc982589c30005ca01dbc847380d6d486523d72f from the master branch of your pipeline. Here are the command lines I used:

for running main_AmrPlusPlus_v2_withRGI.nf :
nextflow run main_AmrPlusPlus_v2_withRGI.nf -profile singularity --reads "data_LMFS/*_R{1,2}.fastq" --output "../output_LMFS" --work "../work_dir_LMFS"
for running main_AmrPlusPlus_v2_withRGI_Kraken.nf :
nextflow run main_AmrPlusPlus_v2_withRGI_Kraken.nf -profile singularity --reads "data_WARFARE/*_R{1,2}.fastq" --output "../output_WARFARE" --work "../work_dir_WARFARE"

Please let me know if I can help you in any way.
Thanks for your time!

An off-topic question, if I may: the final outputs of main_AmrPlusPlus_v2, SamDedup_AMR_analytic_matrix.csv and AMR_analytic_matrix.csv, do not include resistance gene names in their annotation, only the 5 annotation levels plus, where applicable, the label "RequiresSNPConfirmation". Only the resistance mechanism is reported. How can I get the resistance gene names from the amrplusplus_v2 output?


meglab-metagenomics · 2020-05-21T23:12:32Z

megares_to_external_header_mappings_v2.00.zip

Hi Diego,

Nice to meet another MEG! Thanks for trying AMR++ and for your detailed post. I'll try to answer all of your questions.

First, could you please try the latest version, AMR++ v2.0.2, which we just uploaded? We were having some issues with RGI that we patched in this version. I think we also addressed your issue with FilteredKrakenResults, so please let me know if you still have problems with that!

As for your last question, in MEGARes we moved away from the "gene name" and toward a gene accession that is more informative. The gene "Group" that we use is close to the gene name, but we go based on sequence similarity instead of the gene accession's given name in their original repository. You can read more information about gene names in this paper, Resistance Gene Naming and Numbering: Is It a New Gene or Not?. Still, if you want to track the original name of any of the MEGARes accessions, you can download the mapping file from the megares.meglab.org website. I'm also attaching it to this response for you.

Thanks, and let us know if you have any other questions.

Best,
Enrique

DiegoBrambilla · 2020-05-25T08:08:21Z

Hello,
Really appreciate your support, especially in these tough times.
Your explanation about the use of the gene accession number is food for thought, thanks.

I have read the latest changes.
I would suggest renaming update_details.md to CHANGELOG.md; you will find some good reasons for doing so here.

I would like to try running main_AmrPlusPlus_v2_withRGI_Kraken.nf but I am required to use one of the databases provided by CARD.
As you can see from the CARD website, they regularly release a plethora of files, and I am not familiar enough with RGI to decide which one is the right one to use.
I didn't find any documentation stating which CARD database is required for the RunRGI and RunDedupRGI processes.
If you can tell me which file is required for card_db, I could prepare a script like download_minikraken.sh to download it and open a PR (against the dev branch).

I believe @AroArz will be glad to be notified about the latest news.

Keep up with the good work!

meglab-metagenomics · 2020-06-03T17:04:26Z

Hi Diego,

I'll respond here and close this issue; we can continue our discussion in the new issue you just created, #4.

Thanks for the tips! We are still new at managing software on GitHub and appreciate the insight. We made those changes to the CHANGELOG.md file and we'll clarify which CARD database to use.

We used the latest CARD db, which was 3.0.8 at the time, based on the conversation in RGI issue #93.

CARD also just released v3.0.9 and I imagine it would be best to keep up with the latest updates, but you can also download the old versions using this archive:
https://card.mcmaster.ca/download/0/broadstreet-v3.0.8.tar.bz2
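For a download helper along the lines of download_minikraken.sh, fetching and unpacking the archive could look like the sketch below. It is only an illustration under assumptions: the function name `fetch_and_extract` and the target directory are hypothetical, and the sketch does not decide which of the extracted files should be passed as card_db.

```python
import tarfile
import urllib.request
from pathlib import Path


def fetch_and_extract(url, dest_dir):
    """Download a .tar.bz2 archive from `url` and unpack it into `dest_dir`.

    Returns the sorted names of the entries found in `dest_dir` afterwards,
    excluding the downloaded archive itself.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # Name the local archive after the last path component of the URL.
    archive = dest / url.rsplit("/", 1)[-1]
    urllib.request.urlretrieve(url, str(archive))
    with tarfile.open(str(archive), "r:bz2") as tar:
        tar.extractall(str(dest))
    return sorted(p.name for p in dest.iterdir() if p != archive)
```

Calling `fetch_and_extract("https://card.mcmaster.ca/download/0/broadstreet-v3.0.8.tar.bz2", "card_db_files")` would place the archive contents in a local `card_db_files` directory (a placeholder name) and return the extracted file names for inspection.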

We updated our README.md file to show both options. Thanks for your continuing support!

meglab-metagenomics closed this as completed Jun 3, 2020
