Error in mapping step #142

Open
chodarq opened this issue Dec 7, 2020 · 7 comments

@chodarq

chodarq commented Dec 7, 2020

Hi all,
I am trying to test NGLess with the ocean demo samples, using this command:
ngless --threads=40 --index-path 'index' -t 'temp' ocean-demo.ngl . out
but I get the following error:
[Mon 07-12-2020 12:11]: Script OK. Starting interpretation...
[Mon 07-12-2020 12:11] Line 7: lock1: Obtained lock file: 'ngless-locks/14587ba1/SAMEA2621033.sampled.lock'
[Mon 07-12-2020 12:11] Line 7: Writing stats to 'ngless-stats/14587ba1/SAMEA2621033.sampled'
[Mon 07-12-2020 12:11] Line 8: load_mocat_sample found single-end sample 'SAMEA2621033.sampled/ERR594391_1.fastq.gz.short.fq.gz'
[Mon 07-12-2020 12:11] Line 8: load_mocat_sample found single-end sample 'SAMEA2621033.sampled/ERR594391_2.fastq.gz.short.fq.gz'
[Mon 07-12-2020 12:11] Line 14: Start BWA index creation for index/home/cstuardo/.local/share/ngless/data/Modules/om-rgc.ngm/1.0/data/OM-RGC.fna.gz
[Mon 07-12-2020 12:11] Line 14: Success
[Mon 07-12-2020 12:11] Line 14: Starting mapping to index/home/cstuardo/.local/share/ngless/data/Modules/om-rgc.ngm/1.0/data/OM-RGC.fna.gz
Exiting after fatal error:
An unhandled erorr occurred (this should not happen)!

    If you can reproduce this issue, please run your script
    with the --trace flag and report a bug (including the script and the trace) at
            https://github.com/ngless-toolkit/ngless/issues

The error message was: fd:169: hPutBuf: resource vanished (Broken pipe)

The output with the --trace flag is:
[Mon 07-12-2020 14:21:42]: Script OK. Starting interpretation...
[Mon 07-12-2020 14:21:42] Line 16: Running garbage collection.
[Mon 07-12-2020 14:21:42] Line 16: Interpreting [interpretIO]: __check_count(__VOID; original_lno=16; features=["KEGG_ko","eggNOG_OG"]; normalization={scaled})
[Mon 07-12-2020 14:21:42] Line 6: Running garbage collection.
[Mon 07-12-2020 14:21:42] Line 6: Interpreting [interpretIO]: samples = readlines("tara.demo.sampled")
[Mon 07-12-2020 14:21:42] Line 6: Interpreting [assignment]: readlines("tara.demo.sampled")
[Mon 07-12-2020 14:21:42] Line 6: Interpreting [executing module function: 'readlines']: NGOString "tara.demo.sampled"
[Mon 07-12-2020 14:21:42] Line 7: Running garbage collection.
[Mon 07-12-2020 14:21:42] Line 7: Interpreting [interpretIO]: sample = lock1(Lookup 'samples' as NGList NGLString; __hash="14587ba198fb6d34e4a614110eb8ddb4")
[Mon 07-12-2020 14:21:42] Line 7: Interpreting [assignment]: lock1(Lookup 'samples' as NGList NGLString; __hash="14587ba198fb6d34e4a614110eb8ddb4")
[Mon 07-12-2020 14:21:42] Line 7: Interpreting [executing module function: 'lock1']: NGOList [NGOString "SAMEA2621033.sampled",NGOString "SAMEA2621155.sampled",NGOString "SAMEA2621229.sampled"]
[Mon 07-12-2020 14:21:42] Line 7: Looking for a lock in ngless-locks/14587ba1. Total number of elements is 3 (not locked: 3; not finished: 3).
[Mon 07-12-2020 14:21:42] Line 7: Acquired lock file ngless-locks/14587ba1/SAMEA2621033.sampled.lock
[Mon 07-12-2020 14:21:42] Line 7: lock1: Obtained lock file: 'ngless-locks/14587ba1/SAMEA2621033.sampled.lock'
[Mon 07-12-2020 14:21:42] Line 7: Writing stats to 'ngless-stats/14587ba1/SAMEA2621033.sampled'
[Mon 07-12-2020 14:21:42] Line 8: Running garbage collection.
[Mon 07-12-2020 14:21:42] Line 8: Interpreting [interpretIO]: input = load_mocat_sample(Lookup 'sample' as NGLString; __perform_qc=False)
[Mon 07-12-2020 14:21:42] Line 8: Interpreting [assignment]: load_mocat_sample(Lookup 'sample' as NGLString; __perform_qc=False)
[Mon 07-12-2020 14:21:42] Line 8: Interpreting [executing module function: 'load_mocat_sample']: NGOString "SAMEA2621033.sampled"
[Mon 07-12-2020 14:21:42] Line 8: Executing load_mocat_sample transform
[Mon 07-12-2020 14:21:42] Line 8: load_mocat_sample found single-end sample 'SAMEA2621033.sampled/ERR594391_1.fastq.gz.short.fq.gz'
[Mon 07-12-2020 14:21:42] Line 8: load_mocat_sample found single-end sample 'SAMEA2621033.sampled/ERR594391_2.fastq.gz.short.fq.gz'
[Mon 07-12-2020 14:21:42] Line 10: Running garbage collection.
[Mon 07-12-2020 14:21:42] Line 10: Interpreting [interpretIO]: input = preprocess(Lookup 'input' as NGLReadSet; __input_qc=True; keep_singles=False)using {Block {blockVariable = [Variable "read"], blockBody = Sequence [Optimized (SubstrimReassign (Variable "read") 25),Optimized (LenThresholdDiscard (Variable "read") BOpLT 45)]}}
[Mon 07-12-2020 14:21:42] Line 10: Interpreting [assignment]: preprocess(Lookup 'input' as NGLReadSet; __input_qc=True; keep_singles=False)using {Block {blockVariable = [Variable "read"], blockBody = Sequence [Optimized (SubstrimReassign (Variable "read") 25),Optimized (LenThresholdDiscard (Variable "read") BOpLT 45)]}}
[Mon 07-12-2020 14:21:42] Line 10: Created & opened temporary file temp/preprocessed.1..fq32490-0.gz
[Mon 07-12-2020 14:21:42] Line 10: Created & opened temporary file temp/preprocessed.2..fq32490-1.gz
[Mon 07-12-2020 14:21:42] Line 10: Created & opened temporary file temp/preprocessed.singles..fq32490-2.gz
[Mon 07-12-2020 14:21:44] Line 10: Simple Statistics completed for: SAMEA2621033.sampled/ERR594391_1.fastq.gz.short.fq.gz
[Mon 07-12-2020 14:21:44] Line 10: Number of base pairs: 101
[Mon 07-12-2020 14:21:44] Line 10: Encoding is: SangerEncoding
[Mon 07-12-2020 14:21:44] Line 10: Number of sequences: 250000
[Mon 07-12-2020 14:21:49] Line 10: Simple Statistics completed for: SAMEA2621033.sampled/ERR594391_2.fastq.gz.short.fq.gz
[Mon 07-12-2020 14:21:49] Line 10: Number of base pairs: 101
[Mon 07-12-2020 14:21:49] Line 10: Encoding is: SangerEncoding
[Mon 07-12-2020 14:21:49] Line 10: Number of sequences: 250000
[Mon 07-12-2020 14:21:54] Line 10: Preprocess finished
[Mon 07-12-2020 14:21:54] Line 10: Simple Statistics completed for: preproc.lno10.pairs.1
[Mon 07-12-2020 14:21:54] Line 10: Number of base pairs: 0
[Mon 07-12-2020 14:21:54] Line 10: Encoding is: SangerEncoding
[Mon 07-12-2020 14:21:54] Line 10: Number of sequences: 0
[Mon 07-12-2020 14:21:54] Line 10: Simple Statistics completed for: preproc.lno10.pairs.2
[Mon 07-12-2020 14:21:54] Line 10: Number of base pairs: 0
[Mon 07-12-2020 14:21:54] Line 10: Encoding is: SangerEncoding
[Mon 07-12-2020 14:21:54] Line 10: Number of sequences: 0
[Mon 07-12-2020 14:21:54] Line 10: Simple Statistics completed for: preproc.lno10.singles
[Mon 07-12-2020 14:21:54] Line 10: Number of base pairs: 101
[Mon 07-12-2020 14:21:54] Line 10: Encoding is: SangerEncoding
[Mon 07-12-2020 14:21:54] Line 10: Number of sequences: 368045
[Mon 07-12-2020 14:21:54] Line 14: Running garbage collection.
[Mon 07-12-2020 14:21:54] Line 14: Removing temporary file: temp/preprocessed.2..fq32490-1.gz
[Mon 07-12-2020 14:21:54] Line 14: Removing temporary file: temp/preprocessed.1..fq32490-0.gz
[Mon 07-12-2020 14:21:54] Line 14: Interpreting [interpretIO]: mapped = map(Lookup 'input' as NGLReadSet; reference="om-rgc"; mode_all=True)
[Mon 07-12-2020 14:21:54] Line 14: Interpreting [assignment]: map(Lookup 'input' as NGLReadSet; reference="om-rgc"; mode_all=True)
[Mon 07-12-2020 14:21:54] Line 14: Acquired lock file index/home/cstuardo/.local/share/ngless/data/Modules/om-rgc.ngm/1.0/data/OM-RGC.fna.gz.ngless-index.lock
[Mon 07-12-2020 14:21:54] Line 14: Start BWA index creation for index/home/cstuardo/.local/share/ngless/data/Modules/om-rgc.ngm/1.0/data/OM-RGC.fna.gz
[Mon 07-12-2020 14:21:54] Line 14: Will run process /home/cstuardo/.local/share/ngless/bin/ngless-1.1.0-bwaindex -b 42747494 -p index/home/cstuardo/.local/share/ngless/data/Modules/om-rgc.ngm/1.0/data/OM-RGC.fna-bwa-0.7.17.gz index/home/cstuardo/.local/share/ngless/data/Modules/om-rgc.ngm/1.0/data/OM-RGC.fna.gz
[Mon 07-12-2020 14:21:55] Line 14: Stderr: index: invalid option -- 'b'
[Mon 07-12-2020 14:21:55] Line 14: Stdout:
[Mon 07-12-2020 14:21:55] Line 14: Success
[Mon 07-12-2020 14:21:55] Line 14: Created & opened temporary file temp/mapped_OM-RGC.sam32490-3.zstd
[Mon 07-12-2020 14:21:55] Line 14: Starting mapping to index/home/cstuardo/.local/share/ngless/data/Modules/om-rgc.ngm/1.0/data/OM-RGC.fna.gz
[Mon 07-12-2020 14:21:55] Line 14: Will run process /home/cstuardo/.local/share/ngless/bin/ngless-1.1.0-bwamem -t 40 -K 100000000 -a index/home/cstuardo/.local/share/ngless/data/Modules/om-rgc.ngm/1.0/data/OM-RGC.fna-bwa-0.7.17.gz -p -
Exiting after fatal error:
An unhandled erorr occurred (this should not happen)!

    If you can reproduce this issue, please run your script
    with the --trace flag and report a bug (including the script and the trace) at
            https://github.com/ngless-toolkit/ngless/issues

The error message was: fd:169: hPutBuf: resource vanished (Broken pipe)

Any help would be great.

@luispedro
Member

The error message is pretty awful, but I have seen this happen when BWA runs out of RAM and crashes in a bad way. The ocean demo requires a lot of memory.

I am leaving this issue open because of the bad error message, but I think the lack of RAM is the underlying problem. Please correct me if I'm wrong.
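
For readers unfamiliar with the message: "hPutBuf: resource vanished (Broken pipe)" is how a Haskell program reports a failed write to a pipe whose reader has gone away. In the trace, the mapper is started with `-` as its last argument, so it reads from standard input. The following is a minimal Python sketch of that failure mode, not NGLess code: the child process is a hypothetical stand-in for a mapper that exits before consuming its input, and `write_to_dead_child` is a name made up for illustration.

```python
import subprocess
import sys


def write_to_dead_child(payload: bytes) -> str:
    """Start a child that exits at once, then try to write to its stdin."""
    # Hypothetical stand-in for a mapper that crashes before reading input.
    child = subprocess.Popen(
        [sys.executable, "-c", "import sys; sys.exit(1)"],
        stdin=subprocess.PIPE,
        bufsize=0,  # unbuffered, so the write reaches the pipe immediately
    )
    child.wait()  # after this, the read end of the pipe is closed
    try:
        child.stdin.write(payload)
    except BrokenPipeError:
        # The writer only learns that the reader vanished, not why.
        return f"broken pipe (child exit code {child.returncode})"
    finally:
        child.stdin.close()
    return "write succeeded"


if __name__ == "__main__":
    print(write_to_dead_child(b"@read1\nACGT\n+\nIIII\n"))
```

The point of the sketch is that the writing side sees only the broken pipe. The real cause (out of memory, an unusable index, or something else) has to be found in the child's exit status or its standard error.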

@chodarq
Author

chodarq commented Dec 8, 2020

Thanks for your answer, Luis.
When you say "a lot of memory", do you have an idea of how much? I'm running the demo on a machine with 1 TB of RAM.
Also, a couple of hours ago I started the demo on a machine with 256 GB of RAM, and so far it is still running. I have the impression that the problem is related to the conda environment (the first machine uses one, but the second does not).
I will keep you informed.
All the best,

@luispedro
Member

Oh, that should have been more than enough.

@unode
Member

unode commented Dec 8, 2020

[Mon 07-12-2020 14:21:54] Line 14: Will run process /home/cstuardo/.local/share/ngless/bin/ngless-1.1.0-bwaindex -b 42747494 -p index/home/cstuardo/.local/share/ngless/data/Modules/om-rgc.ngm/1.0/data/OM-RGC.fna-bwa-0.7.17.gz index/home/cstuardo/.local/share/ngless/data/Modules/om-rgc.ngm/1.0/data/OM-RGC.fna.gz
[Mon 07-12-2020 14:21:55] Line 14: Stderr: index: invalid option -- 'b'
[Mon 07-12-2020 14:21:55] Line 14: Stdout:
[Mon 07-12-2020 14:21:55] Line 14: Success

These lines are a bit puzzling. There is an "invalid option" message on stderr, yet the index step is still reported as a success, and it finishes within about a second.
I wonder whether the index is somehow corrupted or incomplete and is causing the issue downstream.
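
One way to test this suspicion is to look at the files next to the index prefix. `bwa index` normally writes five files with the suffixes `.amb`, `.ann`, `.bwt`, `.pac` and `.sa`. The sketch below assumes that the NGLess-bundled bwa follows the same convention and that the prefix is the value passed to `-p` in the trace. The helper name `missing_bwa_index_files` is made up for illustration.

```python
from pathlib import Path
from typing import List

# Suffixes of the files that `bwa index` writes next to the index prefix.
BWA_INDEX_SUFFIXES = (".amb", ".ann", ".bwt", ".pac", ".sa")


def missing_bwa_index_files(prefix: str) -> List[str]:
    """Return the index files that are absent or empty for this prefix."""
    problems = []
    for suffix in BWA_INDEX_SUFFIXES:
        path = Path(prefix + suffix)
        # A zero-byte file is treated as missing: it cannot be a valid index.
        if not path.is_file() or path.stat().st_size == 0:
            problems.append(str(path))
    return problems
```

Calling it with the prefix from the trace (`index/home/cstuardo/.local/share/ngless/data/Modules/om-rgc.ngm/1.0/data/OM-RGC.fna-bwa-0.7.17.gz`) returns an empty list only if all five files exist and are non-empty. This checks presence and size, not the contents, so an empty result does not prove the index is valid.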

@chodarq
Author

chodarq commented Dec 8, 2020

Hi Renato,
I had already noticed that. But on the second machine, the indexing step is still running (11 hours now), and the same kind of command has not produced an error about the -b option:
[Mon 07-12-2020 22:04:53] Line 14: Will run process /home/chodar/.local/share/ngless/bin/ngless-1.1.1-bwaindex -b 883332754 -p index/home/chodar/.local/share/ngless/data/Modules/om-rgc.ngm/1.0/data/OM-RGC.fna-bwa-0.7.17.gz index/home/chodar/.local/share/ngless/data/Modules/om-rgc.ngm/1.0/data/OM-RGC.fna.gz

I will check the folder to see whether the index files exist. I also noticed a difference between the two setups: the conda environment on the first machine uses ngless-1.1.0-bwaindex, while the second machine (without a conda environment) uses ngless-1.1.1-bwaindex.

@unode
Member

unode commented Dec 8, 2020

Indexing a resource as large as OM-RGC is unfortunately quite time-consuming and, due to limitations of bwa, cannot be parallelized. I wouldn't be surprised if it is still running after 20 hours.
If you switch to an NGLess version that uses a different bwa version, the resource will be re-indexed. The good part is that this only has to be done once per version.

@chodarq
Author

chodarq commented Dec 9, 2020

Well, so far there is no problem on the second machine.
I will keep you updated.
