SNP_calling rule crashes for samples with more than 131072 contigs per sample due to a limitation in LoFreq #102

Open
DennisSchmitz opened this issue Oct 25, 2019 · 1 comment
Labels: bug (Something isn't working), wontfix (This will not be worked on)

Comments

@DennisSchmitz
Owner

This issue was emailed to me by @RozemarijnVanDerPlaats.

One specific sample kept crashing in the SNP_calling step; see the DRMAA log below:

Error in rule SNP_calling:
    jobid: 0
    output: data/scaffolds_filtered/4_S4_scaffolds_ge500nt.fasta.fai, data/scaffolds_filtered/4_S4_unfiltered.vcf, data/scaffolds_filtered/4_S4_filtered.vcf, data/scaffolds_filtered/4_S4_filtered.vcf.gz, data/scaffolds_filtered/4_S4_filtered.vcf.gz.tbi
    log: logs/SNP_calling_4_S4.log
    conda-env: /mnt/scratch_dir/plaatvdr/Jovian/.snakemake/conda/e0281965

RuleException:
CalledProcessError in line 366 of /mnt/scratch_dir/plaatvdr/Jovian/Snakefile:
Command 'source /mnt/miniconda/bin/activate '/mnt/scratch_dir/plaatvdr/Jovian/.snakemake/conda/e0281965'; set -euo pipefail;  samtools faidx -o data/scaffolds_filtered/4_S4_scaffolds_ge500nt.fasta.fai data/scaffolds_filtered/4_S4_scaffolds_ge500nt.fasta > logs/SNP_calling_4_S4.log 2>&1
lofreq call-parallel -d 20000 --no-default-filter --pp-threads 12 -f data/scaffolds_filtered/4_S4_scaffolds_ge500nt.fasta -o data/scaffolds_filtered/4_S4_unfiltered.vcf data/scaffolds_filtered/4_S4_sorted.bam >> logs/SNP_calling_4_S4.log 2>&1
lofreq filter -a 0.05 -i data/scaffolds_filtered/4_S4_unfiltered.vcf -o data/scaffolds_filtered/4_S4_filtered.vcf >> logs/SNP_calling_4_S4.log 2>&1
bgzip -c data/scaffolds_filtered/4_S4_filtered.vcf 2>> logs/SNP_calling_4_S4.log 1> data/scaffolds_filtered/4_S4_filtered.vcf.gz
tabix -p vcf data/scaffolds_filtered/4_S4_filtered.vcf.gz >> logs/SNP_calling_4_S4.log 2>&1' returned non-zero exit status 1.
  File "/mnt/scratch_dir/plaatvdr/Jovian/Snakefile", line 366, in __rule_SNP_calling
  File "/home/plaatvdr/envs/Jovian_master/lib/python3.6/concurrent/futures/thread.py", line 56, in run
Removing output files of failed job SNP_calling since they might be corrupted:
data/scaffolds_filtered/4_S4_scaffolds_ge500nt.fasta.fai
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message

See the log file below:

INFO [2019-10-25 14:46:08,446]: Using 12 threads with following basic args: lofreq call -d 20000 --no-default-filter -f data/scaffolds_filtered/4_S4_scaffolds_ge500nt.fasta data/scaffolds_filtered/4_S4_sorted.bam

INFO [2019-10-25 14:46:10,903]: Adding 157086 commands to mp-pool
Traceback (most recent call last):
  File "/mnt/scratch_dir/plaatvdr/Jovian/.snakemake/conda/e0281965/bin/lofreq2_call_pparallel.py", line 746, in <module>
    main()
  File "/mnt/scratch_dir/plaatvdr/Jovian/.snakemake/conda/e0281965/bin/lofreq2_call_pparallel.py", line 669, in main
    "##source=%s" % ' '.join(sys.argv))
  File "/mnt/scratch_dir/plaatvdr/Jovian/.snakemake/conda/e0281965/bin/lofreq2_call_pparallel.py", line 174, in concat_vcf_files
    subprocess.check_call(cmd)
  File "/mnt/scratch_dir/plaatvdr/Jovian/.snakemake/conda/e0281965/lib/python3.6/subprocess.py", line 286, in check_call
    retcode = call(*popenargs, **kwargs)
  File "/mnt/scratch_dir/plaatvdr/Jovian/.snakemake/conda/e0281965/lib/python3.6/subprocess.py", line 267, in call
    with Popen(*popenargs, **kwargs) as p:
  File "/mnt/scratch_dir/plaatvdr/Jovian/.snakemake/conda/e0281965/lib/python3.6/subprocess.py", line 709, in __init__
    restore_signals, start_new_session)
  File "/mnt/scratch_dir/plaatvdr/Jovian/.snakemake/conda/e0281965/lib/python3.6/subprocess.py", line 1344, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
OSError: [Errno 7] Argument list too long: 'lofreq'

Searching for this error on LoFreq's issues page turned up CSB5/lofreq#79. Apparently, LoFreq has a hardcoded limit and accepts at most 131072 contigs per sample. When I checked the number of trimmed scaffolds in this sample, it was 157086 contigs, which matches the 157086 commands added to the mp-pool in the log above. So that is the cause of the problem.

The solution would be to write a checker that splits up files with more than 131072 contigs and merges the results back together afterwards. But it seems like such a corner case that I'm giving it a low priority.
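A rough sketch of what the splitting half of such a checker could look like, assuming the contig names are read from the `.fai` index that `samtools faidx` already writes in this rule (tab-separated, name in the first column, length in the second). The function names `read_fai` and `batch_contigs` are hypothetical, and the limit is passed in as `max_contigs` rather than hardcoded. This is not code from Jovian or LoFreq.

```python
from typing import Iterable, Iterator, List, Tuple


def read_fai(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Parse faidx index lines into (contig_name, length) pairs."""
    contigs = []
    for line in lines:
        if not line.strip():
            continue  # skip empty lines
        fields = line.rstrip("\n").split("\t")
        contigs.append((fields[0], int(fields[1])))
    return contigs


def batch_contigs(names: List[str], max_contigs: int) -> Iterator[List[str]]:
    """Yield consecutive batches of at most max_contigs contig names."""
    if max_contigs < 1:
        raise ValueError("max_contigs must be at least 1")
    for start in range(0, len(names), max_contigs):
        yield names[start:start + max_contigs]


# Tiny illustrative index with three contigs and a (hypothetical) limit of 2.
example_index = [
    "contig_1\t600\t10\t60\t61\n",
    "contig_2\t700\t630\t60\t61\n",
    "contig_3\t800\t1350\t60\t61\n",
]
names = [name for name, _length in read_fai(example_index)]
for batch in batch_contigs(names, max_contigs=2):
    print(len(batch), batch)
```

Each batch would then have to be variant-called separately and the per-batch VCFs merged back together; that part is not shown here and would need testing against real data.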

A workaround would be to remove such samples from your analysis; at least then the rest of the Jovian analysis will finish. Another workaround would be to tweak the filtering parameters so that the number of contigs drops below the LoFreq limit, e.g. by increasing the minlen parameter (and thus filtering away more scaffolds).
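To get a feel for how far minlen would need to be raised, something like the sketch below could be run on the scaffold lengths of a failing sample. It assumes a scaffold is kept when its length is greater than or equal to minlen (which the `ge500nt` in the file names suggests). The helper names and the candidate thresholds are made up for illustration, and the limit is again a parameter.

```python
from typing import List, Optional


def contigs_kept(lengths: List[int], minlen: int) -> int:
    """Count scaffolds that survive a length filter of >= minlen."""
    return sum(1 for length in lengths if length >= minlen)


def smallest_passing_minlen(
    lengths: List[int], candidates: List[int], max_contigs: int
) -> Optional[int]:
    """Return the smallest candidate minlen that keeps at most max_contigs
    scaffolds, or None if no candidate is strict enough."""
    for minlen in sorted(candidates):
        if contigs_kept(lengths, minlen) <= max_contigs:
            return minlen
    return None


# Illustrative lengths only; real values would come from the scaffold file.
example_lengths = [500, 600, 700, 1000, 1500]
choice = smallest_passing_minlen(
    example_lengths, candidates=[500, 750, 1000], max_contigs=2
)
print(choice)
```

Raising minlen this way discards short scaffolds, so for environmental samples it trades completeness for a run that finishes.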

Please, if you also encounter this error, mention it in this thread so I can reevaluate the priority.

@DennisSchmitz DennisSchmitz added bug Something isn't working wontfix This will not be worked on labels Oct 25, 2019
@DennisSchmitz DennisSchmitz self-assigned this Oct 25, 2019
@DennisSchmitz
Owner Author

Other samples in @RozemarijnVanDerPlaats's run have the same problem. It seems to happen in environmental samples (e.g. surface water), where it makes sense that a great many organisms are so diluted that they do not generate enough reads to assemble into bigger scaffolds.

This has never been a problem in the hundreds of clinical samples processed thus far, nor do I expect it to be in the future. Still, it's sloppy and hinders broader usage.

I've asked for the data so I can test a solution when I've got the time for it.
