Fragment_length_analysis step crashes on low-quality/low-abundance samples #117

Open
DennisSchmitz opened this issue Feb 6, 2020 · 0 comments
See Nextseq run 24, 2019, sample 18 for a test-case.

Bug submitted by @jeroencremer. The fragment length analysis step crashes on low-quality/low-abundance samples in default mode. The cause: because of the stringency settings, filtering leaves the scaffold file empty, and the subsequent step (the fragment length analysis) then crashes on that empty file. This is a situation I didn't account for in the code. The good news is that if a sample crashes for this reason it is probably bad/useless anyway, so it can safely be removed from the analysis, after which the analysis can be restarted.

I do have to update the code to catch and handle this situation, either by touching empty output files or by removing the sample in its entirety from the analysis.
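As a rough illustration of the "touch empty files" option, the sketch below shows a hypothetical guard that could sit at the top of the rule's shell block (this is not the pipeline's actual code; the `mkdir` is only there to keep the sketch self-contained): if the filtered scaffold FASTA is missing or zero bytes, create the rule's expected outputs and skip the alignment entirely, so Snakemake considers the rule complete.

```shell
#!/bin/sh
# Hypothetical guard (not the pipeline's actual code): skip the alignment and
# touch the rule's expected outputs when the filtered scaffold FASTA is empty.
SAMPLE="RUN24-18_S175"
DIR="data/scaffolds_filtered"
mkdir -p "$DIR"        # only needed to keep this sketch self-contained
SCAFFOLDS="${DIR}/${SAMPLE}_scaffolds_ge500nt.fasta"

if [ ! -s "$SCAFFOLDS" ]; then
    # [ -s ] is false for missing or zero-byte files, i.e. exactly the state
    # that makes `bwa mem` crash on this sample.
    touch "${DIR}/${SAMPLE}_sorted.bam" \
          "${DIR}/${SAMPLE}_sorted.bam.bai" \
          "${DIR}/${SAMPLE}_insert_size_metrics.txt" \
          "${DIR}/${SAMPLE}_insert_size_histogram.pdf"
    echo "WARNING: empty scaffold file for ${SAMPLE}, touched empty outputs" >&2
else
    :   # the normal bwa index / bwa mem | samtools | picard commands run here
fi
```

The trade-off versus removing the sample outright is that downstream steps then have to tolerate empty input files as well.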

cat logs/Fragment_length_analysis_RUN24-18_S175.log 
[bwa_index] Pack FASTA... 0.00 sec
[bwa_index] Construct BWT for the packed sequence...
[bwa_index] 0.00 seconds elapse.
[bwa_index] Update BWT... 0.00 sec
[bwa_index] Pack forward-only FASTA... 0.00 sec
[bwa_index] Construct SA from BWT and Occ... 0.00 sec
[main] Version: 0.7.17-r1188
[main] CMD: bwa index data/scaffolds_filtered/RUN24-18_S175_scaffolds_ge500nt.fasta
[main] Real time: 0.074 sec; CPU: 0.002 sec
[M::bwa_idx_load_from_disk] read 0 ALT contigs
[M::process] read 35584 sequences (3717313 bp)...
cat logs/drmaa/179773.out 
Sender: LSF System <XXX>
Subject: Job 179773: <Jovian_Fragment_length_analysis.jobid402> in cluster <XXX> Exited

Job <Jovian_Fragment_length_analysis.jobid402> was submitted from host <XXX> by user <XXX> in cluster <XXX> at Thu Jan 30 11:53:42 2020.
Job was executed on host(s) <XXX>, in queue <XXX>, as user <XXX> in cluster <XXX> at Thu Jan 30 11:53:43 2020.
<XXX> was used as the home directory.
<XXX> was used as the working directory.
Started at Thu Jan 30 11:53:43 2020.
Terminated at Thu Jan 30 11:54:17 2020.
Results reported at Thu Jan 30 11:54:17 2020.

Your job looked like:

------------------------------------------------------------
# LSBATCH: User input
XXX/Nextseq_RUN24/.snakemake/tmp.i83d3146/Jovian_Fragment_length_analysis.jobid402
------------------------------------------------------------

Exited with exit code 1.

Resource usage summary:

    CPU time :                                   5.26 sec.
    Max Memory :                                 34 MB
    Average Memory :                             24.20 MB
    Total Requested Memory :                     -
    Delta Memory :                               -
    Max Swap :                                   -
    Max Processes :                              7
    Max Threads :                                10
    Run time :                                   38 sec.
    Turnaround time :                            35 sec.

The output (if any) follows:

Building DAG of jobs...
Using shell: /bin/bash
Provided cores: 4
Rules claiming more threads will be scaled down.
Job counts:
	count	jobs
	1	Fragment_length_analysis
	1

[Thu Jan 30 11:54:04 2020]
rule Fragment_length_analysis:
    input: data/scaffolds_filtered/RUN24-18_S175_scaffolds_ge500nt.fasta, data/cleaned_fastq/RUN24-18_S175_pR1.fq, data/cleaned_fastq/RUN24-18_S175_pR2.fq
    output: data/scaffolds_filtered/RUN24-18_S175_sorted.bam, data/scaffolds_filtered/RUN24-18_S175_sorted.bam.bai, data/scaffolds_filtered/RUN24-18_S175_insert_size_metrics.txt, data/scaffolds_filtered/RUN24-18_S175_insert_size_histogram.pdf
    log: logs/Fragment_length_analysis_RUN24-18_S175.log
    jobid: 0
    benchmark: logs/benchmark/Fragment_length_analysis_RUN24-18_S175.txt
    wildcards: sample=RUN24-18_S175
    threads: 4

Activating conda environment: XXX/Nextseq_RUN24/.snakemake/conda/d953bd61
/bin/bash: line 1: 11697 Segmentation fault      (core dumped) bwa mem -t 4 data/scaffolds_filtered/RUN24-18_S175_scaffolds_ge500nt.fasta data/cleaned_fastq/RUN24-18_S175_pR1.fq data/cleaned_fastq/RUN24-18_S175_pR2.fq 2>> logs/Fragment_length_analysis_RUN24-18_S175.log
     11698 Done                    | samtools view -@ 4 -uS - 2>> logs/Fragment_length_analysis_RUN24-18_S175.log
     11699 Done                    | samtools sort -@ 4 - -o data/scaffolds_filtered/RUN24-18_S175_sorted.bam >> logs/Fragment_length_analysis_RUN24-18_S175.log 2>&1
[Thu Jan 30 11:54:17 2020]
Error in rule Fragment_length_analysis:
    jobid: 0
    output: data/scaffolds_filtered/RUN24-18_S175_sorted.bam, data/scaffolds_filtered/RUN24-18_S175_sorted.bam.bai, data/scaffolds_filtered/RUN24-18_S175_insert_size_metrics.txt, data/scaffolds_filtered/RUN24-18_S175_insert_size_histogram.pdf
    log: logs/Fragment_length_analysis_RUN24-18_S175.log
    conda-env: XXX/Nextseq_RUN24/.snakemake/conda/d953bd61

RuleException:
CalledProcessError in line 336 of XXX/Nextseq_RUN24/Snakefile:
Command 'source /mnt/miniconda/bin/activate 'XXX/Nextseq_RUN24/.snakemake/conda/d953bd61'; set -euo pipefail;  bwa index data/scaffolds_filtered/RUN24-18_S175_scaffolds_ge500nt.fasta > logs/Fragment_length_analysis_RUN24-18_S175.log 2>&1
bwa mem -t 4 data/scaffolds_filtered/RUN24-18_S175_scaffolds_ge500nt.fasta data/cleaned_fastq/RUN24-18_S175_pR1.fq data/cleaned_fastq/RUN24-18_S175_pR2.fq 2>> logs/Fragment_length_analysis_RUN24-18_S175.log |samtools view -@ 4 -uS - 2>> logs/Fragment_length_analysis_RUN24-18_S175.log |samtools sort -@ 4 - -o data/scaffolds_filtered/RUN24-18_S175_sorted.bam >> logs/Fragment_length_analysis_RUN24-18_S175.log 2>&1
samtools index -@ 4 data/scaffolds_filtered/RUN24-18_S175_sorted.bam >> logs/Fragment_length_analysis_RUN24-18_S175.log 2>&1
picard -Dpicard.useLegacyParser=false CollectInsertSizeMetrics -I data/scaffolds_filtered/RUN24-18_S175_sorted.bam -O data/scaffolds_filtered/RUN24-18_S175_insert_size_metrics.txt -H data/scaffolds_filtered/RUN24-18_S175_insert_size_histogram.pdf >> logs/Fragment_length_analysis_RUN24-18_S175.log 2>&1' returned non-zero exit status 139.
  File "/XXX/Nextseq_RUN24/Snakefile", line 336, in __rule_Fragment_length_analysis
  File "XXX/.conda/envs/Jovian_master/lib/python3.6/concurrent/futures/thread.py", line 56, in run
Removing output files of failed job Fragment_length_analysis since they might be corrupted:
data/scaffolds_filtered/RUN24-18_S175_sorted.bam
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message


PS:

Unable to read stderr data from stderr buffer file; your job was probably aborted prematurely.

Only ~10 MB of reads remain after filtering:

ls -lah ./data/cleaned_fastq/RUN24-18_S175_*
4.4M Jan 28 16:37 ./data/cleaned_fastq/RUN24-18_S175_pR1.fq
4.4M Jan 28 16:37 ./data/cleaned_fastq/RUN24-18_S175_pR2.fq
2.2M Jan 28 16:36 ./data/cleaned_fastq/RUN24-18_S175_unpaired.fq

The filtered scaffold file is empty:

ls -lah ./data/scaffolds_filtered/RUN24-18_S175_scaffolds_ge500nt.fasta
0 Jan 28 17:32 ./data/scaffolds_filtered/RUN24-18_S175_scaffolds_ge500nt.fasta
wc -l ./data/scaffolds_filtered/RUN24-18_S175_scaffolds_ge500nt.fasta
0 ./data/scaffolds_filtered/RUN24-18_S175_scaffolds_ge500nt.fasta
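Until the code handles this case, affected samples can be spotted up front. A hypothetical post-mortem check (paths follow the layout shown above; the set-up lines only make the sketch runnable stand-alone) lists every sample whose filtered scaffold file ended up empty, so those samples can be dropped before restarting:

```shell
# Set-up so this sketch runs stand-alone: one empty and one non-empty scaffold file.
mkdir -p data/scaffolds_filtered
: > data/scaffolds_filtered/BAD-SAMPLE_scaffolds_ge500nt.fasta          # 0 bytes
printf '>c1\nACGT\n' > data/scaffolds_filtered/GOOD-SAMPLE_scaffolds_ge500nt.fasta

# -size 0 matches zero-byte files only, i.e. the samples that would crash.
find data/scaffolds_filtered -name '*_scaffolds_ge500nt.fasta' -size 0
```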
@DennisSchmitz self-assigned this Feb 6, 2020
@DennisSchmitz added the bug label Feb 6, 2020