Alignment pipeline error #39

Closed
camilogarciabotero opened this issue Oct 12, 2020 · 8 comments

Comments

@camilogarciabotero

Hi @eldariont

Thanks for working on this tool. I was trying to call variants with svim reads as follows:

svim reads --min_mapq 30 <working-dir> <fastq.gz> <genome.fasta>

It just suddenly stopped, so I decided to share the entire .log file here:

2020-10-11 20:25:47,438 [INFO   ]  ****************** Start SVIM, version 1.4.2 ******************
2020-10-11 20:25:47,438 [INFO   ]  CMD: python3 /home/cgarci39/.conda/envs/svim/bin/svim reads --min_mapq 30 /home/cgarci39/Projects/Bacillus_subtilis/Experiments/Svim_jobs/01_15-M1_nmlr-svim /home/cgarci39/Projects/Bacillus_subtilis/Data/Raw_fastq/M1_Fastqs/M1.fastq.gz /home/cgarci39/Projects/Bacillus_subtilis/Data/Local_genomes/EA_CB0015_hybrid.fasta
2020-10-11 20:25:47,438 [INFO   ]  WORKING DIR: /home/cgarci39/Projects/Bacillus_subtilis/Experiments/Svim_jobs/01_15-M1_nmlr-svim
2020-10-11 20:25:47,438 [INFO   ]  PARAMETER: sub, VALUE: reads
2020-10-11 20:25:47,438 [INFO   ]  PARAMETER: working_dir, VALUE: /home/cgarci39/Projects/Bacillus_subtilis/Experiments/Svim_jobs/01_15-M1_nmlr-svim
2020-10-11 20:25:47,439 [INFO   ]  PARAMETER: reads, VALUE: /home/cgarci39/Projects/Bacillus_subtilis/Data/Raw_fastq/M1_Fastqs/M1.fastq.gz
2020-10-11 20:25:47,439 [INFO   ]  PARAMETER: genome, VALUE: /home/cgarci39/Projects/Bacillus_subtilis/Data/Local_genomes/EA_CB0015_hybrid.fasta
2020-10-11 20:25:47,439 [INFO   ]  PARAMETER: verbose, VALUE: False
2020-10-11 20:25:47,439 [INFO   ]  PARAMETER: cores, VALUE: 1
2020-10-11 20:25:47,439 [INFO   ]  PARAMETER: aligner, VALUE: ngmlr
2020-10-11 20:25:47,439 [INFO   ]  PARAMETER: nanopore, VALUE: False
2020-10-11 20:25:47,439 [INFO   ]  PARAMETER: min_mapq, VALUE: 30
2020-10-11 20:25:47,439 [INFO   ]  PARAMETER: min_sv_size, VALUE: 40
2020-10-11 20:25:47,439 [INFO   ]  PARAMETER: max_sv_size, VALUE: 100000
2020-10-11 20:25:47,439 [INFO   ]  PARAMETER: segment_gap_tolerance, VALUE: 10
2020-10-11 20:25:47,439 [INFO   ]  PARAMETER: segment_overlap_tolerance, VALUE: 5
2020-10-11 20:25:47,439 [INFO   ]  PARAMETER: all_bnds, VALUE: False
2020-10-11 20:25:47,439 [INFO   ]  PARAMETER: partition_max_distance, VALUE: 1000
2020-10-11 20:25:47,439 [INFO   ]  PARAMETER: distance_normalizer, VALUE: 900
2020-10-11 20:25:47,439 [INFO   ]  PARAMETER: cluster_max_distance, VALUE: 0.3
2020-10-11 20:25:47,439 [INFO   ]  PARAMETER: del_ins_dup_max_distance, VALUE: 1.0
2020-10-11 20:25:47,439 [INFO   ]  PARAMETER: trans_sv_max_distance, VALUE: 500
2020-10-11 20:25:47,439 [INFO   ]  PARAMETER: skip_genotyping, VALUE: False
2020-10-11 20:25:47,439 [INFO   ]  PARAMETER: minimum_score, VALUE: 3
2020-10-11 20:25:47,439 [INFO   ]  PARAMETER: homozygous_threshold, VALUE: 0.8
2020-10-11 20:25:47,439 [INFO   ]  PARAMETER: heterozygous_threshold, VALUE: 0.2
2020-10-11 20:25:47,439 [INFO   ]  PARAMETER: minimum_depth, VALUE: 4
2020-10-11 20:25:47,439 [INFO   ]  PARAMETER: sample, VALUE: Sample
2020-10-11 20:25:47,439 [INFO   ]  PARAMETER: types, VALUE: DEL,INS,INV,DUP:TANDEM,DUP:INT,BND
2020-10-11 20:25:47,439 [INFO   ]  PARAMETER: sequence_alleles, VALUE: False
2020-10-11 20:25:47,439 [INFO   ]  PARAMETER: insertion_sequences, VALUE: False
2020-10-11 20:25:47,439 [INFO   ]  PARAMETER: tandem_duplications_as_insertions, VALUE: False
2020-10-11 20:25:47,439 [INFO   ]  PARAMETER: interspersed_duplications_as_insertions, VALUE: False
2020-10-11 20:25:47,439 [INFO   ]  PARAMETER: read_names, VALUE: False
2020-10-11 20:25:47,439 [INFO   ]  PARAMETER: zmws, VALUE: False
2020-10-11 20:25:47,439 [INFO   ]  ****************** STEP 1: COLLECT ******************
2020-10-11 20:25:47,439 [INFO   ]  MODE: reads
2020-10-11 20:25:47,439 [INFO   ]  INPUT: /home/cgarci39/Projects/Bacillus_subtilis/Data/Raw_fastq/M1_Fastqs/M1.fastq.gz
2020-10-11 20:25:47,439 [INFO   ]  GENOME: /home/cgarci39/Projects/Bacillus_subtilis/Data/Local_genomes/EA_CB0015_hybrid.fasta
2020-10-11 20:25:47,439 [INFO   ]  Recognized reads file as gzipped FASTQ format.
2020-10-11 20:25:47,671 [INFO   ]  Starting alignment pipeline..
2020-10-11 20:27:58,503 [ERROR  ]  The alignment pipeline failed with exit code 1. Command was: set -o pipefail && gunzip -c /home/cgarci39/Projects/Bacillus_subtilis/Data/Raw_fastq/M1_Fastqs/M1.fastq.gz | ngmlr -t 1 -r /home/cgarci39/Projects/Bacillus_subtilis/Data/Local_genomes/EA_CB0015_hybrid.fasta | samtools view -b -@ 1 | samtools sort -@ 1 -o /home/cgarci39/Projects/Bacillus_subtilis/Experiments/Svim_jobs/01_15-M1_nmlr-svim/M1.fastq.ngmlr.coordsorted.bam
Traceback (most recent call last):
  File "/home/cgarci39/.conda/envs/svim/lib/python3.7/site-packages/svim/SVIM_alignment.py", line 52, in run_alignment
    run(" ".join(command_align), shell=True, check=True, executable='/bin/bash')
  File "/home/cgarci39/.conda/envs/svim/lib/python3.7/subprocess.py", line 512, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command 'set -o pipefail && gunzip -c /home/cgarci39/Projects/Bacillus_subtilis/Data/Raw_fastq/M1_Fastqs/M1.fastq.gz | ngmlr -t 1 -r /home/cgarci39/Projects/Bacillus_subtilis/Data/Local_genomes/EA_CB0015_hybrid.fasta | samtools view -b -@ 1 | samtools sort -@ 1 -o /home/cgarci39/Projects/Bacillus_subtilis/Experiments/Svim_jobs/01_15-M1_nmlr-svim/M1.fastq.ngmlr.coordsorted.bam' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/cgarci39/.conda/envs/svim/bin/svim", line 216, in <module>
    sys.exit(main())
  File "/home/cgarci39/.conda/envs/svim/bin/svim", line 85, in main
    bam_path = run_alignment(options.working_dir, options.genome, options.reads, reads_type, options.cores, options.aligner, options.nanopore)
  File "/home/cgarci39/.conda/envs/svim/lib/python3.7/site-packages/svim/SVIM_alignment.py", line 55, in run_alignment
    raise AlignmentPipelineError('The alignment pipeline failed with exit code {0}. Command was: {1}'.format(e.returncode, e.cmd)) from e
svim.SVIM_alignment.AlignmentPipelineError: The alignment pipeline failed with exit code 1. Command was: set -o pipefail && gunzip -c /home/cgarci39/Projects/Bacillus_subtilis/Data/Raw_fastq/M1_Fastqs/M1.fastq.gz | ngmlr -t 1 -r /home/cgarci39/Projects/Bacillus_subtilis/Data/Local_genomes/EA_CB0015_hybrid.fasta | samtools view -b -@ 1 | samtools sort -@ 1 -o /home/cgarci39/Projects/Bacillus_subtilis/Experiments/Svim_jobs/01_15-M1_nmlr-svim/M1.fastq.ngmlr.coordsorted.bam

After the job finishes, it also leaves a coordsorted.bam file, which I suspect is good and complete, but the pipeline does not continue.
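One quick sanity check on a BAM file left behind by a failed run is whether it ends with the BGZF end-of-file marker described in the SAM/BAM specification. The sketch below is only an illustration; the function name `has_bgzf_eof` is made up for this example. A present marker only shows that the writer closed the file cleanly. In a pipe, `samtools sort` may still finalize a well-formed BAM from partial input, so this check cannot prove that all reads were aligned.

```python
from pathlib import Path

# 28-byte empty BGZF block that marks the end of a complete BAM file.
BGZF_EOF = bytes.fromhex(
    "1f 8b 08 04 00 00 00 00 00 ff 06 00 42 43"
    " 02 00 1b 00 03 00 00 00 00 00 00 00 00 00"
)


def has_bgzf_eof(bam_path):
    """Return True if the file ends with the BGZF end-of-file marker."""
    path = Path(bam_path)
    size = path.stat().st_size
    if size < len(BGZF_EOF):
        # Too short to contain the marker at all.
        return False
    with path.open("rb") as handle:
        # Read only the trailing bytes instead of the whole file.
        handle.seek(size - len(BGZF_EOF))
        return handle.read() == BGZF_EOF
```

A result of `False` would suggest the writer was interrupted; a result of `True` still leaves open whether the upstream stages delivered every alignment.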

By the way, I installed svim via conda. If there is a way to solve this issue, I would really appreciate it!

Also here is my conda list for the env:

# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       1_gnu    conda-forge
bzip2                     1.0.8                h516909a_3    conda-forge
c-ares                    1.16.1               h516909a_3    conda-forge
ca-certificates           2020.6.20            hecda079_0    conda-forge
certifi                   2020.6.20        py37he5f6b98_2    conda-forge
cycler                    0.10.0                     py_2    conda-forge
decorator                 4.4.2                      py_0    conda-forge
freetype                  2.10.3               he06d7ca_0    conda-forge
htslib                    1.11                 hd3b49d5_0    bioconda
jpeg                      9d                   h516909a_0    conda-forge
k8                        0.2.5                he513fc3_0    bioconda
kiwisolver                1.2.0            py37h99015e2_0    conda-forge
krb5                      1.17.1               hfafb76e_3    conda-forge
lcms2                     2.11                 hbd6801e_0    conda-forge
ld_impl_linux-64          2.35                 h769bd43_9    conda-forge
libblas                   3.8.0               17_openblas    conda-forge
libcblas                  3.8.0               17_openblas    conda-forge
libcurl                   7.71.1               hcdd3856_8    conda-forge
libdeflate                1.6                  h516909a_0    conda-forge
libedit                   3.1.20191231         he28a2e2_2    conda-forge
libev                     4.33                 h516909a_1    conda-forge
libffi                    3.2.1             he1b5a44_1007    conda-forge
libgcc-ng                 9.3.0               h5dbcf3e_17    conda-forge
libgfortran-ng            7.5.0               hae1eefd_17    conda-forge
libgfortran4              7.5.0               hae1eefd_17    conda-forge
libgomp                   9.3.0               h5dbcf3e_17    conda-forge
liblapack                 3.8.0               17_openblas    conda-forge
libnghttp2                1.41.0               h8cfc5f6_2    conda-forge
libopenblas               0.3.10          pthreads_hb3c22a3_5    conda-forge
libpng                    1.6.37               hed695b0_2    conda-forge
libssh2                   1.9.0                hab1572f_5    conda-forge
libstdcxx-ng              9.3.0               h2ae2ef3_17    conda-forge
libtiff                   4.1.0                hc7e4089_6    conda-forge
libwebp-base              1.1.0                h516909a_3    conda-forge
lz4-c                     1.9.2                he1b5a44_3    conda-forge
matplotlib-base           3.3.2            py37hd478181_0    conda-forge
minimap2                  2.17                 hed695b0_3    bioconda
ncurses                   6.2                  he1b5a44_1    conda-forge
networkx                  2.5                        py_0    conda-forge
ngmlr                     0.2.7                he513fc3_2    bioconda
numpy                     1.19.2           py37h7ea13bd_1    conda-forge
olefile                   0.46                       py_0    conda-forge
openssl                   1.1.1h               h516909a_0    conda-forge
pillow                    7.2.0            py37h718be6c_1    conda-forge
pip                       20.2.3                     py_0    conda-forge
pyparsing                 2.4.7              pyh9f0ad1d_0    conda-forge
pysam                     0.16.0.1         py37hc334e0b_1    bioconda
python                    3.7.8           h6f2ec95_1_cpython    conda-forge
python-dateutil           2.8.1                      py_0    conda-forge
python_abi                3.7                     1_cp37m    conda-forge
readline                  8.0                  he28a2e2_2    conda-forge
samtools                  1.11                 h6270b1f_0    bioconda
scipy                     1.5.2            py37hb14ef9d_1    conda-forge
setuptools                49.6.0           py37he5f6b98_2    conda-forge
six                       1.15.0             pyh9f0ad1d_0    conda-forge
sqlite                    3.33.0               h4cf870e_1    conda-forge
svim                      1.4.2                      py_0    bioconda
tk                        8.6.10               hed695b0_1    conda-forge
tornado                   6.0.4            py37h8f50634_1    conda-forge
wheel                     0.35.1             pyh9f0ad1d_0    conda-forge
xz                        5.2.5                h516909a_1    conda-forge
zlib                      1.2.11            h516909a_1009    conda-forge
zstd                      1.4.5                h6597ccf_2    conda-forge

Thanks in advance.
Camilo

@eldariont
Owner

Hi Camilo,

thanks for reporting this issue and sorry for my late reply. I was out of office for two weeks.

The error message that you get indicates that SVIM tried to execute the following command but failed:

set -o pipefail && gunzip -c /home/cgarci39/Projects/Bacillus_subtilis/Data/Raw_fastq/M1_Fastqs/M1.fastq.gz | ngmlr -t 1 -r /home/cgarci39/Projects/Bacillus_subtilis/Data/Local_genomes/EA_CB0015_hybrid.fasta | samtools view -b -@ 1 | samtools sort -@ 1 -o /home/cgarci39/Projects/Bacillus_subtilis/Experiments/Svim_jobs/01_15-M1_nmlr-svim/M1.fastq.ngmlr.coordsorted.bam

To locate the error, you can execute the command yourself on your command line and check whether you see any error messages in the output. I would expect that either ngmlr or samtools experiences some kind of problem that causes the pipe to fail.
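With `set -o pipefail`, bash reports only a single exit code for the whole pipe, so it does not say which stage broke. A minimal sketch of one way to find out, assuming `/bin/bash` is available: run the pipeline through bash and record its `PIPESTATUS` array, which holds one exit status per stage. The helper name `pipeline_exit_codes` is hypothetical and not part of SVIM.

```python
import shlex
import subprocess
import tempfile


def pipeline_exit_codes(pipeline):
    """Run a bash pipeline and return one exit status per stage."""
    with tempfile.NamedTemporaryFile(mode="w+", suffix=".status") as status_file:
        # PIPESTATUS has to be read on the line right after the pipeline,
        # before any other command overwrites it.
        script = '{}\necho "${{PIPESTATUS[@]}}" > {}'.format(
            pipeline, shlex.quote(status_file.name)
        )
        subprocess.run(script, shell=True, executable="/bin/bash", check=False)
        status_file.seek(0)
        return [int(code) for code in status_file.read().split()]
```

For the four-stage command above (gunzip, ngmlr, samtools view, samtools sort), a non-zero entry in the third position would point at `samtools view`.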

If this does not give you any hints, you can also send me the files and I can have a look.

Cheers
David

@camilogarciabotero
Author

Thanks David,

I ran the command you suggested, and the failure seems to be related to samtools view.

Also, the standard error gives me this message:

ngmlr 0.2.7 (build: Jul  3 2020 03:31:03, start: 2020-10-22.09:13:35)
Contact: philipp.rescheneder@univie.ac.at
Writing output (SAM) to stdout
Reading encoded reference from /home/cgarci39/Projects/Bacillus_subtilis/Data/Local_genomes/EA_CB0015_hybrid.fasta-enc.2.ngm
Reading 4 Mbp from disk took 0.00s
Reading reference index from /home/cgarci39/Projects/Bacillus_subtilis/Data/Local_genomes/EA_CB0015_hybrid.fasta-ht-13-2.2.ngm
Reading from disk took 0.24s
Opening query file /dev/stdin
Mapping reads...
Waiting for data from stdin
Processed: 1229 (0.74), R/S: 9.31, RL: 10580, Time: 1.00 1.00 94.00, Align: 1.00, 477, 0.94
[W::sam_read1] Parse error at line 1123
samtools view: error reading file "-"

Cheers,
Camilo.

@eldariont
Owner

Hi Camilo,

this looks indeed like samtools cannot parse the alignment of read 1230 or so. You could try to split the command up into the following parts:

  1. Align reads
gunzip -c /home/cgarci39/Projects/Bacillus_subtilis/Data/Raw_fastq/M1_Fastqs/M1.fastq.gz | ngmlr -t 1 -r /home/cgarci39/Projects/Bacillus_subtilis/Data/Local_genomes/EA_CB0015_hybrid.fasta > /home/cgarci39/Projects/Bacillus_subtilis/Experiments/Svim_jobs/01_15-M1_nmlr-svim/M1.EA_CB0015_hybrid.sam
  2. Sort alignments
samtools view -b -@ 1 /home/cgarci39/Projects/Bacillus_subtilis/Experiments/Svim_jobs/01_15-M1_nmlr-svim/M1.EA_CB0015_hybrid.sam | samtools sort -@ 1 -o /home/cgarci39/Projects/Bacillus_subtilis/Experiments/Svim_jobs/01_15-M1_nmlr-svim/M1.fastq.ngmlr.coordsorted.bam

If step 1 succeeds but step 2 fails, it means that there is something wrong with the SAM file generated by NGMLR. If step 1 already fails, it means that NGMLR has a problem with your input data. In any case, this might give you interesting insights into the problem.

Cheers
David

@eldariont
Owner

Hi Camilo,

did you have any luck tracking down where the error comes from? Would be good to know whether it's something I need to fix in SVIM or rather a problem with ngmlr/samtools.

Best
David

@camilogarciabotero
Author

Hi David,

I'm sorry for the late response on this issue; I didn't see your last reply last week. I ran the first step and it worked fine: it generated the .sam file. However, the second step failed with the following error:

[W::sam_read1] Parse error at line 102
samtools view: error reading file "/home/cgarci39/Projects/Bacillus_subtilis/Experiments/Svim_jobs/01_15-M1_nmlr-svim/M1.EA_CB0015_hybrid.sam"

It is not very informative but I'm not getting anything else.

Best,
Camilo.

@eldariont
Owner

Hi Camilo,

thanks for running the commands. The error message indicates that something is malformed in line 102 of the SAM output from NGMLR. In my experience, NGMLR is not as stable as minimap2, and several similar issues have been reported here and here and here. In those issues, a negative mapping quality value seemed to cause problems for samtools versions newer than 1.9 (you use 1.11).

I think you have the following options:

  1. You can have a close look at line 102 of /home/cgarci39/Projects/Bacillus_subtilis/Experiments/Svim_jobs/01_15-M1_nmlr-svim/M1.EA_CB0015_hybrid.sam and check whether you see anything weird, like a negative mapping quality.
  2. You can use an earlier version of samtools for sorting but this might only hide the problem.
  3. The third and my recommended option would be to use minimap2 instead of ngmlr for alignment:
    svim reads --min_mapq 30 --aligner minimap2 <working-dir> <fastq.gz> <genome.fasta>
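For option 1, scanning the SAM file by script can be easier than inspecting it by eye. The following is a rough sketch rather than a full validator: it assumes the usual tab-separated SAM layout with at least 11 mandatory fields and MAPQ as the fifth field in the range 0 to 255, and it assumes that line numbers counted from the top of the file (header included) match the ones samtools reports. The function name `find_invalid_mapq` is made up for this example.

```python
def find_invalid_mapq(sam_lines):
    """Yield (line_number, read_name, raw_mapq) for suspicious SAM records.

    raw_mapq is None when the record has fewer than 11 fields.
    """
    for line_number, line in enumerate(sam_lines, start=1):
        # Skip header lines and blank lines.
        if line.startswith("@") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            yield line_number, fields[0], None
            continue
        raw_mapq = fields[4]
        is_valid = raw_mapq.isascii() and raw_mapq.isdigit() and int(raw_mapq) <= 255
        if not is_valid:
            yield line_number, fields[0], raw_mapq
```

Passing an open file handle, for example `find_invalid_mapq(open(sam_path))` with `sam_path` pointing at the SAM file from step 1, would list candidate records such as a negative mapping quality.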

Cheers
David

@camilogarciabotero
Author

Hey David,

Thank you very much for all your support. Option three worked fine for me.

Best,
Camilo.

@KristinaGagalova

Hi,
I want to report the same error with ngmlr and share a simple solution that has already been described.
I am using the most recent version of svim from conda.

This is the error:
gunzip -c reads.gz | ngmlr -t 1 -r my_ref.fasta | samtools view -b -@ 1 | samtools sort -@ 1 -o my_fasta.ngmlr.coordsorted.bam
[W::sam_read1] Parse error at line 5174
[main_samview] truncated file.
The developers of ngmlr point out that samtools is incompatible with the ngmlr output.
I solved my issue by filtering reads from the SAM alignment, as shown here: philres/ngmlr#89
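The exact filter from philres/ngmlr#89 is not reproduced in this thread. Purely as an illustration of the general idea, and assuming the offending records are those with an out-of-range mapping quality (as suggested earlier in this issue), a stream filter could drop such records before samtools sees them. The name `keep_parsable_records` is hypothetical.

```python
def keep_parsable_records(sam_lines):
    """Yield header lines and records whose MAPQ is an integer in 0..255."""
    for line in sam_lines:
        if line.startswith("@"):
            # Header lines pass through unchanged.
            yield line
            continue
        fields = line.split("\t")
        if len(fields) < 11:
            # Incomplete record: drop it.
            continue
        mapq = fields[4]
        if mapq.isascii() and mapq.isdigit() and int(mapq) <= 255:
            yield line
```

Wrapped in a small script that reads `sys.stdin` and writes the kept lines to `sys.stdout`, this could sit between ngmlr and `samtools view` in the pipe. Note that dropping records hides the underlying aligner problem rather than fixing it.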
