samtools index: failed to create index #1

Open
michaeldonaldson opened this issue Apr 14, 2020 · 8 comments

Comments

@michaeldonaldson

Hello,

I followed the BaitSTR workflow to create 'contigs.str.fa' for one of my datasets. I am now trying to run BaitSTR_type, and it appears to get stuck when creating an index. I think it fails to sort the BAM file, and then the indexing fails.

Here is the command I used:

perl ../baitSTR_type/BaitSTR_type.pl --index --index_prefix contig.str --stem run1 --mem --full --target ./contigs.str.fa --path_to_lobSTR ~/workspace/SNP_caller/tools/lobSTR-bin-Linux-x86_64-4.0.6/bin --r1 /media/user/fastq/reads.r1.100.fastq,100 --r2 /media/user/fastq/reads.r2.100.fastq,100

Any thoughts would be appreciated! Here's the last of the log information:
.....
[bam_rmdup_core] processing reference Block204...
[bam_rmdup_core] 7 / 58501 = 0.0001 in library 'lib_100'
[bam_rmdupse_core] 13 / 5566 = 0.0023 in library 'lib_100'
[W::bam_merge_core2] No @HD tag found.
[bam_sort] Use -T PREFIX / -o FILE to specify temporary and final output files
Usage: samtools sort [options...] [in.bam]
Options:
-l INT Set compression level, from 0 (uncompressed) to 9 (best)
-m INT Set maximum memory per thread; suffix K/M/G recognized [768M]
-n Sort by read name
-t TAG Sort by value of TAG. Uses position as secondary index (or read name if -n is set)
-o FILE Write final output to FILE rather than standard output
-T PREFIX Write temporary files to PREFIX.nnnn.bam
--input-fmt-option OPT[=VAL]
Specify a single input file format option in the form
of OPTION or OPTION=VALUE
-O, --output-fmt FORMAT[,OPT[=VAL]]...
Specify output format (SAM, BAM, CRAM)
--output-fmt-option OPT[=VAL]
Specify a single output file format option in the form
of OPTION or OPTION=VALUE
--reference FILE
Reference sequence FASTA FILE [null]
-@, --threads INT
Number of additional threads to use [0]
[bam_sort] Use -T PREFIX / -o FILE to specify temporary and final output files
Usage: samtools sort [options...] [in.bam]
Options:
-l INT Set compression level, from 0 (uncompressed) to 9 (best)
-m INT Set maximum memory per thread; suffix K/M/G recognized [768M]
-n Sort by read name
-t TAG Sort by value of TAG. Uses position as secondary index (or read name if -n is set)
-o FILE Write final output to FILE rather than standard output
-T PREFIX Write temporary files to PREFIX.nnnn.bam
--input-fmt-option OPT[=VAL]
Specify a single input file format option in the form
of OPTION or OPTION=VALUE
-O, --output-fmt FORMAT[,OPT[=VAL]]...
Specify output format (SAM, BAM, CRAM)
--output-fmt-option OPT[=VAL]
Specify a single output file format option in the form
of OPTION or OPTION=VALUE
--reference FILE
Reference sequence FASTA FILE [null]
-@, --threads INT
Number of additional threads to use [0]
[E::hts_idx_push] Chromosome blocks not continuous
samtools index: failed to create index for "run1.sample_100.aligned.bam"

CMD: /home/user/workspace/SNP_caller/tools/lobSTR-bin-Linux-x86_64-4.0.6/bin/allelotype --command classify --strinfo contig.str.lobSTRindex/strinfo.tab --out run1 --index-prefix contig.str.lobSTRindex/lobSTR_ --regions contig.str.lobSTRindex/lobSTR_mergedref.targets.bed --realign --filter-clipped --min-read-end-match 10 --filter-mapq0 --max-repeats-in-ends 3 --no-rmdup --noise_model run1.noisetmp --bam run1.sample_100.aligned.bam

[allelotype-4.0.6] 2020-04-14.12:34:25 ProgressMeter: Getting run info
[allelotype-4.0.6] 2020-04-14.12:34:25 ERROR: Could not open index files
[allelotype-4.0.6] 2020-04-14.12:34:25 ProgressMeter: Outputting run statistics
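
A note on the log above: the `[bam_sort] Use -T PREFIX / -o FILE ...` line followed by the usage text is what samtools 1.x typically prints when `samtools sort` is given the older `in.bam out_prefix` argument form. If that is what is happening here (an assumption, since the sort call inside BaitSTR_type.pl is not shown in this thread), the BAM is never sorted and `samtools index` then fails. A minimal sketch of sorting and indexing by hand, using only the `-o` and `-T` options listed in the usage text; the function name and the output and temporary file names are hypothetical:

```shell
# Sketch only: sort a BAM by coordinate and index it with samtools 1.x syntax.
# sort_and_index is a hypothetical helper, not part of BaitSTR or lobSTR.
# Usage: sort_and_index IN.bam OUT.bam
sort_and_index() {
  in_bam=$1
  out_bam=$2
  # -o names the final output file, -T the prefix for temporary files
  samtools sort -o "$out_bam" -T "$out_bam.tmp" "$in_bam" &&
    samtools index "$out_bam"
}
```

It could be called as `sort_and_index run1.sample_100.aligned.bam run1.sample_100.sorted.bam`. Note that the allelotype command in the log reads `run1.sample_100.aligned.bam`, so the sorted file would have to be passed to `--bam` instead.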

@lkistler
Owner

lkistler commented Apr 14, 2020 via email

@michaeldonaldson
Author

Thank you for looking into this. I am eager to see if your scripts can help our project!

I guess I should mention the following files get produced:
run1.sample_100.aligned.bam
run1.allelotype.stats
contig.str.probes.fasta

and a folder "contig.str.lobSTRindex" with BWA/lobSTR index files

Here's a chunk of the contigs FASTA:

Block2 TC:2:209:213
ACTAAAATGTAGCAACTGTGAGCCTTTTCCAGATGGAGCCACAAACAACCTCTCTAGATCTAATTCAGATGAGAGTTATTTCTCTGAAAAACGGAGAGTGTCCACTATGTACCATCCCGAAGGAGAATCCAGCACAGCCCCCTTTTTTTCTACTGATTCATCTCTGAATTTGCCTGTCCTAGAAGTAGGCAAAACTGAAAACCCTACATTCTCTTCAACTACACTTCCCAGACCTGGGGACCCTGGGGCTCCTCCTTTGCCCCCGGACTTGCAGCTAGACGAAGAAACTTGTGGA
Block3 GA:2:55:59
GGACAACTACTTGGCCTTCTTCAACTGGAGCAGCCTGACCCTCCTGCCCCGGCTGGAGAGCCTGGACCTGGCGGGGAACCAGCTGAAGGCCCTGACCAACGGCAGCCTCCCCGCGGACAGCAGGCTCCAGAGGCTG
Block4 GA:5:31:41
TATTGACTTCAGAGCAGAAGGGAGAGGGAGAGAGAGAGAGAAACATCAGTGCTGAGAGAGAATCACGGATCAGCTGCCTCCTGCACACCCTTTACTGGGGATGTGCCCGCAACCAAGG
Block5 TA:2:146:150
CCCGTGAGCTGCCGGTCTCCTCCACCTTCTGCTTGCAAATAGGCAAGTTCAGGATGCACCAAAAGTCCGGGATTATAGCCCCTAATGCATGCATCTGTCATTCGTGCTTCCAGTGTTTCAAATACTTTTTTCTTTGTCACATGAAGTATACCTAGGTTTGCAAAGCTGGATAAATCAAAAACAACCAAAGGTTAGGCAGTCAATGACTGGAATATATGGTTTCACTTGAGCACAGAGAATTAAAACACACACACACAGGTCCTCAATCACTGGGGCCCACACCATAGTTAAACACATTAGTATTTCTGCAGAAATA

.....

Block15748 AT:3:108:114
GGGCAAAAGTAGGTTTACAGTTGTAATACAAATAATACAATAATACATAATAATATAATAATAAGATGTGCTGCACATGCTCACAACTGTAAACCTACATTTACTCACATATATTTAAAGAAGTACAGAAATAATACACCAAATGATATTGATTATTAGTATTCCTAGGTTTTTTCATGTTTTTCTTTTTTAATTGCCTGAATCACTTATAATAAACACATTGTTTTTATAATTAAGATCGGAAGAGCACACGTCTGAACTCAGTCACAGCGATAGATCTCGTATGCCGTCTTCTGCTTGAAAAAAAAAAAAAAAAAAAAAAAAAACAAAAAAAATAACAAAAGAAAATAAAAAATAACAGATAAAAAAAAACACAAC
Block15749 GC:2:65:69
GCCGGCTGTGTGAGGGGCAGTGGGCACACATGCAACTCGCTCACTCTCTCCCTGGAAAATCCAGGGCGCACCTCGGCTCCTGCAACATGCAGAATTAGAAATCGAACCGTTTATTGTCTCCTAACTTTTCTCT
Block15750 CCG:2:86:92
CAGCACCGCCGCTGACAAGTGGGGCTTCGGTGCCACCCTCCTGGAGATCTGCTTCGATGGGGAGGCCCCCCTGCAGGACCGCAGCCCCGCCGAGGTACATGTGGGTGACCCGTGGGCCTTCTCACAAAGGGGCCAGCACCTCCGAGGGGTCGAGCGGTCTGGGTCCGGAGCTGCCCCCTCTGGCCTGTGGCCTCACATATGTCGCCTCA

@lkistler
Owner

lkistler commented Apr 14, 2020 via email

@michaeldonaldson
Author

BWA_ref.fasta 1.0 MB
BWA_ref.fasta.amb 12.7 kB
BWA_ref.fasta.ann 16.3 kB
BWA_ref.fasta.bwt 1.0 MB
BWA_ref.fasta.pac 258.3 kB
BWA_ref.fasta.sa 516.6 kB
lobSTR_chromsizes.tab 7.0 kB
lobSTR_mergedref.bed 16.2 kB
lobSTR_mergedref.targets.bed 10.0 kB
lobSTR_ref.fasta 1.0 MB
lobSTR_ref_map.tab 14.3 kB
strinfo.tab 28.8 kB

@lkistler
Owner

lkistler commented Apr 14, 2020 via email

@michaeldonaldson
Author

Hi Logan,

I just noticed the fasta file does have those headers. Something went wrong when I pasted them into this forum, sorry.

Yes, a significant number of reads are mapped. I'm actually running this pipeline on capture data. We had previously designed probes to target microsatellite regions (and other specific genes of interest), and I have been having trouble creating genotypes from capture data (amplicon sequencing is rather straightforward when you have known primers and filler regions). However, in this case we used probes from a species with a genome to target regions in a species without a target genome, so I thought I'd give your pipeline a go. Our capture libraries followed the Roche EZ-developer protocols, so the libraries were sheared, indexed, pooled, captured, and amplified, if I recall correctly.

I don't see the attached script?

Thank you!

@lkistler
Owner

lkistler commented Apr 15, 2020 via email

@michaeldonaldson
Author

Hi Logan,

Thanks for the advice. I am looking into those possibilities as well.

Sorry to be a bother, but here's the tail end of the output that resulted in another error using the script you provided. Any advice would be appreciated!

[main] CMD: bwa mem -aM -R @RG\tID:lobSTR;sample_S100;lib_S100\tLB:lib_S100\tSM:sample_S100 contig.str.lobSTRindex/BWA_ref.fasta /media/user/fastq/100_R1.fq /media/user/fastq/100_R2.fq
[main] Real time: 25.158 sec; CPU: 25.948 sec
[bam_rmdup_core] processing reference Block6...
[bam_rmdup_core] processing reference Block12...
[bam_rmdup_core] processing reference Block6...
[bam_rmdup_core] processing reference Block11...
[bam_rmdup_core] processing reference Block6...
[bam_rmdup_core] processing reference Block11...
[bam_rmdup_core] processing reference Block6...
[bam_rmdup_core] processing reference Block11...
[bam_rmdup_core] processing reference Block6...
[bam_rmdup_core] processing reference Block11...
[bam_rmdup_core] processing reference Block6...
[bam_rmdup_core] processing reference Block11...
[bam_rmdupse_core] 2098 / 4342 = 0.4832 in library 'lib_S100'
[W::bam_merge_core2] No @HD tag found.
[E::hts_idx_push] Chromosome blocks not continuous
samtools index: failed to create index for "run1.sample_S100.aligned.bam"

CMD: /home/user/workspace/SNP_caller/tools/lobSTR-bin-Linux-x86_64-4.0.6/bin/allelotype --command classify --strinfo contig.str.lobSTRindex/strinfo.tab --out run1 --index-prefix contig.str.lobSTRindex/lobSTR_ --regions contig.str.lobSTRindex/lobSTR_mergedref.targets.bed --realign --filter-clipped --min-read-end-match 10 --filter-mapq0 --max-repeats-in-ends 3 --no-rmdup --noise_model run1.noisetmp --bam run1.sample_S100.aligned.bam

[allelotype-4.0.6] 2020-04-15.16:57:31 ProgressMeter: Getting run info
[allelotype-4.0.6] 2020-04-15.16:57:31 ERROR: Could not open index files
[allelotype-4.0.6] 2020-04-15.16:57:31 ProgressMeter: Outputting run statistics
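
For anyone hitting the same message: `[E::hts_idx_push] Chromosome blocks not continuous` generally means the records for a given reference are not grouped in one run, that is, the BAM is not coordinate-sorted when `samtools index` reads it. A rough check, assuming SAM text on stdin (for example piped from `samtools view -h`); the function name is hypothetical and this is only a sketch of the idea, not what samtools does internally:

```shell
# Sketch only: exit 0 if SAM records on stdin look coordinate-sorted, 1 otherwise.
# is_coordinate_sorted is a hypothetical helper, not part of samtools.
is_coordinate_sorted() {
  awk -F'\t' '
    BEGIN { bad = 0 }
    /^@/ { next }                          # skip SAM header lines
    {
      ref = $3; pos = $4 + 0               # RNAME and POS columns
      if (ref != last) {
        if (ref in seen) { bad = 1; exit } # reference reappears: blocks not contiguous
        seen[ref] = 1; last = ref; lastpos = 0
      }
      if (pos < lastpos) { bad = 1; exit } # position goes backwards within a reference
      lastpos = pos
    }
    END { exit bad }
  '
}
```

For example, `samtools view -h run1.sample_S100.aligned.bam | is_coordinate_sorted || echo "not coordinate-sorted"` would flag a file that needs a `samtools sort -o ... -T ...` pass before indexing.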
