-
Notifications
You must be signed in to change notification settings - Fork 572
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add option -t for turning off translation; add option -p and -i for samtools merge #369
base: develop
Are you sure you want to change the base?
Conversation
ionzhangj8
commented
Mar 25, 2015
- Added option -t to turn on/off translation in samtools merge. By default translation is on. If multiple input bam fils contain same RG name or PG name, turning on translation will append some string to RG name or PG name. As a result, if we run the same merge multiple times we'll get different size of output bam files. By turning off translation (using -t0 at command line) will give us identical output bam files, which will behave as samtools-0.1.19.
- Added option -p for multithreading bam loading in samtools merge. By default there is no extra thread for bam loading. -pN will create N extra threads for bam loading and speed up the process of samtools merge.
- Add option -i for creating indexing file in samtools merge. When we run samtools merge, if we use option -i [output], the indexing file [output.bai] will be created at the end of merge. Because the data is already in the memory, creating indexing in merge will save us time on running samtools index. Mainly we can save the time of loading the merged bam file for indexing.
…eading bam read in samtools merge and -i for creating indexing file in samtools merge
Hi,
|
The required htslib change is here: |
|
The reason we don't generate an identical BAM file to samtools 0.1.19 is that this version of samtools can generate an invalid BAM file. Specifically because this version ignores the content of the headers, it can generate from valid input files an invalid output file where records exist without an @rg header for the RG:Z: they refers to (likewise to @pg) because the @rg record for these files was only in the the 2nd or 3rd file input. Could you provide a use case where such would be useful to help me understand? |
We're using samtools 0.1.19. After we switch to samtools 1.2 some of our validation may fail because of using simple diff between old and new merged bam files. I may take option -t out and just use it for our internal purpose. |
I checked in my fix for indentation in both htslib and samtools. I also removed my option -t from samtools merge. |
A change was made to samtools/htslib#193, file htslib/hts.c for handling the case "samtools merge -i /dev/stdout ...". |