Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add option -t for turning off translation; add option -p and -i for samtools merge #369

Open
wants to merge 6 commits into
base: develop
Choose a base branch
from

Conversation

ionzhangj8
Copy link

  1. Added option -t to turn on/off translation in samtools merge. By default translation is on. If multiple input bam fils contain same RG name or PG name, turning on translation will append some string to RG name or PG name. As a result, if we run the same merge multiple times we'll get different size of output bam files. By turning off translation (using -t0 at command line) will give us identical output bam files, which will behave as samtools-0.1.19.
  2. Added option -p for multithreading bam loading in samtools merge. By default there is no extra thread for bam loading. -pN will create N extra threads for bam loading and speed up the process of samtools merge.
  3. Add option -i for creating indexing file in samtools merge. When we run samtools merge, if we use option -i [output], the indexing file [output.bai] will be created at the end of merge. Because the data is already in the memory, creating indexing in merge will save us time on running samtools index. Mainly we can save the time of loading the merged bam file for indexing.

…eading bam read in samtools merge and -i for creating indexing file in samtools merge
@mp15
Copy link
Member

mp15 commented Mar 25, 2015

Hi,
Got a few questions and comments:

  1. What does -t do that -cp doesn't?
  2. Could you fix up the indentation? It will make reviewing the patch easier.
  3. It seems there is htslib changes that this depends upon? Could you list the pull request for these here please?

@domibel
Copy link

domibel commented Mar 26, 2015

The required htslib change is here:
samtools/htslib#193

@ionzhangj8
Copy link
Author

  1. option -cp generates same merged bam file for multiple runs. But it does not generate the identical merged bam file as samtools-0.1.19. With option -t0 the merged bam is identical to samtools-0.1.19 result.
  2. I'll fix up the indentation.
  3. Thanks to domibel, the dependency of htslib is for creating indexing file in samtools merge htslib#193.

@mp15
Copy link
Member

mp15 commented Mar 26, 2015

The reason we don't generate an identical BAM file to samtools 0.1.19 is that this version of samtools can generate an invalid BAM file. Specifically because this version ignores the content of the headers, it can generate from valid input files an invalid output file where records exist without an @rg header for the RG:Z: they refers to (likewise to @pg) because the @rg record for these files was only in the the 2nd or 3rd file input. Could you provide a use case where such would be useful to help me understand?

@ionzhangj8
Copy link
Author

We're using samtools 0.1.19. After we switch to samtools 1.2 some of our validation may fail because of using simple diff between old and new merged bam files. I may take option -t out and just use it for our internal purpose.
Thanks.

@ionzhangj8
Copy link
Author

I checked in my fix for indentation in both htslib and samtools. I also removed my option -t from samtools merge.
Build of my samtools depends on my checked in htslib.

@ionzhangj8
Copy link
Author

A change was made to samtools/htslib#193, file htslib/hts.c for handling the case "samtools merge -i /dev/stdout ...".

@jmarshall jmarshall added this to the wishlist milestone Jul 6, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants