use *.bam intermediary files rather than *.sam #763

Open
tomkinsc opened this issue Jan 25, 2018 · 2 comments

Comments

@tomkinsc
Member

In a few places we use .sam intermediary files where we could use .bam files. The latter take a bit more IO/CPU time, with the advantage of a better compression ratio. One such instance is here:
https://github.com/broadinstitute/viral-ngs/blob/master/tools/bwa.py#L228

We should consider switching these occurrences across the codebase to use .bam by piping to samtools with the -b flag.
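
For illustration, here is a minimal sketch of what piping aligner output straight into `samtools view -b` could look like from Python. It is not the code in `tools/bwa.py`; the function names, argument names, and the plain `bwa mem` invocation are assumptions made for the example, and it presumes `bwa` and `samtools` are on the `PATH`.

```python
import subprocess


def bwa_mem_to_bam_cmds(ref_fasta, reads_fastq, out_bam, threads=1):
    """Build argv lists for `bwa mem ... | samtools view -b -o out_bam -`.

    Hypothetical helper for illustration only, not part of viral-ngs.
    """
    bwa_cmd = ["bwa", "mem", "-t", str(threads), ref_fasta, reads_fastq]
    # -b writes BAM; the trailing "-" tells samtools to read SAM from stdin.
    samtools_cmd = ["samtools", "view", "-b", "-o", out_bam, "-"]
    return bwa_cmd, samtools_cmd


def align_to_bam(ref_fasta, reads_fastq, out_bam, threads=1):
    """Run the two commands as a pipeline so no .sam file touches the disk."""
    bwa_cmd, samtools_cmd = bwa_mem_to_bam_cmds(
        ref_fasta, reads_fastq, out_bam, threads
    )
    bwa = subprocess.Popen(bwa_cmd, stdout=subprocess.PIPE)
    try:
        # samtools consumes bwa's stdout until EOF and writes the BAM file.
        subprocess.run(samtools_cmd, stdin=bwa.stdout, check=True)
    finally:
        # Close our copy of the pipe and reap bwa even if samtools failed.
        bwa.stdout.close()
        bwa_rc = bwa.wait()
    if bwa_rc != 0:
        raise subprocess.CalledProcessError(bwa_rc, bwa_cmd)
```

Keeping the command construction separate from the execution makes the flags easy to check without having either tool installed.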

@dpark01 dpark01 added this to Backlog (not for this release) in v1.19.1 Jan 25, 2018
@dpark01 dpark01 added this to Backlog (not necessarily this release) in v1.19.2 Jan 29, 2018
@yesimon
Contributor

yesimon commented Mar 8, 2019

At this stage we can probably use .cram files.

@dpark01
Member

dpark01 commented Mar 8, 2019

I think this issue was mostly about the ephemeral intermediates, not anything we present to the outside world. In that regard, the only reason to use bam over sam is so that we don't always require a VM instance to have a ton of local disk space for handling large sets of reads (say, from big sequencers). But going for cram is probably cpu-overkill on a file that we're just going to delete anyway. In fact, for this particular issue, I was thinking that we should just be using the -1 compression level flag on samtools, which optimizes for speed (you really don't want this part to be the bottleneck) while reducing unnecessary wastage on the local temp disk.
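
To get a rough feel for the speed-versus-size trade-off described above, the sketch below compresses synthetic SAM-like text with Python's `zlib` at two levels. BAM blocks are deflate-compressed (BGZF), so this is only a loose proxy for what samtools does; the record layout, read length, and helper names are made up for the example, and real reads will compress differently.

```python
import random
import zlib


def make_sam_like_text(n_records=5000, read_len=50, seed=0):
    """Generate synthetic, tab-separated SAM-like records (illustrative only)."""
    rng = random.Random(seed)
    lines = []
    for i in range(n_records):
        seq = "".join(rng.choices("ACGT", k=read_len))
        qual = "I" * read_len
        pos = rng.randint(1, 100000)
        lines.append(
            f"read{i}\t0\tchr1\t{pos}\t60\t{read_len}M\t*\t0\t0\t{seq}\t{qual}\n"
        )
    return "".join(lines).encode("ascii")


def compressed_sizes(data, levels=(1, 6)):
    """Return {level: compressed size in bytes} for each deflate level."""
    return {level: len(zlib.compress(data, level)) for level in levels}


if __name__ == "__main__":
    data = make_sam_like_text()
    print("raw bytes:", len(data))
    for level, size in compressed_sizes(data).items():
        print(f"level {level}: {size} bytes ({size / len(data):.1%} of raw)")
```

Timing the two levels on a representative intermediate file would be the next step before changing any defaults.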
