Benchmarking basic MHC (8haplos): build and quasimap #125

Open
iqbal-lab opened this issue Jun 3, 2018 · 0 comments
iqbal-lab commented Jun 3, 2018

The human genome is 3 billion bases long and the MHC is 5 Mb long.
This PRG contains just the 8 reference MHC haplotypes and no other variation, so 99.8% of the genome has no variation.
The final number in the PRG alphabet is 23690.
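
As a quick sanity check on the 99.8% figure, here is a minimal sketch of the arithmetic. It assumes the round sizes quoted above (3 billion bases, 5 Mb); the constant and function names are illustrative only and are not part of gramtools.

```python
# Approximate sizes quoted in this issue (round numbers, not exact assembly lengths).
GENOME_BASES = 3_000_000_000
MHC_BASES = 5_000_000


def invariant_fraction(genome_bases: int, region_bases: int) -> float:
    """Fraction of the genome that lies outside the variable region."""
    return 1 - region_bases / genome_bases


frac = invariant_fraction(GENOME_BASES, MHC_BASES)
print(f"{frac:.1%} of the genome has no variation in this PRG")
```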

gramtools version

{
    "version_number": "0.5.0",
    "last_git_commit_hash": "d8a3082a921579e65081fa1932c42c4f2fb7953a",
    "truncated_git_commits": [
        "d8a3082 - Robyn Ffrancon, 1527688551 : enhancement: build command optionally skips building PRG",
        "2dac562 - Robyn Ffrancon, 1527601335 : enhancement: quasimap commands ensures that build command executed successfully",
        "760b759 - Robyn Ffrancon, 1527599820 : enhancement: build stops and returns non-zero if no variants sites found in prg",
        "f3b8cff - Robyn Ffrancon, 1527597315 : enhancment: removed unused skip optimisation code",
        "e22cd4f - Robyn Ffrancon, 1527590325 : fix: SA indexes associated with correct site-allele paths for allele encapsulated mappings"
    ]
}

Build benchmarks

I'll start on the cluster, which means using shared machines, because my dedicated benchmarking machine is fully occupied benchmarking P. falciparum.

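| kmer | CPUs | encode PRG (sec) | generate FM index (sec) | masks (sec) | kmer index (sec) | Total human experienced time | max RAM |
|---|---|---|---|---|---|---|---|
| 5 | 1 | 1.4 | 105.5 | 74 | 20 | 3 mins | 350Mb |
| 7 | 1 | 4 | 144 | 85 | 350 | 10 mins | 374Mb |
| 9 | 1 | 1 | 109 | 71 | 45 | 4 mins | 400Mb |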
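
The build timings can be cross-checked by summing the per-stage seconds and comparing against the reported human-experienced time. This sketch assumes the sixth value in each row is the kmer index time in seconds and that the total is simply the sum of the four stages; both are my reading of the table rather than something gramtools reports.

```python
# Stage timings in seconds, copied from the build table:
# (encode PRG, generate FM index, masks, kmer index)
BUILD_STAGES = {
    5: (1.4, 105.5, 74, 20),
    7: (4, 144, 85, 350),
    9: (1, 109, 71, 45),
}


def total_minutes(stages):
    """Sum the stage timings and convert seconds to minutes."""
    return sum(stages) / 60


for kmer, stages in BUILD_STAGES.items():
    print(f"k={kmer}: {sum(stages):.1f} s = {total_minutes(stages):.1f} min")
```

Under these assumptions the sums land close to the reported 3, 10 and 4 minutes, which suggests the kmer index step dominates the k=7 build.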

Quasimap benchmarks

The vast majority of reads (99.8%) are irrelevant, and will be discarded immediately because they don't hit the kmer index.
Mapping a huge FASTQ of NA12878 reads: ~747.5 million reads.

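| kmer | CPUs | Load data (sec) | Quasimap (sec) | Human exp time | Reads/sec/CPU | Mapped reads | Mem | Comments |
|---|---|---|---|---|---|---|---|---|
| 5 | 1 | 37 | ? | ? | ? | ? | ? | ? |
| 7 | 1 | 37 | ? | ? | ? | ? | ? | ? |
| 9 | 1 | 28 | 30279 | 8 hrs 43 mins | 24686 | 154141 | 1.8Mb | untrimmed reads |
| 9 | 1 | 39 | 104358 | 29 hours | 7162 | | 1.8Mb | trimmed reads |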
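
The Reads/sec/CPU column can be roughly reproduced from the read count and the quasimap time. This is a sketch only: it assumes the approximate total of 747.5 million reads and a single CPU, so the results differ slightly from the table, which was presumably computed from the exact read count. All names here are illustrative.

```python
TOTAL_READS = 747.5e6  # approximate read count quoted above
CPUS = 1

# Quasimap wall time in seconds for the two k=9 runs in the table.
QUASIMAP_SECONDS = {"untrimmed": 30279, "trimmed": 104358}
MAPPED_READS_UNTRIMMED = 154141


def reads_per_sec_per_cpu(total_reads, seconds, cpus=1):
    """Throughput normalised by the number of CPUs used."""
    return total_reads / seconds / cpus


rates = {
    name: reads_per_sec_per_cpu(TOTAL_READS, secs, CPUS)
    for name, secs in QUASIMAP_SECONDS.items()
}
slowdown = QUASIMAP_SECONDS["trimmed"] / QUASIMAP_SECONDS["untrimmed"]
mapped_fraction = MAPPED_READS_UNTRIMMED / TOTAL_READS

for name, rate in rates.items():
    print(f"{name}: about {rate:.0f} reads/sec/CPU")
print(f"trimmed run took {slowdown:.1f}x as long as the untrimmed run")
print(f"mapped fraction (untrimmed): {mapped_fraction:.4%}")
```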

iqbal-lab self-assigned this Jun 3, 2018