Multi-threading improvements #144

Open
bricoletc opened this issue Apr 11, 2019 · 4 comments
@bricoletc (Member)
Following Quasimapping with multiple threads odd behaviour (#127), here is the speedup obtained for quasimap with different numbers of threads:

[Figure: Fold_speedup — quasimap fold speedup vs. number of threads]

The black line is the theoretical best speedup (= number of threads).
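For clarity, the fold speedup at t threads is presumably the usual ratio (this definition is not stated explicitly above):

$$\text{speedup}(t) = \frac{T_{\text{wall}}(1)}{T_{\text{wall}}(t)},$$

so with perfect scaling $T_{\text{wall}}(t) = T_{\text{wall}}(1)/t$ and the theoretical best is $\text{speedup}(t) = t$, the black line.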

Dataset: 250,000 reads (so 500,000 with the reverse complements) from /nfs/leia/research/iqbal/bletcher/Pf_benchmark/all_reads/_original_/PG0496-C.bam, mapped to the pf3k prg built from /nfs/research1/zi/bletcher/Pf_benchmark/ref_data
(yoda cluster)

Command:

bsub -R "select[mem>60000] rusage[mem=60000]" -M60000 -J threads_4 -n 4 \
  -o /nfs/leia/research/iqbal/bletcher/Pf_benchmark/tests/threads/logs/t4_k8.o \
  -e /nfs/leia/research/iqbal/bletcher/Pf_benchmark/tests/threads/logs/t4_k8.e \
  singularity exec /nfs/leia/research/iqbal/bletcher/Singularity_Images/8b46a86_gramtools.img \
  gramtools quasimap \
    --gram-dir /nfs/leia/research/iqbal/bletcher/Pf_benchmark/tests/gram_k8 \
    --run-dir /nfs/leia/research/iqbal/bletcher/Pf_benchmark/tests/threads/run_t4 \
    --reads /nfs/leia/research/iqbal/bletcher/Pf_benchmark/tests/subsetted_reads/PG0496-C.trim.fq.1.1MSubset.gz \
    --max-threads 4
Commit 8b46a86

@bricoletc (Member, Author)
@leoisl has noticed one probable bug and one possible improvement:

1. [Bug] At https://github.com/iqbal-lab-org/gramtools/blob/master/libgramtools/src/quasimap/coverage/grouped_allele_counts.cpp#L52:
   when an allele combination has not been seen before, an entry is created in the site_coverage map, assigning counts to allele 'equivalence classes' for a given variant site. If two different threads (processing two different read mappings to the graph) both try to do this at the same time, the program can crash.

Pre-allocating all entries with a count of 0 does not seem reasonable, since the possible equivalence classes of a variant site form the power set of its alleles (2^n entries for n alleles).

2. [Improvement] Scheduling. Right now each thread gets just one read at a time. We could instead give each thread a chunk of reads (e.g. 1,000), which could be more efficient than constantly handing single reads to threads. A sketch covering both points follows this list.
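To make the two points concrete, here is a minimal OpenMP sketch in C++. This is not the gramtools code: the map type, hash and function names are made up for illustration (the real code in grouped_allele_counts.cpp uses its own structures and gram::sequence_hash). It shows the shared-map update that currently races, guarded by a critical section, plus chunked dynamic scheduling instead of one read per scheduling decision:

```cpp
// Sketch only: a shared map from allele 'equivalence classes' to counts,
// updated by many threads. Names (AlleleIds, GroupCounts, count_groups) are
// hypothetical. Compile with -fopenmp.
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

using AlleleIds = std::vector<std::uint32_t>;

// Hypothetical hash for vector keys.
struct VectorHash {
    std::size_t operator()(AlleleIds const& ids) const {
        std::size_t h = 0;
        for (auto id : ids)
            h ^= std::hash<std::uint32_t>{}(id) + 0x9e3779b9 + (h << 6) + (h >> 2);
        return h;
    }
};

using GroupCounts = std::unordered_map<AlleleIds, std::uint64_t, VectorHash>;

void count_groups(std::vector<AlleleIds> const& per_read_groups, GroupCounts& coverage) {
    // Point 2: schedule(dynamic, 1000) hands each thread a chunk of ~1000 reads
    // per scheduling decision, instead of a single read.
    #pragma omp parallel for schedule(dynamic, 1000)
    for (long i = 0; i < static_cast<long>(per_read_groups.size()); ++i) {
        // Point 1: operator[] can insert and rehash, so unsynchronised concurrent
        // calls are a data race; the critical section serialises the update.
        #pragma omp critical
        {
            coverage[per_read_groups[i]] += 1;
        }
    }
}
```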

@bricoletc (Member, Author)
In response to this:

@leoisl has noticed one probable bug and one possible improvement:

1. [Bug] At https://github.com/iqbal-lab-org/gramtools/blob/master/libgramtools/src/quasimap/coverage/grouped_allele_counts.cpp#L52:
   when an allele combination has not been seen before, an entry is created in the site_coverage map, assigning counts to allele 'equivalence classes' for a given variant site. If two different threads (processing two different read mappings to the graph) both try to do this at the same time, the program can crash.

I have gotten two segfaults from ~100 separate instances of multi-threaded quasimapping to the Plasmodium prg.

Here is the gdb stack trace from the core dump of one of them (obtained from inside the singularity container /nfs/leia/research/iqbal/bletcher/Pf_benchmark/full_Cross/04ff58e_prod.img on yoda):

gdb /usr/local/lib/python3.6/dist-packages/gramtools/bin/gram core.30852

Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00005647921b3ffe in std::_Hashtable<std::vector<unsigned int, std::allocator<unsigned int> >, std::pair<std::vector<unsigned int, std::allocator<unsigned int> > const, unsigned long>, std::allocator<std::pair<std::vector<unsigned int, std::allocator<unsigned int> > const, unsigned long> >, std::__detail::_Select1st, std::equal_to<std::vector<unsigned int, std::allocator<unsigned int> > >, gram::sequence_hash<std::vector<unsigned int, std::allocator<unsigned int> > >, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> >::_M_find_before_node (__code=3864292196, __k=..., __n=2, this=0x2b797278fcd8)
    at /usr/include/c++/7/bits/hashtable.h:1548
1548	      for (__node_type* __p = static_cast<__node_type*>(__prev_p->_M_nxt);;

The core is also at /nfs/leia/research/iqbal/bletcher/Pf_benchmark/full_Cross/

And then:

(gdb) where
#0  0x00005647921b3ffe in std::_Hashtable<std::vector<unsigned int, std::allocator<unsigned int> >, std::pair<std::vector<unsigned int, std::allocator<unsigned int> > const, unsigned long>, std::allocator<std::pair<std::vector<unsigned int, std::allocator<unsigned int> > const, unsigned long> >, std::__detail::_Select1st, std::equal_to<std::vector<unsigned int, std::allocator<unsigned int> > >, gram::sequence_hash<std::vector<unsigned int, std::allocator<unsigned int> > >, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> >::_M_find_before_node (__code=3864292196, __k=std::vector of length 1, capacity 1 = {...}, __n=2, this=0x2b797278fcd8)
    at /usr/include/c++/7/bits/hashtable.h:1548
#1  std::_Hashtable<std::vector<unsigned int, std::allocator<unsigned int> >, std::pair<std::vector<unsigned int, std::allocator<unsigned int> > const, unsigned long>, std::allocator<std::pair<std::vector<unsigned int, std::allocator<unsigned int> > const, unsigned long> >, std::__detail::_Select1st, std::equal_to<std::vector<unsigned int, std::allocator<unsigned int> > >, gram::sequence_hash<std::vector<unsigned int, std::allocator<unsigned int> > >, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> >::_M_find_node (__c=3864292196, __key=std::vector of length 1, capacity 1 = {...}, __bkt=2, this=0x2b797278fcd8)
    at /usr/include/c++/7/bits/hashtable.h:642
#2  std::__detail::_Map_base<std::vector<unsigned int, std::allocator<unsigned int> >, std::pair<std::vector<unsigned int, std::allocator<unsigned int> > const, unsigned long>, std::allocator<std::pair<std::vector<unsigned int, std::allocator<unsigned int> > const, unsigned long> >, std::__detail::_Select1st, std::equal_to<std::vector<unsigned int, std::allocator<unsigned int> > >, gram::sequence_hash<std::vector<unsigned int, std::allocator<unsigned int> > >, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true>, true>::operator[] (this=0x2b797278fcd8, __k=std::vector of length 1, capacity 1 = {...})
    at /usr/include/c++/7/bits/hashtable_policy.h:721
#3  0x00005647921b3488 in std::unordered_map<std::vector<unsigned int, std::allocator<unsigned int> >, unsigned long, gram::sequence_hash<std::vector<unsigned int, std::allocator<unsigned int> > >, std::equal_to<std::vector<unsigned int, std::allocator<unsigned int> > >, std::allocator<std::pair<std::vector<unsigned int, std::allocator<unsigned int> > const, unsigned long> > >::operator[] (__k=std::vector of length 1, capacity 1 = {...}, this=<optimized out>) at /usr/include/c++/7/bits/unordered_map.h:973
#4  gram::coverage::record::grouped_allele_counts (coverage=..., search_states=std::__cxx11::list = {...})
    at /tmp/pip-44dd3srs-build/libgramtools/src/quasimap/coverage/grouped_allele_counts.cpp:54
#5  0x00005647921a6be6 in gram::coverage::record::search_states (coverage=..., search_states=..., read_length=@0x2b7973395d08: 100, prg_info=..., random_seed=<optimized out>)
    at /tmp/pip-44dd3srs-build/libgramtools/src/quasimap/coverage/common.cpp:212
#6  0x000056479219bfa4 in gram::quasimap_read (read=std::vector of length 100, capacity 100 = {...}, coverage=..., kmer_index=std::unordered_map with 3376442 elements = {...}, 
    prg_info=..., parameters=...) at /tmp/pip-44dd3srs-build/libgramtools/src/quasimap/quasimap.cpp:211
#7  0x000056479219c34d in handle_reads_buffer () at /tmp/pip-44dd3srs-build/libgramtools/src/quasimap/quasimap.cpp:181
#8  0x00002b792f4f995e in ?? () from /usr/lib/x86_64-linux-gnu/libgomp.so.1
#9  0x00002b792f9316db in start_thread (arg=0x2b7973396700) at pthread_create.c:463
#10 0x00002b792fc6a88f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

From the call to ::operator[] inside gram::coverage::record::grouped_allele_counts, this looks exactly like the bug @leoisl found.

@bricoletc pinned this issue May 8, 2019
@bricoletc (Member, Author)
Here is what happens under bricoletc@e66865e: the instruction #pragma omp atomic is changed to #pragma omp critical in libgramtools/src/quasimap/coverage/grouped_allele_counts.cpp, so that threads can no longer update grouped coverage simultaneously.

[Figure: Per_Sample_CPU_Time_bars — per-sample CPU time before and after the change]

Some samples take much more time (803, PG0459-C); this could be followed up.
Sample PG0463-C is where a segfault occurred prior to the change, hence its low CPU run time.

[Figure: Time_diff — change in CPU time per sample after switching to #pragma omp critical]

There is an overall increase of ~5,000 CPU seconds, which amounts to roughly one extra hour of run time per quasimapping, against an overall run time of ~15 hours. I have not seen any segfaults yet, so this might be a good solution.
We could do better by locking on each site's coverage structure; right now any coverage write is locked across all sites. A sketch of that follows below.
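For reference, per-site locking could look roughly like the sketch below. This is not the gramtools implementation; the structure and names (SiteCoverage, record_group, the site index) are assumed for illustration. Only the site being written is serialised, so threads writing coverage for different sites no longer contend on a single global critical section:

```cpp
// Sketch only: one grouped-counts map and one mutex per variant site.
#include <cstddef>
#include <cstdint>
#include <map>
#include <mutex>
#include <vector>

using AlleleIds = std::vector<std::uint32_t>;

struct SiteCoverage {
    std::map<AlleleIds, std::uint64_t> group_counts;  // allele 'equivalence class' -> count
    std::mutex lock;                                   // guards only this site's map
};

// std::mutex is not movable, so size the per-site vector once up front, e.g.
//   std::vector<SiteCoverage> sites(number_of_variant_sites);
void record_group(std::vector<SiteCoverage>& sites, std::size_t site_index,
                  AlleleIds const& group) {
    SiteCoverage& site = sites[site_index];
    std::lock_guard<std::mutex> guard(site.lock);  // serialise writes to this site only
    ++site.group_counts[group];
}
```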

@leoisl commented May 10, 2019

Hello,

wow, pretty nice and detailed benchmarking! I guess this should be addressed in the future, but for now the important thing is that your experiments aren't blocked and the slowdown is small, so the tool is runnable. We should discuss quasimapping and how to improve mapping speed, together with this and possibly other things like scheduling policies, Bloom filter filtering, etc.
