Improvements to hit scoring

Pre-release
@AvantiShri AvantiShri released this 20 Aug 21:22
· 49 commits to master since this release
7de50c1

Corresponds to PR #94

Emphasis is now given to the core seqlet region when determining which motif a seqlet aligns to, so that alternative motifs in the flanks can't change the motif assignment. Also revamped how the fine-grained affinities are calculated: there is no coarse-grained calculation step. I first align the core seqlet to the aggregate pattern, and then use that alignment to compute the fine-grained similarities to the constituent seqlets in the pattern. It seems substantially faster.
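To make the two-step idea concrete, here is a minimal sketch of "align the core to the aggregate pattern, then reuse that offset for the constituent seqlets". It is not the code from the PR: the function names (`align_core_to_pattern`, `fine_grained_similarities`), the dot-product alignment score, the cosine similarity, and the assumption that constituent seqlets are stored in the pattern's coordinates are all illustrative choices.

```python
import numpy as np


def align_core_to_pattern(core, pattern):
    """Slide the core (core_len x 4) along the pattern (pattern_len x 4).

    Returns the offset with the highest dot-product score and all scores.
    The scoring function is an assumption made for illustration.
    """
    core_len = len(core)
    n_offsets = len(pattern) - core_len + 1
    scores = np.array([
        np.sum(core * pattern[o:o + core_len]) for o in range(n_offsets)
    ])
    return int(np.argmax(scores)), scores


def fine_grained_similarities(core, constituents, offset):
    """Cosine similarity between the core and each constituent seqlet,
    evaluated only at the offset found against the aggregate pattern.

    Assumes every constituent is laid out in pattern coordinates.
    """
    core_len = len(core)
    flat_core = core.ravel()
    sims = []
    for seqlet in constituents:
        window = seqlet[offset:offset + core_len].ravel()
        denom = np.linalg.norm(flat_core) * np.linalg.norm(window)
        sims.append(float(flat_core @ window / denom) if denom > 0 else 0.0)
    return np.array(sims)


# Toy data: a 10-position pattern with a 4-position motif starting at index 3.
pattern = np.zeros((10, 4))
for i in range(4):
    pattern[3 + i, i] = 1.0
core = pattern[3:7].copy()

offset, _ = align_core_to_pattern(core, pattern)
sims = fine_grained_similarities(core, [pattern, np.zeros((10, 4))], offset)
```

Because only the core is scored during alignment, anything sitting in the flanks has no influence on the chosen offset, and the per-seqlet similarities need a single window comparison each instead of a full scan.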

Also improved the seqlet identification method:

  • I switched to the FixedWindow seqlet identification method, which is the same one used during the main modisco run, because the VariableWindow method was producing many windows that were "tied" at an FDR of 0, even though some windows scored much higher than others.
  • I made a small tweak to improve the overlap exclusion (need to exclude core_window_size-0.5 on either side; the reason is an involved detail of the indexing math)
  • Added an option to return only positive-scoring seqlets; it is on by default.
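As a rough illustration of fixed-window selection with overlap exclusion and a positive-only filter, here is a small sketch. It is not the modisco implementation: `greedy_fixed_windows` and `only_positive` are hypothetical names, the exclusion margin is simply "any window start that would overlap the chosen window", and the exact core_window_size-0.5 margin described above is not reproduced.

```python
import numpy as np


def greedy_fixed_windows(scores, window_size, only_positive=True):
    """Greedily pick non-overlapping fixed-size windows by summed score.

    Returns a list of (start, end, window_sum) with half-open coordinates,
    in the order they were picked (highest-scoring first).
    """
    scores = np.asarray(scores, dtype=float)
    # Sum of scores in every window of length window_size.
    window_sums = np.convolve(scores, np.ones(window_size), mode="valid")
    available = np.ones(len(window_sums), dtype=bool)
    if only_positive:
        # Drop windows whose total score is not positive.
        available &= window_sums > 0

    picked = []
    while available.any():
        masked = np.where(available, window_sums, -np.inf)
        start = int(np.argmax(masked))
        picked.append((start, start + window_size, float(window_sums[start])))
        # Exclude every start whose window would overlap the chosen one.
        lo = max(0, start - window_size + 1)
        hi = min(len(window_sums), start + window_size)
        available[lo:hi] = False
    return picked


track = [0, 0, 5, 5, 0, 0, -3, -3, 0, 0, 1, 1]
positive_hits = greedy_fixed_windows(track, window_size=2)
all_hits = greedy_fixed_windows(track, window_size=2, only_positive=False)
```

With the filter on, windows over the negative stretch of `track` are never returned; with it off, they are picked once the higher-scoring windows have been taken.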

Also, with the Leiden bugfix, I'm back to using movenodes in the refine partition

Also, the hits now return trim_start and trim_end, which are the start/end of the trimmed pattern (the user can specify the IC threshold used for trimming; default 0.3)
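For readers unfamiliar with IC-based trimming, the sketch below shows one common way to derive trim bounds from a position probability matrix: compute per-position information content against a background and keep the span between the first and last positions at or above the threshold. This is an assumption about the general technique, not a copy of the modisco code; `trim_bounds_by_ic`, the uniform background, the pseudocount, and the half-open `trim_end` convention are all illustrative.

```python
import numpy as np


def trim_bounds_by_ic(ppm, ic_threshold=0.3, background=None, pseudocount=1e-3):
    """Return (trim_start, trim_end) for a PPM of shape (length, alphabet).

    trim_end is exclusive here (an assumption). Returns None if no position
    reaches the threshold.
    """
    ppm = np.asarray(ppm, dtype=float)
    if background is None:
        # Uniform background over the alphabet (assumption).
        background = np.full(ppm.shape[1], 1.0 / ppm.shape[1])
    # The pseudocount avoids log(0) for positions with zero probabilities.
    p = ppm + pseudocount
    p = p / p.sum(axis=1, keepdims=True)
    # Per-position information content in bits, relative to the background.
    ic = np.sum(p * np.log2(p / background), axis=1)
    passing = np.flatnonzero(ic >= ic_threshold)
    if passing.size == 0:
        return None
    return int(passing[0]), int(passing[-1]) + 1


uniform = [0.25, 0.25, 0.25, 0.25]
toy_ppm = [
    uniform,
    uniform,
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    uniform,
]
bounds = trim_bounds_by_ic(toy_ppm, ic_threshold=0.3)
```

Uninformative flank positions have IC near zero and fall outside the returned bounds, so raising the threshold gives a tighter trimmed pattern.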