Releases: kundajelab/tfmodisco

Actually fix corresponding to 0.5.16.4

28 Jan 22:03
Pre-release

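The fix in https://github.com/kundajelab/tfmodisco/releases/tag/v0.5.16.4 was reported not to work. It should have used min(perplexity, matrix.shape[0]-1) rather than min(perplexity, matrix.shape[0]), because the perplexity must be strictly less than the number of samples. This release applies the corrected fix, in light of the message on #112.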
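A minimal sketch of the corrected check, for illustration only. The helper name `clamp_perplexity` and the default value of 30 are assumptions, not tfmodisco's actual code; in the library, the sample count would come from matrix.shape[0].

```python
def clamp_perplexity(default_perplexity, n_samples):
    """Return a perplexity that is strictly less than n_samples.

    Hypothetical helper: the name and signature are illustrative only.
    """
    if n_samples < 2:
        # With fewer than 2 samples there is no valid positive perplexity.
        raise ValueError("need at least 2 samples to choose a valid perplexity")
    # The earlier fix used min(default_perplexity, n_samples), which still
    # allows perplexity == n_samples; subtracting 1 keeps it strictly below.
    return min(default_perplexity, n_samples - 1)


# A motif with only 10 seqlets gets a reduced perplexity.
small_motif_perplexity = clamp_perplexity(30, 10)
# A motif with plenty of seqlets keeps the default.
large_motif_perplexity = clamp_perplexity(30, 100)
```

The boundary case is the one the first fix missed: with exactly 30 seqlets and a default of 30, the clamp has to return 29.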

Fixing "perplexity must be less than n_samples"

27 Jan 01:45

Reported in issue #112

The bug was caused by a feature added in a later release (subclustering within motifs and visualization of the subclusters using t-SNE). The fix is a check that reduces the perplexity relative to the default if the number of seqlets in the motif is less than the default perplexity.

Fix tsne sparse input matrix error

18 May 22:19
01a92d0
Pre-release

Corresponds to PR #108 by @akmorrow13

Bringing down Leiden memory use - patch 1

27 Jan 05:43
c1cbf7c

Corresponds to PR #99

  • Removed some tolist() commands that might have been contributing to memory explosion
  • More detailed printouts of memory usage
  • Made it possible to specify a different number of parallel runs for the main clustering step via the n_cores_mainclustering argument to TfModiscoSeqletsToPatternsFactory

Lower mem for agkm embeddings, pynnd option for coarse affmat

29 Nov 04:30
b136c20
  • Added a pynnd=True option to use PyNNDescent for the coarse-grained affinity matrix computation (caveat: runs into a weird pickling error on Colab: lmcinnes/pynndescent#133)
  • Noticed that storing the agkm embeddings as [(agkm_string_representation, value), ...] seemed to take up a lot of space (possibly because representing the agkms as strings is space-consuming), so they now get converted to [(agkm_idx, value), ...] before being stored. This seems to bring down the memory consumption.
  • Other minor changes pertaining to reporting some internal hit-scoring-related metrics (exclude_self excludes the self when benchmarking how well the fann_perclass (finegrained-affinity nearest-neighbors) method works for recovering the true class for motif hits, since the fine-grained affinity to the self is always 1; also added benchmarking of how well simply using aggregate similarity works)
  • Also did some reorganization of example notebooks that I mainly use to test out stuff - put some of the more experimental notebooks under "examples/simulated_TAL_GATA_deeplearning/other"
  • Updating Leiden version to avoid the segfault bug (vtraag/leidenalg#68)
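The string-to-index conversion of the agkm embeddings described above can be sketched as follows. This is a sketch under stated assumptions, not the library's implementation: the function name `index_embeddings`, the vocabulary dictionary, and the example gapped k-mer strings are all hypothetical.

```python
def index_embeddings(string_embeddings):
    """Replace gapped k-mer strings with integer indices.

    string_embeddings: list of sparse embeddings, each a list of
    (agkm_string, value) pairs.
    Returns (indexed_embeddings, vocab), where vocab maps each distinct
    string to the integer index that replaced it.
    """
    vocab = {}
    indexed = []
    for embedding in string_embeddings:
        row = []
        for agkm, value in embedding:
            # Assign the next free index the first time a string is seen.
            idx = vocab.setdefault(agkm, len(vocab))
            row.append((idx, value))
        indexed.append(row)
    return indexed, vocab


# Hypothetical embeddings for two seqlets ("_" marks a gap position).
embeddings = [
    [("A_C", 0.5), ("AG_", 1.0)],
    [("A_C", 0.25)],
]
indexed, vocab = index_embeddings(embeddings)
```

Each distinct string is stored once in the vocabulary, and every embedding entry then holds a small integer instead of its own string.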

Added CircleCI continuous integration

29 Nov 07:05
cb2ec8e
Pre-release

Corresponds to PR #98

  • Added circleci continuous integration
  • Removed the .travis.yml
  • Bumped the version from 0.5.16.0->0.5.16.1
  • Added a badge to the github readme
  • No tfmodisco code changes

Improvements to hit scoring

20 Aug 21:22
7de50c1
Pre-release

Corresponds to PR #94

Emphasis is now given to the core seqlet region when figuring out which motif the seqlet aligns to, such that the presence of alternative motifs in the flanks can't change the motif assignment. Also revamped how the fine-grained affinities are calculated: there is no coarse-grained calculation step; I just first align the core seqlet to the aggregate pattern, and then use that alignment to compute the fine-grained similarities to the constituent seqlets in the pattern. It seems substantially faster.

Also improved the seqlet identification method:

  • I switched to the FixedWindow seqlet identification method, which is the same one used during the main modisco run, because the VariableWindow method was resulting in a lot of windows that were "tied" at an FDR of 0, even though some windows were much higher-scoring than others.
  • I made a small tweak to improve the overlap exclusion (need to exclude core_window_size-0.5 on either side...the reason is just a really involved detail to do with indexing math)
  • Put in a feature to allow for only returning positive-scoring seqlets; it is on by default.

Also, with the Leiden bugfix, I'm back to using movenodes in the refine partition

Also, the hits now return trim_start and trim_end, which are the start/end of the trimmed pattern (the user can specify the IC; default 0.3)
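As a rough illustration of what trimming a pattern by information content (IC) can look like, here is a sketch. It is not tfmodisco's code: the function names, the uniform background, the pseudocount, and the convention that trim_end is exclusive are all assumptions made for the example. Only the default threshold of 0.3 comes from the note above.

```python
import math


def per_position_ic(ppm, background=(0.25, 0.25, 0.25, 0.25), pseudocount=0.001):
    """Information content (bits) of each position of a probability matrix."""
    ics = []
    for row in ppm:
        total = sum(row) + pseudocount * len(row)
        ic = 0.0
        for p, bg in zip(row, background):
            q = (p + pseudocount) / total  # smoothed probability
            ic += q * math.log2(q / bg)
        ics.append(ic)
    return ics


def trim_by_ic(ppm, ic_threshold=0.3):
    """Return (trim_start, trim_end) spanning the first to last position
    whose IC meets the threshold; trim_end is exclusive (an assumption).
    Returns None if no position passes."""
    ics = per_position_ic(ppm)
    passing = [i for i, ic in enumerate(ics) if ic >= ic_threshold]
    if not passing:
        return None
    return passing[0], passing[-1] + 1


# Hypothetical 4-position pattern: uninformative flanks around a 2-base core.
pattern = [
    [0.25, 0.25, 0.25, 0.25],
    [0.97, 0.01, 0.01, 0.01],
    [0.01, 0.97, 0.01, 0.01],
    [0.25, 0.25, 0.25, 0.25],
]
trimmed = trim_by_ic(pattern)
```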

Added final flank expansion functionality back in

12 Jul 13:15
77e89d0

Corresponds to PR #93. Version 0.5.14.0 accidentally removed the final motif flank expansion that was controlled by the parameter "final_flank_to_add", such that the flank expansion was effectively 0. This only affected the flank expansion done at the very end of the tfmodisco pipeline; there is still flank expansion controlled by the parameter "initial_flank_to_add". I added the functionality for final flank expansion back in, and for backwards compatibility with version 0.5.14.0 I have set the default value of final_flank_to_add to 0 (previously, it was 10). A default of 0 is probably better from a user perspective anyway, because users sometimes run tf-modisco on very short sequences, and a large final_flank_to_add can cause many seqlets to get discarded when the expansion extends beyond the end of the sequence. I also cleaned up some of the notebooks.
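To see why a large final flank can discard seqlets on short sequences, consider this simplified sketch. The function `expand_seqlets`, the (start, end) tuples with an exclusive end, and the example coordinates are illustrative assumptions and not the pipeline's actual data structures.

```python
def expand_seqlets(seqlets, flank, seq_len):
    """Expand each (start, end) seqlet by `flank` on both sides.

    Seqlets whose expanded coordinates fall outside [0, seq_len] are
    dropped, mimicking the discarding behaviour described above.
    """
    kept = []
    for start, end in seqlets:
        new_start, new_end = start - flank, end + flank
        if new_start >= 0 and new_end <= seq_len:
            kept.append((new_start, new_end))
    return kept


# Three hypothetical seqlets on a 50 bp sequence.
seqlets = [(2, 12), (20, 30), (45, 50)]
no_flank = expand_seqlets(seqlets, flank=0, seq_len=50)
with_flank = expand_seqlets(seqlets, flank=10, seq_len=50)
```

With a flank of 0 all three seqlets survive; with a flank of 10 only the central one does, since the other two would extend past the sequence boundaries.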

Nicer API for density-adaptive hit scoring

29 Apr 17:46
d3985db

New spurious merging detection, on-the-fly flank filling, exploring different merging criteria, density-adapted hit-scoring core code

15 Apr 21:12
e4028e1