
v0.2.2

@zktuong released this 27 Jun 11:57
· 357 commits to master since this release
801f5bd

What's Changed

Bug fixes and Improvements

  • Speed up generate_network
    • Pairwise Hamming distances are now calculated per clone/clonotype, and only when more than one cell is assigned to that clone/clonotype.
    • The .distance slot is removed; the information is now stored in, and converted directly from, the .graph slot.
    • new options:
      • compute_layout: bool = True. If the dataset is too large, compute_layout can be set to False, in which case only the networkx graph is returned. The data can still be visualised later with scirpy's plotting method (see below).
      • layout_method: Literal['sfdp', 'mod_fr'] = 'sfdp'. The new default uses the ultra-fast, C++-implemented sfdp_layout algorithm in graph-tool to generate the final layout. sfdp stands for Scalable Force Directed Placement.
        • A minor caveat is that the repulsion is not as strong: when there are many singleton nodes, they do not separate well unless you work out which sfdp_layout parameters to tweak for effective separation; changing gamma alone does not seem to do much.
        • The original layout can still be generated by specifying layout_method = 'mod_fr'. This requires a separate installation of graph-tool via conda (it is not managed by pip because it has several C++ dependencies).
        • pytest on macOS may also stall because a different backend is called; this is solved by running the tests that call generate_network last.
    • Added steps to reduce memory usage.
    • min_size previously did the opposite of what was intended; this is now fixed. #155
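The per-clone distance step above can be sketched as follows. This is a minimal illustration only, not dandelion's actual implementation; the sequences, cell barcodes, and clone assignments are made up:

```python
from collections import defaultdict
from itertools import combinations

def hamming(a: str, b: str) -> int:
    """Hamming distance between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))

def per_clone_distances(sequences: dict, clone_of: dict) -> dict:
    """Compute pairwise Hamming distances only within clones that
    contain more than one cell, skipping singleton clones entirely."""
    clones = defaultdict(list)
    for cell, clone in clone_of.items():
        clones[clone].append(cell)
    distances = {}
    for clone, cells in clones.items():
        if len(cells) < 2:  # singleton clone: nothing to compare
            continue
        for a, b in combinations(cells, 2):
            distances[(a, b)] = hamming(sequences[a], sequences[b])
    return distances

# cell1 and cell2 share clone "c1"; cell3 is a singleton and is skipped.
seqs = {"cell1": "ATGC", "cell2": "ATGG", "cell3": "TTTT"}
clones = {"cell1": "c1", "cell2": "c1", "cell3": "c2"}
print(per_clone_distances(seqs, clones))  # {('cell1', 'cell2'): 1}
```

Skipping singleton clones is what gives the speed-up: for typical repertoires, most clonotypes contain a single cell, so most of the all-pairs work disappears.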
  • Speed up transfer
    • Found a faster way to create the connectivity matrix.
    • transfer now also passes a dictionary that scirpy can use to generate plots. scverse/scirpy#286
    • Fix #153
      • Renamed productive to productive_status.
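One vectorized way to build a cell-by-cell connectivity matrix (connecting cells that share a clone id) is sketched below. This is a generic NumPy illustration of the idea, not dandelion's actual code, and the clone ids are hypothetical:

```python
import numpy as np

def clone_connectivities(clone_ids):
    """Cell-by-cell connectivity matrix: 1 where two cells share a
    clone id, 0 elsewhere. A single broadcasted comparison replaces
    an explicit pairwise double loop."""
    ids = np.asarray(clone_ids)
    conn = (ids[:, None] == ids[None, :]).astype(np.int8)
    np.fill_diagonal(conn, 0)  # no self-connections
    return conn

# Cells 0, 1 and 3 share clone "c1"; cell 2 ("c2") stays unconnected.
conn = clone_connectivities(["c1", "c1", "c2", "c1"])
print(conn)
```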
  • Fix #154
    • Reordered the if-else statements.
  • Speed up filter_contigs
    • Tree construction is simplified: for-loops with repeated membership checks are replaced with dictionary updates.
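The dictionary-update pattern mentioned above can be illustrated like this. The contig records and field names here are made up for the sketch and are not the actual filter_contigs internals:

```python
from collections import defaultdict

contigs = [
    {"cell": "AAAC", "contig_id": "AAAC_contig_1", "locus": "IGH"},
    {"cell": "AAAC", "contig_id": "AAAC_contig_2", "locus": "IGK"},
    {"cell": "GGGT", "contig_id": "GGGT_contig_1", "locus": "IGH"},
]

# Build a cell -> {contig_id: record} tree with defaultdict updates,
# avoiding "if cell not in tree" checks inside nested for-loops.
tree = defaultdict(dict)
for record in contigs:
    tree[record["cell"]].update({record["contig_id"]: record})

print(sorted(tree["AAAC"]))  # ['AAAC_contig_1', 'AAAC_contig_2']
```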
  • Speed up initialise_metadata. Dandelion should now initialise and read faster.
    • Removed an unnecessary data sanitization step when loading data.
    • load_data now renames umi_count to duplicate_count.
    • Speed up Query
      • Tree construction is simplified: for-loops are replaced with dictionary updates.
      • Dropped the AIRR validator here, as it slowed things down.
  • Data initialised by Dandelion is now ordered by productive status first, then by umi count (largest to smallest).
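The ordering described above can be sketched with pandas. The productive and umi_count column names come from these notes; the DataFrame contents are invented for illustration:

```python
import pandas as pd

data = pd.DataFrame(
    {
        "contig_id": ["c1", "c2", "c3", "c4"],
        "productive": ["F", "T", "T", "T"],
        "umi_count": [50, 10, 30, 20],
    }
)

# Productive contigs first ("T" sorts above "F" in descending order),
# then by umi count, largest to smallest.
ordered = data.sort_values(
    by=["productive", "umi_count"], ascending=[False, False]
).reset_index(drop=True)

print(ordered["contig_id"].tolist())  # ['c3', 'c4', 'c2', 'c1']
```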

Breaking Changes

  • initialise_metadata/update_metadata/Dandelion
    • For-loops to initialise the object have been vectorized, resulting in a minor speed upgrade.
    • This reduces the number of columns in .metadata, some of which were bloated and unused.
      • vdj_status and vdj_status_summary removed and replaced with rearrangement_VDJ_status and rearrangement_VJ_status.
      • constant_status and constant_summary removed and replaced with constant_VDJ_status and constant_VJ_status.
      • productive and productive_summary combined and replaced with productive_status.
      • locus_status and locus_status_summary combined and replaced with locus_status.
      • isotype_summary replaced with isotype_status.
  • Values in .metadata that were previously unassigned or '' are now the string 'None'.
    • They are not changed to NoneType because quite a bit of internal text processing breaks if they are.
    • No_contig will still be populated after transfer to AnnData to reflect cells with no TCR/BCR info.
  • Deprecated use of nxviz<0.7.4.
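Downstream code that referenced the old .metadata columns can be migrated with a small rename mapping. A hedged sketch based only on the list above: the DataFrame is invented, and only renames that are simple one-to-one (productive, per Fix #153, and isotype_summary) are included, since the "combined" columns are not plain renames:

```python
import pandas as pd

# Old-column -> new-column mapping, per the breaking changes above.
RENAMES = {
    "productive": "productive_status",
    "isotype_summary": "isotype_status",
}

metadata = pd.DataFrame(
    {"productive": ["T", "F"], "isotype_summary": ["IgM", "IgG"]},
    index=["cell1", "cell2"],
)
metadata = metadata.rename(columns=RENAMES)

# Previously-empty values are now the *string* "None", not NoneType.
metadata = metadata.fillna("None").replace({"": "None", "unassigned": "None"})

print(list(metadata.columns))  # ['productive_status', 'isotype_status']
```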

Minor changes

  • Renamed and deprecated read_h5/write_h5; use of read_h5ddl/write_h5ddl will be enforced in the next update.

Full Changelog: v0.2.1...v0.2.2