
Releases: zktuong/dandelion

v0.3.5

01 Feb 04:54

What's Changed

  • fix container script bug by @zktuong in #350
  • reorder content table on docs by @zktuong in #352
  • fix entry of anndata with NaN 'sequence_id' values by @amoschoomy in #351
  • add dependabot dependency review for PR by @zktuong in #353
  • add else statement to check contigs when there's no sequence by @MeganS92 in #354
  • pip prod(deps): update pandas requirement from <=2.1.4,>=1.0.3 to >=1.0.3,<=2.2.0 by @dependabot in #355
  • pip dev(deps-dev): update sphinx-autodoc-typehints requirement from <=1.25.2 to <=1.25.3 by @dependabot in #356
  • pip dev(deps-dev): update scirpy requirement from <=0.14 to <=0.15.0 by @dependabot in #357
  • convert to use umi_count by @zktuong in #358

Full Changelog: v0.3.4...v0.3.5

v0.3.4

10 Jan 00:17
97dc56a

Summary

  • Speed up network generation in generate_network
  • Add soft filtering and normalisation to the vdj_pseudobulk functions - @ktpolanski
  • Created a new column in .data (extra) to flag whether a contig is considered extra.
  • New clone ID definition that inserts the VDJ and VJ gene calls into the ID to reduce ambiguity - need to check whether it handles cells with no clone IDs properly. This also means that clone IDs can now be created for orphan chains.
  • New to_scirpy/from_scirpy functions that convert to the new scverse AIRR formats - @amoschoomy
  • Container build is now simplified and uses mamba to manage all the dependencies.
  • New option to run preprocessing with ogrdb references in both the base package and the container.
  • New reference download function in the container folder to ensure the latest references are pulled for every new iteration of the container.
  • Deprecate support for Python 3.7 tests.
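The new clone ID scheme above isn't spelled out here; as a rough illustration of the idea only (the helper name and ID format are hypothetical, not dandelion's actual implementation), embedding both the VDJ and VJ gene calls in the ID could look like:

```python
def make_clone_id(prefix, vdj_genes, vj_genes, junction_group):
    """Hypothetical sketch: insert both VDJ and VJ gene calls into the clone ID.

    Including both chains disambiguates clones that share only one chain,
    and lets orphan chains (missing one side) still receive an ID.
    """
    vdj = "|".join(vdj_genes) if vdj_genes else "None"
    vj = "|".join(vj_genes) if vj_genes else "None"
    return f"{prefix}_{vdj}_{vj}_{junction_group}"

# A fully paired B cell clone, and an orphan VJ-only clone.
paired = make_clone_id("B", ["IGHV1-2", "IGHJ4"], ["IGKV1-5", "IGKJ1"], 1)
orphan = make_clone_id("B", [], ["IGKV1-5", "IGKJ1"], 2)
```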

What's Changed

dependabot updates

  • pip prod(deps): update pandas requirement from <=2.1.0,>=1.0.3 to >=1.0.3,<=2.1.1 by @dependabot in #314
  • pip dev(deps-dev): update readthedocs-sphinx-ext requirement from <=2.2.2 to <=2.2.3 by @dependabot in #318
  • pip prod(deps): update pandas requirement from <=2.1.1,>=1.0.3 to >=1.0.3,<=2.1.2 by @dependabot in #324
  • pip dev(deps-dev): update sphinx-autodoc-typehints requirement from <=1.24.0 to <=1.24.1 by @dependabot in #331
  • pip prod(deps): update pandas requirement from <=2.1.2,>=1.0.3 to >=1.0.3,<=2.1.3 by @dependabot in #332
  • pip dev(deps-dev): update sphinx-autodoc-typehints requirement from <=1.24.1 to <=1.25.2 by @dependabot in #333
  • pip dev(deps-dev): update sphinx-rtd-theme requirement from <=1.2.2 to <=2.0.0 by @dependabot in #338
  • pip prod(deps): update pandas requirement from <=2.1.3,>=1.0.3 to >=1.0.3,<=2.1.4 by @dependabot in #339
  • pip dev(deps-dev): update readthedocs-sphinx-ext requirement from <=2.2.3 to <=2.2.4 by @dependabot in #340
  • pip dev(deps-dev): update readthedocs-sphinx-ext requirement from <=2.2.4 to <=2.2.5 by @dependabot in #345
  • Bump tj-actions/changed-files from 35 to 41 in /.github/workflows by @dependabot in #347

Full Changelog: v0.3.3...v0.3.4

v0.3.3

12 Sep 10:37
b0a094c

What's Changed

  • Mainly updates and bug fixes to tl.clone_overlap and pl.clone_overlap.
  • Simplified pre-processing functions to call command-line tools instead of running them within the code.

Detailed notes:

  • Update docs for clone overlap by @zktuong in #276
  • Allow additional arguments in define_clones by @zktuong in #280
  • add an if statement to check if actor is dependabot by @zktuong in #289
  • pip dev(deps-dev): update sphinx-autodoc-typehints requirement from <=1.23.0 to <=1.23.3 by @dependabot in #284
  • pip dev(deps-dev): update sphinx-rtd-theme requirement from <=1.2.0 to <=1.2.2 by @dependabot in #285
  • pip dev(deps-dev): update readthedocs-sphinx-ext requirement from <=2.2.0 to <=2.2.2 by @dependabot in #286
  • pip dev(deps-dev): update nbsphinx requirement from <=0.9.1 to <=0.9.2 by @dependabot in #287
  • enable auto-merge for dependabot by @zktuong in #290
  • refactoring how external scripts and locations are called by @zktuong in #288
  • fix reassign_alleles by @zktuong in #293
  • remove deprecated function from docs by @zktuong in #297
  • pip dev(deps-dev): update sphinx-autodoc-typehints requirement from <=1.23.3 to <=1.24.0 by @dependabot in #296
  • fix weekly tests by @zktuong in #301
  • pip prod(deps): update mizani requirement from <0.10.0 to <0.11.0 by @dependabot in #302
  • add options to plotting clone overlap by @zktuong in #307
  • add requirements.txt by @zktuong in #309
  • should be cartesian product instead of combination by @zktuong in #312

Full Changelog: v0.3.2...v0.3.3

v0.3.2

29 May 00:10
19eaa63

What's Changed

Mainly to fix compatibility with dependencies.

Full Changelog: v0.3.1...v0.3.2

v0.3.1

06 Feb 11:29
878c1a0

What's Changed

Just to update PyPI - some bug fixes to accompany the revision.
Doesn't affect the container image (but I should add a tag on Sylabs to also call it 0.3.1, just to be consistent).

Full Changelog: v0.3.0...v0.3.1

v0.3.0

09 Nov 14:07

What's Changed

This release adds a number of new features and minor restructuring to accompany Dandelion's manuscript (uploading soon). Kudos to @suochenqu and @ktpolanski

  1. data strategy to handle non-productive contigs, partial contigs and 'J multi-mappers'
  2. new V(D)J pseudotime trajectory inference!
  3. revamped tutorials and documents

Full Changelog: v0.2.4...v0.3.0

v0.2.4

07 Jul 15:32
69ae2e2

What's Changed

New features

slicing functionality

  • the Dandelion object can now be sliced like an AnnData or pandas DataFrame!
    vdj[vdj.data['productive'] == 'T']
    Dandelion class object with n_obs = 38 and n_contigs = 94
        data: 'sequence_id', 'sequence', 'rev_comp', 'productive', 'v_call', 'd_call', 'j_call', 'sequence_alignment', 'germline_alignment', 'junction', 'junction_aa', 'v_cigar', 'd_cigar', 'j_cigar', 'stop_codon', 'vj_in_frame', 'locus', 'junction_length', 'np1_length', 'np2_length', 'cell_id', 'c_call', 'consensus_count', 'duplicate_count', 'rearrangement_status'
        metadata: 'locus_VDJ', 'locus_VJ', 'productive_VDJ', 'productive_VJ', 'v_call_VDJ', 'd_call_VDJ', 'j_call_VDJ', 'v_call_VJ', 'j_call_VJ', 'c_call_VDJ', 'c_call_VJ', 'junction_VDJ', 'junction_VJ', 'junction_aa_VDJ', 'junction_aa_VJ', 'v_call_B_VDJ', 'd_call_B_VDJ', 'j_call_B_VDJ', 'v_call_B_VJ', 'j_call_B_VJ', 'productive_B_VDJ', 'productive_B_VJ', 'v_call_abT_VDJ', 'd_call_abT_VDJ', 'j_call_abT_VDJ', 'v_call_abT_VJ', 'j_call_abT_VJ', 'productive_abT_VDJ', 'productive_abT_VJ', 'v_call_gdT_VDJ', 'd_call_gdT_VDJ', 'j_call_gdT_VDJ', 'v_call_gdT_VJ', 'j_call_gdT_VJ', 'productive_gdT_VDJ', 'productive_gdT_VJ', 'duplicate_count_B_VDJ', 'duplicate_count_B_VJ', 'duplicate_count_abT_VDJ', 'duplicate_count_abT_VJ', 'duplicate_count_gdT_VDJ', 'duplicate_count_gdT_VJ', 'isotype', 'isotype_status', 'locus_status', 'chain_status', 'rearrangement_status_VDJ', 'rearrangement_status_VJ'
    vdj[vdj.metadata['productive_VDJ'] == 'T']
    Dandelion class object with n_obs = 17 and n_contigs = 36
        data: (same columns as in the first example above)
        metadata: (same columns as in the first example above)
    vdj[vdj.metadata_names.isin(['cell1', 'cell2', 'cell3', 'cell4', 'cell5'])]
    Dandelion class object with n_obs = 5 and n_contigs = 20
    data: (same columns as in the first example above)
    metadata: (same columns as in the first example above)
    vdj[vdj.data_names.isin(['contig1','contig2','contig3','contig4','contig5'])]
    Dandelion class object with n_obs = 2 and n_contigs = 5
    data: (same columns as in the first example above)
    metadata: (same columns as in the first example above)
    • not sure implementing it like adata[:, adata.var.something] makes sense, as the columns in the .data slot aren't really per-observation information
    • also, the base slot in Dandelion is .data, so it doesn't make sense for .metadata to act as the 'row' axis
    • maybe scverse/scirpy#327 will come up with a better strategy that can be adopted later on
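Conceptually, each slice above filters the contig-level .data and then keeps only the cells that still have contigs. A minimal pandas sketch of that semantic (toy columns, not dandelion's actual implementation):

```python
import pandas as pd

# Toy contig table standing in for the .data slot (columns are illustrative).
data = pd.DataFrame(
    {
        "sequence_id": ["c1", "c2", "c3", "c4"],
        "cell_id": ["cellA", "cellA", "cellB", "cellC"],
        "productive": ["T", "F", "T", "F"],
    }
).set_index("sequence_id")

# Boolean slicing on .data keeps the matching contigs; the surviving
# cell-level .metadata is then derived from the remaining cell_ids.
sliced = data[data["productive"] == "T"]
surviving_cells = sliced["cell_id"].unique().tolist()
```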

ddl.pp.check_contigs

  • created a new function ddl.pp.check_contigs as a way to just check whether contigs are ambiguous, rather than outright removing them. I envisage that this will eventually replace simple mode in ddl.pp.filter_contigs.
    • new column in .data: ambiguous, T/F to indicate whether contig is considered ambiguous or not (different from cell level ambiguous).
    • the .metadata slot and several other functions ignore any contigs marked as T, to maintain the same behaviour
    • The largest difference between ddl.pp.check_contigs and ddl.pp.filter_contigs is that the onus is on the user to remove any 'bad' cells from the GEX data (illustrated in the tutorial) with check_contigs whereas this happens semi-automatically with filter_contigs.
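As a rough pandas sketch of the flag-don't-drop idea (toy data and simplified logic, not ddl.pp.check_contigs itself), one could mark all but the top-UMI contig per cell and locus as ambiguous while leaving every row in place:

```python
import pandas as pd

contigs = pd.DataFrame(
    {
        "cell_id": ["cellA", "cellA", "cellB"],
        "locus": ["IGH", "IGH", "IGH"],
        "umi_count": [10, 3, 7],
    }
)

# Flag rather than remove: any contig that is not the top-UMI contig
# for its (cell, locus) group gets ambiguous = "T".
top = contigs.groupby(["cell_id", "locus"])["umi_count"].transform("max")
contigs["ambiguous"] = (contigs["umi_count"] < top).map({True: "T", False: "F"})
```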

ddl.update_metadata now comes with a 'by_celltype' option

  • This brings a new feature - B cell, alpha-beta T cell and gamma-delta T cell associated columns for V,D,J,C and productive columns!
    • this is achieved through a new .retrieve_celltype subfunction in the Query class, which breaks up the retrieval into the 3 major groups if by_celltype = True.
    • No more need to guess which chain belongs to which cell type, and it allows for easy slicing! This does cause a bit of .obs bloating, though.
    • This also leads to the removal of constant_status_VDJ, constant_status_VJ, productive_status_VDJ and productive_status_VJ, as the metadata was getting bloated with the rework of the Dandelion metadata slot to account for the new B/abT/gdT columns
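A minimal sketch of how per-celltype columns could be derived from the locus (the mapping and pivot are illustrative, not the Query class internals):

```python
import pandas as pd

# Hypothetical locus -> major celltype grouping.
LOCUS_GROUP = {"IGH": "B", "IGK": "B", "IGL": "B",
               "TRA": "abT", "TRB": "abT",
               "TRG": "gdT", "TRD": "gdT"}

contigs = pd.DataFrame(
    {"cell_id": ["c1", "c1", "c2"],
     "locus": ["IGH", "IGK", "TRB"],
     "v_call": ["IGHV1-2", "IGKV1-5", "TRBV20-1"]}
)
contigs["group"] = contigs["locus"].map(LOCUS_GROUP)

# Pivot to per-celltype columns such as v_call_B / v_call_abT.
wide = contigs.pivot_table(index="cell_id", columns="group",
                           values="v_call", aggfunc="|".join)
wide.columns = [f"v_call_{g}" for g in wide.columns]
```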

tl.productive_ratio

  • Calculates a cell-level representation of productive vs non-productive contigs.
    • Plotting is achieved through pl.productive_ratio
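The calculation amounts to a per-group fraction of cells with productive contigs; a simplified pandas version of that idea (toy data, not the actual tl.productive_ratio implementation):

```python
import pandas as pd

cells = pd.DataFrame(
    {"group": ["naive", "naive", "memory", "memory"],
     "productive_VDJ": ["T", "F", "T", "T"]}
)

# Fraction of cells per group whose VDJ chain is productive.
ratio = (cells.assign(is_prod=cells["productive_VDJ"].eq("T"))
              .groupby("group")["is_prod"].mean())
```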

tl.vj_usage_pca

  • Computes PCA on a cell-level representation of V/J gene usage across designated groupings
    • uses scanpy.pp.pca internally
    • Plotting can be achieved through scanpy.pl.pca
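In spirit: build a group-by-V/J-gene usage matrix, then run PCA on it. A numpy sketch via SVD on the mean-centred matrix (the function itself calls scanpy.pp.pca; the data and names here are illustrative):

```python
import numpy as np

# Toy usage matrix: rows = cell groupings, columns = V/J gene frequencies.
usage = np.array([[0.6, 0.3, 0.1],
                  [0.5, 0.4, 0.1],
                  [0.1, 0.2, 0.7]])

# PCA via SVD on the mean-centred matrix (what PCA does under the hood,
# modulo scaling and solver choices).
centred = usage - usage.mean(axis=0)
u, s, vt = np.linalg.svd(centred, full_matrices=False)
pcs = u * s  # principal-component coordinates for each group
```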

bug fixes

  • fix cell ordering issue scverse/scirpy#347
  • small refactor of ddl.pp.filter_contigs
    • moved some of the repetitive loops into callable functions
    • deprecate the filter_vj_chains argument, replaced with filter_extra_vdj_chains and filter_extra_vj_chains to hopefully enable more interpretable behaviour. Fixes #158
    • the UMI adjustment step was buggy, but the behaviour is now consistent with how it functions in ddl.pp.check_contigs
  • rearrangement_status_VDJ and rearrangement_status_VJ (renamed from rearrangement_VDJ_status and rearrangement_VJ_status) now give a single value indicating whether a chimeric rearrangement occurred, e.g. TRDV pairing with TRAJ and TRAC, as in this paper: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4267242/
  • fixed issues with progress bars getting out of hand
  • fixed issue with ddl.tl.find_clones crashing if more than one type of locus is found in the data.
    • now a B, abT and gdT prefix will be appended to BCR/TR-ab/TR-gd clones.
  • check_contigs, find_clones and define_clones were removing non-productive contigs even though there's no need to. This may cause issues with filter_contigs... but that's a problem for next time.
  • fix issue with min_size in the network not behaving as intended; switched to using connected components to find which nodes to trim
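The min_size fix uses connected components to decide which nodes to drop; a minimal pure-Python sketch of that trimming (hypothetical helper, not the generate_network code):

```python
def trim_small_components(edges, nodes, min_size):
    """Keep only nodes belonging to connected components with >= min_size members."""
    # Build an adjacency map, then BFS each unvisited node to find its component.
    adj = {n: set() for n in nodes}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    kept, seen = set(), set()
    for start in nodes:
        if start in seen:
            continue
        comp, queue = {start}, [start]
        seen.add(start)
        while queue:
            for nb in adj[queue.pop()]:
                if nb not in seen:
                    seen.add(nb)
                    comp.add(nb)
                    queue.append(nb)
        if len(comp) >= min_size:
            kept |= comp
    return kept

# "d" is a singleton component, so it is trimmed when min_size = 2.
kept = trim_small_components([("a", "b"), ("b", "c")], ["a", "b", "c", "d"], 2)
```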

other changes

  • new column chain_status, to summarise the reworked locus_status column.
    • Should contain values like ambiguous, Orphan VDJ, Single pair etc, similar to chain_pairing in scirpy.
  • Also fixed the ordering of metadata to make it more presentable, instead of just randomly slotting into the...

v0.2.3

27 Jun 13:52
801f5bd

Same as v0.2.2, but I seem to have messed up the upload to PyPI, so trying again.

What's Changed

Bug fixes and Improvements

  • Speed up generate_network
    • pair-wise hamming distance is now calculated per clone/clonotype, and only if more than one cell is assigned to the clone/clonotype
    • .distance slot is removed and is now directly stored/converted from the .graph slot.
    • new options:
      • compute_layout: bool = True. If the dataset is too large, compute_layout can be switched to False, in which case only the networkx graph is returned. The data can still be visualised later with scirpy's plotting method (see below).
      • layout_method: Literal['sfdp', 'mod_fr'] = 'sfdp'. New default uses the ultra-fast C++ implemented sfdp_layout algorithm in graph-tools to generate final layout. sfdp stands for Scalable Force Directed Placement.
        • Minor caveat is that the repulsion is not as good - when there are a lot of singleton nodes, they don't separate well unless you somehow work out which parameters in sfdp_layout to tweak to produce effective separation. Changing gamma alone doesn't really seem to do much.
        • The original layout can still be generated by specifying layout_method = 'mod_fr'. Requires a separate installation of graph-tool via conda (not managed by pip) as it has several C++ dependencies.
        • pytest on macOS may also stall because of a different backend being called - this is solved by changing tests that call generate_network to run last.
    • added steps to reduce memory hogging.
    • min_size was doing the opposite previously and this is now fixed. #155
  • Speed up transfer
    • Found a faster way to create the connectivity matrix.
    • this also now transfers a dictionary that scirpy can use to generate the plots scverse/scirpy#286
    • Fix #153
      • rename productive to productive_status.
  • Fix #154
    • reorder the if-else statements.
  • Speed up filter_contigs
    • tree construction is simplified and replaced for-loops with dictionary updates.
  • Speed up initialise_metadata. Dandelion should now initialise and read faster.
    • Removed an unnecessary data sanitization step when loading data.
    • Now load_data will rename umi_count to duplicate_count
    • Speed up Query
      • tree construction is simplified and replaced for-loops with dictionary updates.
      • didn't need to use an airr validator as that slows things down.
  • data initialised by Dandelion will be ordered by productive status first, followed by UMI count (largest to smallest).
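The generate_network speed-up above comes from restricting pairwise Hamming distances to clones that actually contain more than one cell; a pure-Python sketch of that restriction (toy data and a hypothetical helper, not the actual implementation):

```python
from itertools import combinations

def hamming(a, b):
    """Hamming distance between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))

def per_clone_distances(clones):
    """clones: dict mapping clone_id -> list of equal-length junction sequences.

    Distances are computed only within clones that have more than one
    member; singleton clones are skipped entirely, which is where the
    speed-up comes from.
    """
    out = {}
    for cid, seqs in clones.items():
        if len(seqs) > 1:
            out[cid] = [hamming(a, b) for a, b in combinations(seqs, 2)]
    return out

# "c2" is a singleton, so no distance is computed for it.
dists = per_clone_distances({"c1": ["ATG", "ATC"], "c2": ["GGG"]})
```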

Breaking Changes

  • initialise_metadata/update_metadata/Dandelion
    • For-loops to initialise the object have been vectorised, resulting in a minor speed upgrade
    • This results in the reduction of some columns in the .metadata which were probably bloated and not used.
      • vdj_status and vdj_status_summary removed and replaced with rearrangement_VDJ_status and rearrangement_VJ_status
      • constant_status and constant_summary removed and replaced with constant_VDJ_status and constant_VJ_status.
      • productive and productive_summary combined and replaced with productive_status.
      • locus_status and locus_status_summary combined and replaced with locus_status.
      • isotype_summary replaced with isotype_status.
  • where there was previously unassigned or '', the value has been changed to the string 'None' in .metadata.
    • Not changed to NoneType as there's quite a bit of internal text processing that gets messed up if swapped.
    • No_contig will still be populated after transfer to AnnData to reflect cells with no TCR/BCR info.
  • deprecate use of nxviz<0.7.4

Minor changes

  • Rename and deprecate read_h5/write_h5. Use of read_h5ddl/write_h5ddl will be enforced in the next update.

Full Changelog: v0.2.1...v0.2.2

v0.2.2

27 Jun 11:57
801f5bd

What's Changed

Release notes are identical to v0.2.3 above (both releases point to commit 801f5bd).
Full Changelog: v0.2.1...v0.2.2

v0.2.1

19 May 16:50
85d4fa0

What's Changed

Full Changelog: v0.2.0...v0.2.1