
Releases: zktuong/dandelion

v0.1.4

19 Jul 10:46
  • Multiple bug fixes.

  • Reworked filter_contigs

    • Added a 'lite' mode for filter_contigs where it just checks for v/j/c gene call mismatches. Toggled with simple = True.
    • Split the filtering between productive and non-productive contigs:
      • Replaced the rescue_igh option with a keep_highest_umi option, which applies to all loci.
  • Updated the isotype dictionary to allow for mouse genes, which seems to address #70.

  • Added more tests.

  • Updated ddl.pp.calculate_threshold to reflect the updated shazam's disToNearest functionalities.

  • Added sanitization functions to check that data stored in dandelion is relatively compliant with airr-standards (barring missing columns from 10x's data or scirpy's transferred data).

  • Updated transfer of boolean columns to anndata to be stored as string rather than category during filter_contigs.

  • Updated h5py requirement to be >=3.1.0. 2.10.0 should still work though. This led to some updates to how AnnData was storing info from dandelion after filter_contigs. Should update the tutorial in the next version.
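The v/j/c locus-consistency check behind the 'lite' mode can be illustrated with a toy sketch (the function name and regex here are hypothetical, not dandelion's actual code):

```python
import re

def vjc_mismatch(v_call, j_call, c_call):
    """Toy check: do the v/j/c gene calls point to different loci?"""
    loci = set()
    for call in (v_call, j_call, c_call):
        if call:  # tolerate a missing constant-region call
            m = re.match(r"(IG[HKL]|TR[ABGD])", call)
            if m:
                loci.add(m.group(1))
    # A contig is suspicious when its calls span more than one locus
    return len(loci) > 1
```

For example, `vjc_mismatch("IGHV1-2*01", "IGKJ1*01", "IGHM")` flags a heavy/light mix, while a consistent IGH contig passes.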

TODO before merging:

  • complete 10x output parser
  • address #54

Ongoing:

  • Slight issue with integration with scirpy scverse/scirpy#283, but should be solved. Will edit tests when the PR is merged.

  • Need to create larger fixtures to get access to some steps within the functions.

  • Also need to test a mouse fixture. Maybe I should merge a large mouse fixture?

Minor updates to v0.1.3

17 Jun 20:26
5fa84ac

Added locus options to tools that use the Dandelion class.

Added coverage reporting, which required moving the test folder into dandelion.

Bug fix for singularity container and script.

v0.1.3

17 Jun 12:07
14e604f

Updates

  • Adjusted names and functions to allow for TCR data in AIRR format to be handled. Instead of heavy/light, the naming convention will use VDJ/VJ, consistent with how scirpy names the columns. Likewise, mentions of BCR are renamed to contig where appropriate.
  • Renamed function name from filter_bcr to filter_contigs.
  • Added option for filename_prefix to control the behaviour during the preprocessing step better.
  • Updates should work with recently merged PR to modify the container script.
  • filter_contigs has been partially reworked: it no longer requires a multi-core implementation, as the new implementation runs faster without it.
  • umi_count is now treated as a backup to duplicate_count if duplicate_count is modified by filter_contigs.
  • Added locus option to filter_contigs so that it can work with the new implementation of the data class.
  • Singularity container now has additional options to trigger tr pre-processing mode. Kudos to Krzysztof #80.
  • Rewrote tests to stop downloading files every time a test starts.
    • This lets me write very specific tests! Will have to expand on this eventually.
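The umi_count backup behaviour noted above could be sketched with pandas (illustrative only; the actual column handling in filter_contigs may differ):

```python
import pandas as pd

# Simulated AIRR-style contig table; suppose filtering left a gap in
# duplicate_count for one contig.
df = pd.DataFrame({
    "sequence_id": ["contig1", "contig2", "contig3"],
    "duplicate_count": [5, None, 7],
    "umi_count": [5, 3, 7],
})

# duplicate_count stays the default column; umi_count backs it up
# wherever duplicate_count is missing.
df["duplicate_count"] = df["duplicate_count"].fillna(df["umi_count"])
```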

Bug Fix

  • Removed an extra filtering section in the single-core implementation of filter_contigs.
  • Made the umi_count vs duplicate_count behaviour more consistent. Now duplicate_count is the default column.

Ongoing

  • Tests are failing for the preprocess script. Perhaps I should break it up to find out exactly what's wrong. Readthedocs is also complaining with "Command killed due to excessive memory consumption".
  • Rewriting unit tests. Seems like they are working so far (minus some typos here and there)
  • Need to add in more detailed tests.
  • Need to add in tests to switch data with scirpy. Also need to write native 10x data parser to reduce reliance on scirpy.

Known issues

  • The current instructions to use singularity come with a few issues if ~/.bashrc is present:
    • If ~/.bashrc is present and contains conda initialization code, then the default conda path will be prepended to the container's $PATH. This impacts users who install igblast and blast via conda and also want to use the container: the container will try to use the blast outside the container first, which may then lead to issues where it cannot find the database files.
    • Specifying --no-home doesn't solve the issue completely, as scanpy requires a writable numba_cache_dir. Currently the container will hopefully try to create a $PWD/dandelion_cache folder if this is an issue, but it needs more testing.

Still needs work

#54 Check productive_only option in filter_bcr
#62 Streamline update_metadata
#63 Fix update_metadata to work with concat.
#64 Allow retrieve to work both ways
#68 Native implementation of function to count mutation
#69 Rescue contigs that fail germline reconstruction?
#70 Check for compatibility with mouse data

v0.1.2

15 May 14:11
c3962de

A new name for v0.1.1.post1 to fix pypi issue.

v0.1.1.post1

15 May 10:46
84bb252

Bug fixes and updates:

  • versioning
    Changed versioning strategy to use setuptools_scm to pull the version and predict the next version number.
    Note to self: from now on, any major update just requires a new tag. For example, after the merge of the pull request, set a new tag and push:
git tag -a v0.1.1
git push --tags
  • filter_bcr
    A minor edit: switched the fold-change cutoff back to 2 to fit the original filtering strategy.

  • setting up container
    Related to #51: to create a preprocessing wrapper, we will add an option to export the plots generated during pre-processing. Simple addition of a boolean save_plot option to:
    • reassign_allele
    • reassign_alleles
    • assign_isotype
    • assign_isotypes

Still needs work

#51 Preprocessing wrapper for singularity container
#54 Check productive_only option in filter_bcr
#62 Streamline update_metadata
#63 Fix update_metadata to work with concat.
#64 Allow retrieve to work both ways
#68 Native implementation of function to count mutation
#69 Rescue contigs that fail germline reconstruction?
#70 Check for compatibility with mouse data

New features:

  • scirpy interoperability
    Now fully works with scirpy's tool to transfer the data format.

  • singularity container
    Singularity container recipe/image created.

ls
# database  environment.yml  ncbi-blast-2.10.1+  ncbi-igblast-1.15.0  sc-dandelion.def
singularity build --fakeroot sc-dandelion.sif sc-dandelion.def
singularity sign sc-dandelion.sif
singularity verify sc-dandelion.sif
singularity push sc-dandelion.sif library://kt16/default/sc-dandelion:latest 

To download and use:

singularity pull library://kt16/default/sc-dandelion:latest
singularity shell sc-dandelion.sif


  • pre-processing plots saving
    Meant to be used primarily for the container, but now specifying save_plot in reassign_alleles and assign_isotypes will save the plots from pre-processing accordingly.

Deprecated:

I should start using deprecation decorators, but there is only one major change: reassign_alleles_ is no longer used/available.

v0.1.0

30 Mar 18:20
d7360c2

Bug fixes and updates:

  • Github actions - tests
    Fixed issue with tests failing. Mostly resolved.

  • find_clones and generate_network
    Bugs were introduced in v0.0.28 due to the rework of dandelion initialization, causing some clones to be excessively split (multiple '|' separators), the IgM/IgD catcher to skip a few steps unnecessarily, and networks to not generate properly due to incorrect index referencing. This has now been corrected. Switched back to using squareform + pdist for calculating distances to simplify the code. May revisit in the future.

  • filter_bcr
    Similar issue as above: contigs with multiple IgH were not flagged properly, and this resulted in extremely noisy data. This is now fixed and the IgM|IgD catcher is actually working properly now! Also added a productive_only toggle to only retain BCRs that are determined to be productive, allowing the user the flexibility to change this. Perhaps I should add the same toggle to all other functions so that the contigs can pass through and be flexibly used in subsequent steps?

  • update_metadata
    Found some typos.

  • quantify_mutation
    Fixed an issue where the original code wasn't allowing the R objects to be parsed properly when specifying non-NULL arguments.

  • miscellaneous
    Updated code to prevent syntax warnings.

  • annotation
    All main functions visible to the user are annotated. Can be improved further for clarity, but that's for the next version.

Still needs work

  • plotting issues
    There's an issue in reassign_alleles where plotting sometimes fails. I've added a try-except step to try and circumvent the error while I search for what it actually is.

  • metadata info
    I think a column is needed to deal with multi-chain flagging properly. It's currently not ideal, as IgM/IgD cells can be flagged as multi, and calling them single isn't the right solution.

New features:

  • Integration with scirpy
    Dandelion can read scirpy's processed output. Related to scverse/scirpy#240
    Will update when scirpy's AirrCell class is fully implemented.

Deprecated:

v0.0.26

29 Dec 13:45
f254c8e

Bug fixes and updates:

Prevent transfer from overwriting anndata.obs columns
This was causing issues if the column name already exists. Changed this to now respect the column in anndata.obs first.
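The new behaviour might be sketched as follows (column names here are made up for illustration; the real transfer works on anndata.obs):

```python
import pandas as pd

# Existing anndata.obs columns (simulated with a plain DataFrame)
obs = pd.DataFrame({"clone_id": ["A", "B"]}, index=["cell1", "cell2"])
# Columns dandelion wants to transfer over
incoming = pd.DataFrame({"clone_id": ["X", "Y"], "isotype": ["IgM", "IgG"]},
                        index=["cell1", "cell2"])

# Respect columns already present in obs: only copy over new ones
for col in incoming.columns:
    if col not in obs.columns:
        obs[col] = incoming[col]
```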

Slight adjustment to calculation of gini indices
Added a single zero to the end of each sorted array if the array length is longer than 1 so that the Lorenz curve starts from 0. This effectively solves the problem where the gini indices were originally being returned as negative values. Also added some description in the relevant locations to explain why the tabulation for clone/node degree/centrality is different compared to cluster size.
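A minimal sketch of the padding idea (not dandelion's actual implementation): the sorted array is padded with a zero so the Lorenz curve starts at the origin, which keeps the index non-negative.

```python
import numpy as np

def gini_index(values):
    # Sort ascending; pad with a zero (if more than one value) so the
    # Lorenz curve starts from the origin instead of the first cumulant.
    arr = np.sort(np.asarray(values, dtype=float))
    if arr.size > 1:
        arr = np.insert(arr, 0, 0.0)
    lorenz = np.cumsum(arr) / arr.sum()
    # Gini = 1 - 2 * area under the Lorenz curve (trapezoidal rule)
    area = np.sum((lorenz[1:] + lorenz[:-1]) / 2.0) / (arr.size - 1)
    return 1.0 - 2.0 * area
```

With the padding, a perfectly even array gives exactly 0 and a maximally uneven one approaches 1, rather than dipping below zero.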

Fixed downsampling and metric options
Some parts were ignoring the options because I forgot to update them in the main function within functions.

Fix rpy2 dependency to be <3.3.5 until I find out what's wrong
Recently updated my mac's R to version 4 and rpy2 >3.3.5 really didn't like it. Not sure if this would be fixed in 3.4?

New features:

Vertex size gini calculation
Finally had a crack at trying to implement this method. This was proving to be a challenge: a native implementation with networkx's node contraction tools was fine for sparse clones but really struggles in highly connected samples. The workaround is a simple counter, but having access to the graph would be ideal... Anyway, I will have to think about this more in the future, and whether I should remake the clone size gini to use a similar implementation to reflect the network more. Currently this involves reconstruction of networks, which is quite time-consuming, especially if the sample is large.
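The "simple counter" workaround might look something like this (a hypothetical sketch, not the actual code):

```python
from collections import Counter

# Hypothetical cell-to-clone mapping; in practice this would come from
# the dandelion metadata rather than a hard-coded dict.
clone_of_cell = {"c1": "clone1", "c2": "clone1",
                 "c3": "clone2", "c4": "clone2", "c5": "clone2",
                 "c6": "clone3"}

# Instead of contracting the graph with networkx, simply count how many
# cells collapse into each contracted vertex; the counts then serve as
# the vertex sizes fed into the gini calculation.
vertex_sizes = sorted(Counter(clone_of_cell.values()).values())
```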

cluster size gini calculation
This is an attempt to also perform the gini calculation after network contraction. However, it also revealed a problem in that if the sample is not deeply sampled, then the gini index will not be reflected appropriately. For now, an option is placed in to choose whether or not to use the contracted network.

Deprecated:

clone_diversity gini calculation for anndata
No longer possible as the function will now absolutely require the network to be present first, i.e. it requires a dandelion object.

v0.0.21

16 Nov 13:54
02ecd46

Many bug fixes.

Overhauled preprocessing steps to allow for more flexibility with barcode naming and a reannotation strategy more in line with immcantation's recommendations.

New functions to plot clonal overlap as a circos plot, which requires separate installation of nxviz. Didn't put nxviz in the requirements/setup as there are a few conflicts with matplotlib and others during installation.

v0.0.16

16 Oct 15:06
1f63e49

Many bug fixes.

Smoothed out the initial preprocessing functions to allow for more flexible input options.

Allows for package to be reticulated (somewhat).

v0.0.14

09 Sep 13:35
68d577c

Major bug fixes.

Updated to work with AnnData>=7.1.0

Sped up creation of networks and filtering of BCRs by allowing for parallelization.

Changed dependency on igraph to networkx.

Changed default behaviour for filter_bcr to drop contigs instead of filtering barcodes for those that only contain poor-quality BCR data.
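The difference between the old and new default can be shown with a toy table (column names assumed for illustration, not the AIRR/dandelion schema):

```python
import pandas as pd

df = pd.DataFrame({
    "barcode": ["b1", "b1", "b2"],
    "contig_id": ["b1_1", "b1_2", "b2_1"],
    "poor_quality": [True, False, False],
})

# New default: drop only the poor-quality contigs themselves
drop_contigs = df[~df["poor_quality"]]

# Old behaviour: drop every contig of a barcode that has any
# poor-quality contig
bad_barcodes = set(df.loc[df["poor_quality"], "barcode"])
drop_barcodes = df[~df["barcode"].isin(bad_barcodes)]
```

Dropping contigs keeps barcode b1 alive via its good contig, whereas the old barcode-level filter would discard b1 entirely.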

Added diversity estimation tools.