Releases · scverse/scirpy

This update introduces a new data structure based on awkward arrays.
The new data structure is described in more detail in the documentation and is considered the "official" way of representing AIRR data for scverse core and ecosystem packages.

Benefits of the new data structure include:

a more natural, lossless representation of AIRR Rearrangement data
separation of AIRR data and the receptor model, which removes previous limitations (e.g. "only productive chains") and enables other use cases (e.g. spatial AIRR data) in the future.
a clean adata.obs, as AIRR data is no longer expanded into columns
support for MuData for working with paired gene expression and AIRR data as separate modalities.
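To make the contrast with the old column-based layout concrete, here is a minimal sketch in plain Python. Nested lists of dicts stand in for the ragged awkward array that scirpy keeps in adata.obsm["airr"]; locus, junction_aa and productive are AIRR Rearrangement fields, but the cell IDs and sequences are made up for illustration, and the helper functions are hypothetical.

```python
# Plain nested lists/dicts standing in for the ragged awkward array that
# scirpy keeps in adata.obsm["airr"]; cells and sequences are made up.
cells = {
    "cell_1": [
        {"locus": "TRA", "junction_aa": "CAVRDSNYQLIW", "productive": True},
        {"locus": "TRB", "junction_aa": "CASSLGQAYEQYF", "productive": True},
    ],
    "cell_2": [
        {"locus": "TRB", "junction_aa": "CASSPGTEAFF", "productive": True},
        # A non-productive chain is kept rather than filtered out on import.
        {"locus": "TRB", "junction_aa": None, "productive": False},
        {"locus": "TRA", "junction_aa": "CAMREGNTGKLIF", "productive": True},
    ],
    "cell_3": [],  # no receptor detected: an empty list, not a row of NaN columns
}


def chains_per_cell(cells):
    """Number of chains stored for each cell (lengths vary from cell to cell)."""
    return {cell_id: len(chains) for cell_id, chains in cells.items()}


def productive_chains_per_cell(cells):
    """Count productive chains; this is a query on the data, not an import filter."""
    return {
        cell_id: sum(1 for chain in chains if chain["productive"])
        for cell_id, chains in cells.items()
    }


print(chains_per_cell(cells))             # {'cell_1': 2, 'cell_2': 3, 'cell_3': 0}
print(productive_chains_per_cell(cells))  # {'cell_1': 2, 'cell_2': 2, 'cell_3': 0}
```

Because nothing is dropped when the data is stored, restrictions such as "only productive chains" become a choice made at analysis time.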

The overall workflow stays the same; however, this update required several backwards-incompatible changes, which are summarized below.

Backwards-incompatible changes

New data structure

Closes issue #327.

Changed behavior:

There are no "has_ir" and "multichain" columns in adata.obs anymore.
By default all fields are imported from AIRR rearrangement and 10x data.
The restriction that all chains added to an AirrCell must have the same fields has been removed. Missing fields are automatically filled with missing values.
io.upgrade_schema can upgrade objects from the v0.7 schema to the v0.13 schema. AnnData objects generated with scirpy <= 0.6.x cannot be read anymore.
pl.spectratype now has a chain parameter, and the meaning of the cdr3_col parameter has changed.
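The relaxed rule for AirrCell (chains may carry different fields, and gaps become missing values) can be pictured with the following sketch. It is not the AirrCell implementation: harmonize_fields is a hypothetical helper, and it uses None as the missing value.

```python
def harmonize_fields(chains):
    """Give every chain the union of all fields, filling absent ones with None.

    Hypothetical helper illustrating the behavior described above; it is not
    part of scirpy.
    """
    # Collect field names in first-seen order so the output is deterministic.
    all_fields = []
    for chain in chains:
        for key in chain:
            if key not in all_fields:
                all_fields.append(key)
    return [{key: chain.get(key) for key in all_fields} for chain in chains]


chains = [
    {"locus": "TRA", "junction_aa": "CAVRDSNYQLIW"},
    {"locus": "TRB", "junction_aa": "CASSLGQAYEQYF", "duplicate_count": 12},
]
for chain in harmonize_fields(chains):
    print(chain)
# {'locus': 'TRA', 'junction_aa': 'CAVRDSNYQLIW', 'duplicate_count': None}
# {'locus': 'TRB', 'junction_aa': 'CASSLGQAYEQYF', 'duplicate_count': 12}
```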

New functions:

pp.index_chains
pp.merge_chains
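Conceptually, chain indexing decides which of a cell's chains count as primary and secondary VJ and VDJ chains, storing indices rather than modifying the chains themselves. The sketch below is a simplified, hypothetical version of that idea, not the pp.index_chains implementation: the filter (productive and with a junction_aa), the ranking by duplicate_count, the locus groupings and the "more than two chains means multichain" rule are assumptions chosen for illustration.

```python
VJ_LOCI = {"TRA", "TRG", "IGK", "IGL"}  # alpha/gamma/light chains
VDJ_LOCI = {"TRB", "TRD", "IGH"}        # beta/delta/heavy chains


def index_chains_sketch(chains):
    """Return positions of primary/secondary VJ and VDJ chains plus a multichain flag.

    Hypothetical simplification: keeps productive chains with a junction_aa and
    ranks them by duplicate_count (highest first).
    """
    usable = [
        i for i, chain in enumerate(chains)
        if chain.get("productive") and chain.get("junction_aa")
    ]

    def ranked(loci):
        idx = [i for i in usable if chains[i]["locus"] in loci]
        # sorted() is stable, so ties keep their original order.
        return sorted(idx, key=lambda i: chains[i].get("duplicate_count", 0), reverse=True)

    vj, vdj = ranked(VJ_LOCI), ranked(VDJ_LOCI)
    return {
        "VJ": vj[:2],  # [primary, secondary] positions, if present
        "VDJ": vdj[:2],
        "multichain": len(vj) > 2 or len(vdj) > 2,
    }


chains = [
    {"locus": "TRB", "junction_aa": "CASSPGTEAFF", "productive": True, "duplicate_count": 5},
    {"locus": "TRA", "junction_aa": "CAVRDSNYQLIW", "productive": True, "duplicate_count": 10},
    {"locus": "TRB", "junction_aa": "CASSLGQAYEQYF", "productive": True, "duplicate_count": 12},
    {"locus": "TRA", "junction_aa": None, "productive": False, "duplicate_count": 30},
]
print(index_chains_sketch(chains))
# {'VJ': [1], 'VDJ': [2, 0], 'multichain': False}
```

Keeping indices separate from the chain data is what lets the receptor model change without re-importing the AIRR data.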

Removed functions:

pp.merge_with_ir
pp.merge_airr_chains

API supporting MuData

Closes issue #383

Where applicable, all functions accept the following additional, optional keyword arguments:

airr_mod: the modality in MuData that contains AIRR information (default: "airr")
airr_key: the slot in adata.obsm that contains AIRR rearrangement data (default: "airr")
chain_idx_key: the slot in adata.obsm that contains indices specifying which chains in adata.obsm[airr_key] are the primary/secondary chains etc.
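The following sketch shows how these three keyword arguments could be resolved against either a MuData or a plain AnnData object. SimpleAnnData, SimpleMuData and resolve_airr are hypothetical stand-ins defined here so the example runs without scverse packages; they are not util.DataHandler, and the default used for chain_idx_key ("chain_indices") is an assumption, since it is not stated above.

```python
from dataclasses import dataclass, field


@dataclass
class SimpleAnnData:
    """Hypothetical stand-in for AnnData: only the obsm mapping is modeled."""
    obsm: dict = field(default_factory=dict)


@dataclass
class SimpleMuData:
    """Hypothetical stand-in for MuData: modality name -> SimpleAnnData."""
    mod: dict = field(default_factory=dict)


def resolve_airr(data, airr_mod="airr", airr_key="airr", chain_idx_key="chain_indices"):
    """Return (AIRR data, chain indices) from a MuData- or AnnData-like object.

    airr_mod is only consulted for MuData; airr_key and chain_idx_key name
    slots in obsm. Chain indices are None if chains have not been indexed yet.
    The chain_idx_key default is an assumption.
    """
    adata = data.mod[airr_mod] if isinstance(data, SimpleMuData) else data
    return adata.obsm[airr_key], adata.obsm.get(chain_idx_key)


airr_adata = SimpleAnnData(obsm={"airr": [[{"locus": "TRB"}]], "chain_indices": [{"VDJ": [0]}]})
mdata = SimpleMuData(mod={"gex": SimpleAnnData(), "airr": airr_adata})

# The same call works for the MuData container and for the AnnData itself.
assert resolve_airr(mdata) == resolve_airr(airr_adata)
print(resolve_airr(mdata))
# ([[{'locus': 'TRB'}]], [{'VDJ': [0]}])
```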

New class:

util.DataHandler

Updated example datasets

The example datasets have been updated to the new data structure and are now based on MuData.

The example datasets have been regenerated from scratch using the loader notebooks described in the docstring. Gene expression in the Maynard dataset is now based on values generated with Salmon instead of RSEM/featureCounts.
Scirpy now uses pooch to manage example datasets.

Cleanup

Removed the deprecated functions io.from_tcr_objs, io.from_ir_objs, io.to_ir_objs, pp.merge_with_tcr, pp.tcr_neighbors, pp.ir_neighbors, tl.chain_pairing
Removed the deprecated classes TcrCell, AirrChain, TcrChain
Removed the function pl.cdr_convergence, which was never public anyway.

Additions

Easy-access functions (`scirpy.get`)

Closes issue #184

New functions:

get.airr
get.obs_context
get.airr_context
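Roughly speaking, get.airr retrieves AIRR fields for selected chains from the ragged structure. A plain-Python sketch of that lookup follows; get_airr_sketch, the "VJ_1"/"VDJ_2" slot naming and the shape of the index records are hypothetical and do not reproduce the signature of get.airr.

```python
def get_airr_sketch(chains_per_cell, indices_per_cell, field, chain):
    """Look up `field` of the chain in slot `chain` (e.g. "VJ_1") for every cell.

    Hypothetical helper: "VJ_1" means the first (primary) VJ chain, "VDJ_2"
    the second (secondary) VDJ chain. Missing chains yield None.
    """
    arm, position = chain.rsplit("_", 1)
    pos = int(position) - 1
    result = {}
    for cell_id, chains in chains_per_cell.items():
        slots = indices_per_cell[cell_id][arm]
        result[cell_id] = chains[slots[pos]].get(field) if pos < len(slots) else None
    return result


chains_per_cell = {
    "cell_1": [
        {"locus": "TRB", "junction_aa": "CASSLGQAYEQYF"},
        {"locus": "TRA", "junction_aa": "CAVRDSNYQLIW"},
    ],
    "cell_2": [{"locus": "TRB", "junction_aa": "CASSPGTEAFF"}],
}
indices_per_cell = {
    "cell_1": {"VJ": [1], "VDJ": [0]},
    "cell_2": {"VJ": [], "VDJ": [0]},
}
print(get_airr_sketch(chains_per_cell, indices_per_cell, "junction_aa", "VJ_1"))
# {'cell_1': 'CAVRDSNYQLIW', 'cell_2': None}
```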

Fixes

Several previously inaccurate type hints have been corrected.
Fixed the x-axis labelling in pl.clonotype_overlap; it now raises an error if row annotations are not unique for each group.

Documentation

The documentation has been updated to reflect the changes described above, in particular the tutorials and the page about the data structure.

Other changes

The minimum required Python version is now 3.8 (#381)
Increased the minimum version of tqdm to 4.63 (see tqdm/tqdm#1082)
pl.repertoire_overlap now always runs tl.repertoire_overlap internally and doesn't rely on cached values.
The mode dendro_only in pl.repertoire_overlap has been removed.
Cells that have a receptor but no CDR3 sequence previously received a separate clonotype in tl.define_clonotypes. Now, like cells without a receptor, they receive no clonotype (i.e. np.nan).
With inplace=False, the function tl.clonal_expansion now returns a pd.Series instead of an np.array.
Removed deprecation for clonotype_imbalanced, see #330
The group_abundance tool and plotting function used has_ir as the default group, since we could previously rely on this column being present. With the new data structure, this is no longer the case. To avoid breaking old code, the has_ir column is temporarily added when requested. The group_abundance function will have to be rewritten entirely in the future, see #232.
In pl.spectratype, the parameter groupby has been replaced by chain.
We now use isort to organize imports.
Static typing has been improved internally (using pylance). It's not perfectly consistent yet, but we will keep working on this in the future.
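The new handling of cells without a CDR3 sequence in tl.define_clonotypes can be illustrated with a toy version of clonotype assignment. This sketch simply groups cells with identical (VJ, VDJ) CDR3 amino-acid sequences, which is a deliberate simplification: define_clonotypes_sketch and its input format are hypothetical, and the actual tl.define_clonotypes logic is not reproduced here.

```python
import math


def define_clonotypes_sketch(cdr3_by_cell):
    """Assign integer clonotype IDs to cells with identical (VJ, VDJ) CDR3 pairs.

    cdr3_by_cell maps cell ID -> (vj_cdr3, vdj_cdr3), with None for a missing
    CDR3. Cells without any CDR3, whether or not a receptor was detected,
    get NaN instead of a clonotype of their own.
    """
    clonotype_ids = {}
    result = {}
    for cell_id, cdr3_pair in cdr3_by_cell.items():
        if all(seq is None for seq in cdr3_pair):
            result[cell_id] = math.nan
            continue
        # setdefault hands out the next free ID the first time a pair is seen.
        result[cell_id] = clonotype_ids.setdefault(cdr3_pair, len(clonotype_ids))
    return result


cells = {
    "cell_1": ("CAVRDSNYQLIW", "CASSLGQAYEQYF"),
    "cell_2": ("CAVRDSNYQLIW", "CASSLGQAYEQYF"),  # same clonotype as cell_1
    "cell_3": (None, "CASSPGTEAFF"),
    "cell_4": (None, None),  # receptor present, but no CDR3 sequence
}
print(define_clonotypes_sketch(cells))
# {'cell_1': 0, 'cell_2': 0, 'cell_3': 1, 'cell_4': nan}
```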


