More hierarchical API #1739

gokceneraslan · 2021-03-11T16:51:16Z

Hi all,

Right now we have two layers in the scanpy API. The top layer consists of the major modules like pp, pl and tl, as well as smaller ones like queries, get and datasets. In addition, some useful functions live directly under the scanpy package, like read/read_text/read_mtx. The field is clearly advancing, and alternative or better ways to perform fundamental downstream-analysis tasks (e.g. normalization, DE tests, gene selection) are emerging and will continue to emerge. This necessitates an expansion of the scanpy API. However, I argue that flat top-level modules make it difficult to extend scanpy while maintaining a reasonable API.

Right now there are two ways to introduce new functionality (assuming it's not something completely unrelated):

1. add a new flavor/method to an existing function (e.g. sc.pp.highly_variable_genes, sc.tl.rank_genes_groups), or
2. add a new function with a shared prefix, e.g. sc.pp.neighbors_tsne (see "Switch t-SNE implementation to openTSNE" #1561), sc.pp.normalize_pearson_residuals (see "add normalization method to scanpy?" berenslab/umi-normalization#1) or sc.pp.normalize_pearson_residuals_pca() (see "Normalization and gene selection by analytical Pearson residuals" #1715).

Since option 1 makes argument management more complicated (especially for method-specific arguments), I believe we are now tending toward option 2. But given that we already have many functions with common prefixes, and that shifting towards option 2 will likely introduce more functions with long underscored names, the top-level modules will get even flatter and wider. Therefore, I think it's time to consider a third option: adding another layer that makes the API a bit more hierarchical.
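To make the argument-management trade-off concrete, here is a minimal sketch with toy, hypothetical functions (not scanpy's implementations): `normalize` follows option 1 with a `flavor` argument, while `normalize_total` and `normalize_pearson_residuals` follow option 2. The Pearson-residual toy uses the analytic negative-binomial form with a fixed overdispersion `theta` and omits any clipping.

```python
import math

# Hypothetical toy functions on a dense cells x genes count matrix
# (a list of lists). They contrast the two options; not scanpy code.


def normalize_total(counts, target_sum=1e4):
    # Option 2: one function per method with only its own arguments.
    # Scales each cell (row) so its counts sum to target_sum.
    # Assumes every cell has a nonzero total.
    out = []
    for row in counts:
        total = sum(row)
        out.append([x * target_sum / total for x in row])
    return out


def normalize_pearson_residuals(counts, theta=100.0):
    # Option 2: analytic Pearson residuals under a negative-binomial model,
    #   mu_ij = row_total_i * col_total_j / grand_total
    #   r_ij  = (x_ij - mu_ij) / sqrt(mu_ij + mu_ij**2 / theta)
    # Simplified: no clipping; assumes no all-zero rows or columns.
    row_tot = [sum(row) for row in counts]
    col_tot = [sum(col) for col in zip(*counts)]
    grand = sum(row_tot)
    out = []
    for i, row in enumerate(counts):
        res = []
        for j, x in enumerate(row):
            mu = row_tot[i] * col_tot[j] / grand
            res.append((x - mu) / math.sqrt(mu + mu * mu / theta))
        out.append(res)
    return out


def normalize(counts, flavor="total", target_sum=1e4, theta=None):
    # Option 1: a single entry point where `flavor` selects the method.
    # Method-specific arguments accumulate in one signature: `theta` must be
    # rejected for 'total' by hand, and `target_sum` is silently ignored
    # for 'pearson'.
    if flavor == "total":
        if theta is not None:
            raise ValueError("theta only applies to flavor='pearson'")
        return normalize_total(counts, target_sum=target_sum)
    if flavor == "pearson":
        return normalize_pearson_residuals(
            counts, theta=100.0 if theta is None else theta
        )
    raise ValueError(f"unknown flavor: {flavor!r}")
```

In the option-1 version every new flavor widens the shared signature, so each method-specific argument needs manual validation or is silently ignored; the option-2 functions only expose what they use, at the cost of more prefixed names.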

Some examples I can think of are:

sc.read.{adata,csv,text,mtx,excel,loom,h5_10x,mtx_10x,...}
sc.pp.neighbors.{umap,gauss,rapids,tsne}
sc.pp.hvg.{seurat,seurat_v3,dispersion}
sc.pp.norm.{tpm,pearson}
sc.pp.filter.{genes,cells,rank_genes,...}
sc.tl.rank_genes.{logreg,wilcoxon,ttest}
sc.tl.cluster.{leiden,louvain}
sc.tl.score.{genes,cell_cycle}
sc.pl.rank_genes.{dotplot,matrixplot,...}
sc.pl.groups.{dot,matrix,violin,...}
sc.pl.embed.{umap,tsne,pca,...}
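A minimal sketch of what the extra layer could look like, using `types.SimpleNamespace` so it fits in one file. The `pp.filter.{cells,genes}` names follow the examples above; the toy implementations and the `list_methods` helper are hypothetical and not scanpy code. In a real package each layer would more naturally be a subpackage (e.g. a `filter` module inside `pp`).

```python
from types import SimpleNamespace

# Hypothetical toy implementations on a dense cells x genes count matrix
# (a list of lists); the names follow the proposal, the code is not scanpy's.


def _filter_cells(counts, min_counts=1):
    # Keep cells (rows) whose total count is at least min_counts.
    return [row for row in counts if sum(row) >= min_counts]


def _filter_genes(counts, min_cells=1):
    # Keep genes (columns) with a nonzero count in at least min_cells cells.
    keep = [
        j
        for j, col in enumerate(zip(*counts))
        if sum(1 for x in col if x > 0) >= min_cells
    ]
    return [[row[j] for j in keep] for row in counts]


# Module `pp`, new grouping layer `filter`, then the individual methods.
pp = SimpleNamespace(
    filter=SimpleNamespace(cells=_filter_cells, genes=_filter_genes),
)


def list_methods(namespace):
    # Roughly what tab completion would offer at this layer.
    return sorted(name for name in vars(namespace) if not name.startswith("_"))
```

Usage then reads like `pp.filter.genes(pp.filter.cells(counts, min_counts=2), min_cells=2)`, and `list_methods(pp.filter)` shows every alternative for one task in a single place.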

There are a few issues I can think of:

1. I can imagine some resistance from developers over the few milliseconds lost to typing more characters 😄, but if you imagine the long-term effects of option 2, I think this might actually save you some time 😛
2. What happens to the functions that do not fit this scheme, like sc.pp.combat, sc.tl.ingest/dpt/paga/etc. and sc.pl.* (maybe plotting functions with a groupby argument could go under sc.pl.groups.*)? I am not entirely sure. One option is to keep them as they are; another is to make "singular" modules for them so that everything is placed in a third layer.
3. It will be harder to specify the "default" (i.e. somewhat recommended) method with this scheme. When we add a new flavor/method to an existing function, we can still have a default method (e.g. highly_variable_genes(flavor='seurat')), which makes things easier for new users, but with one function per method there is no obvious equivalent.

What do you think?


LuckyMD · 2021-03-11T18:21:05Z

This sounds interesting, and it definitely makes things cleaner in the long run... but I think a big issue would be backward compatibility for everything that relies on Scanpy. Also, I wonder if this makes things a bit more difficult for new users, as they would need to know which steps a single-cell analysis pipeline requires in order to understand the organization.
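Backward compatibility could be handled with a deprecation period in which the old flat names keep working but warn. A minimal sketch, assuming the hierarchical layout from the proposal; `deprecated_alias` and the toy `_filter_cells` are hypothetical, not scanpy code. `FutureWarning` is used rather than `DeprecationWarning` because Python hides the latter by default unless it is triggered from `__main__`, and these warnings are meant for end users.

```python
import functools
import warnings
from types import SimpleNamespace


def deprecated_alias(new_func, old_name, new_name):
    # Wrap the new function under its old flat name: calls still work,
    # but emit a FutureWarning pointing to the new location.
    @functools.wraps(new_func)
    def wrapper(*args, **kwargs):
        warnings.warn(
            f"{old_name} is deprecated; use {new_name} instead",
            FutureWarning,
            stacklevel=2,
        )
        return new_func(*args, **kwargs)

    return wrapper


def _filter_cells(counts, min_counts=1):
    # Toy implementation: keep cells (rows) with at least min_counts in total.
    return [row for row in counts if sum(row) >= min_counts]


# New hierarchical location...
pp = SimpleNamespace(filter=SimpleNamespace(cells=_filter_cells))
# ...and the old flat name, kept working during the deprecation period.
pp.filter_cells = deprecated_alias(pp.filter.cells, "pp.filter_cells", "pp.filter.cells")
```

`stacklevel=2` attributes the warning to the caller's line rather than to the wrapper, so users see which of their own calls needs updating.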

gokceneraslan added the Question label Mar 11, 2021

pavlin-policar mentioned this issue Mar 18, 2021: Switch t-SNE implementation to openTSNE #1561 (draft)

gokceneraslan changed the title from "More hierarchical API" to "A new API" on Mar 19, 2021

giovp mentioned this issue Jul 5, 2021: Normalization and gene selection by analytical Pearson residuals #1715 (merged)

flying-sheep changed the title from "A new API" back to "More hierarchical API" on May 13, 2024
