More hierarchical API #1739

Open
gokceneraslan opened this issue Mar 11, 2021 · 1 comment

@gokceneraslan
Collaborator

gokceneraslan commented Mar 11, 2021

Hi all,

Right now we have two layers in the scanpy API. The top layer consists of the major modules like pp, pl, tl as well as the smaller ones like queries, get, datasets. In addition, we have some useful functions directly under the scanpy package like read/read_text/read_mtx etc. It is obvious that the field is advancing and alternative/better ways to perform fundamental tasks in downstream analysis (e.g. normalization, DE tests, gene selection) are emerging and will continue to emerge. Consequently, this necessitates an expansion of the scanpy API. However, I argue that having flat top-level modules makes it difficult to extend scanpy while maintaining a reasonable API.

Right now there are two ways to introduce new functionality (assuming that it's not something completely unrelated):

  1. add a new flavor/method to an existing function (e.g. sc.pp.highly_variable_genes, sc.tl.rank_genes_groups) or

  2. add a new function with a shared prefix e.g. sc.pp.neighbors_tsne (see Switch t-SNE implementation to openTSNE #1561) or sc.pp.normalize_pearson_residuals (see add normalization method to scanpy? berenslab/umi-normalization#1) or sc.pp.normalize_pearson_residuals_pca() (see Normalization and gene selection by analytical Pearson residuals #1715).

Since option 1 is more complicated in terms of managing the arguments (especially method-specific ones), I believe we now tend towards option 2. But given that we already have many functions with common prefixes, and that shifting towards option 2 will likely introduce more functions with long underscored names, the top layers will get even flatter and wider. Therefore, I think it's time to consider a third option, which is to add another layer that makes the API a tiny bit more hierarchical.
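To make the trade-off between options 1 and 2 concrete, here is a minimal sketch with toy stand-ins. None of these functions exist in scanpy; the names `normalize`, `normalize_total`, `normalize_clip` and the parameters `target_sum` and `theta` are hypothetical and only illustrate how method-specific arguments pile up under option 1 and how names multiply under option 2.

```python
# Toy stand-ins, not scanpy code. All names and parameters are hypothetical.

def normalize(counts, method="total", target_sum=1.0, theta=100.0):
    """Option 1: a single function whose behaviour is switched by `method`."""
    if method == "total":
        # `theta` is accepted but has no meaning for this method.
        total = sum(counts)
        return [c * target_sum / total for c in counts]
    if method == "clip":
        # `target_sum` is accepted but has no meaning for this method.
        return [min(c, theta) for c in counts]
    raise ValueError(f"unknown method: {method!r}")


def normalize_total(counts, target_sum=1.0):
    """Option 2: one function per method, sharing the `normalize_` prefix."""
    total = sum(counts)
    return [c * target_sum / total for c in counts]


def normalize_clip(counts, theta=100.0):
    """Option 2: each signature only carries its own arguments."""
    return [min(c, theta) for c in counts]


print(normalize([1, 3], method="total"))
print(normalize_total([1, 3]))
print(normalize_clip([1, 300]))
```

With option 1 the signature grows with every new method; with option 2 the signatures stay small but the module gains one more prefixed name each time.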

Some examples I can think of are:

sc.read.{adata,csv,text,mtx,excel,loom,h5_10x,mtx_10x,...}
sc.pp.neighbors.{umap,gauss,rapids,tsne}
sc.pp.hvg.{seurat,seurat_v3,dispersion}
sc.pp.norm.{tpm,pearson}
sc.pp.filter.{genes,cells,rank_genes,...}
sc.tl.rank_genes.{logreg,wilcoxon,ttest}
sc.tl.cluster.{leiden,louvain}
sc.tl.score.{genes,cell_cycle}
sc.pl.rank_genes.{dotplot,matrixplot,...}
sc.pl.groups.{dot,matrix,violin,...}
sc.pl.embed.{umap,tsne,pca,...}
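As a rough illustration of what such a third layer could look like from the caller's side, the sketch below builds a nested namespace out of `types.SimpleNamespace`. This is only a mock-up: the `sc` object here is a local stand-in rather than the scanpy package, the two `_hvg_*` functions are hypothetical placeholders, and in a real package the extra layer would more likely be a subpackage or module.

```python
from types import SimpleNamespace


# Hypothetical placeholder implementations, not scanpy's algorithms.
def _hvg_seurat(values, n_top=2):
    """Pick the `n_top` largest values."""
    return sorted(values, reverse=True)[:n_top]


def _hvg_dispersion(values, n_top=2):
    """Pick the `n_top` values that lie furthest from the mean."""
    mean = sum(values) / len(values)
    return sorted(values, key=lambda v: abs(v - mean), reverse=True)[:n_top]


# Local stand-in for the package: sc.pp.hvg.{seurat,dispersion}
sc = SimpleNamespace(
    pp=SimpleNamespace(
        hvg=SimpleNamespace(seurat=_hvg_seurat, dispersion=_hvg_dispersion),
    ),
)

print(sc.pp.hvg.seurat([3, 1, 2]))
print(sc.pp.hvg.dispersion([3, 1, 2, 10]))
print(sorted(vars(sc.pp.hvg)))  # the methods available in this group
```

The last line hints at one practical benefit: the available methods of a group can be listed (or tab-completed) in one place instead of being scanned for by prefix.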

There are a few issues I can think of:

  1. I can imagine some resistance from some developers due to losing a few milliseconds by typing more characters 😄 but if you imagine the long-term effects of option 2, I think this might save you some time 😛

  2. What happens to the functions that do not fit in this scheme, like sc.pp.combat, sc.tl.ingest/dpt/paga/etc, sc.pl.* (maybe plotting functions with a groupby argument can be under sc.pl.groups.*)? I am not entirely sure. One option is to keep them as is, and another is to make "singular" modules for them so that everything is placed in a third layer.

  3. It will be harder to specify the "default" (i.e. somewhat recommended) method with this scheme. What I mean is that when we add a new flavor/method to an existing function, we can still have a default method (e.g. highly_variable_genes(flavor='seurat')), which makes things easier for new users, but here there is no obvious solution to that.

What do you think?

@LuckyMD
Contributor

LuckyMD commented Mar 11, 2021

This sounds interesting, and definitely makes things cleaner in the long run... but a big issue, I think, would be backward compatibility for everything that relies on Scanpy. Also, I wonder if this makes it a bit more difficult for new users, as they would need to know what steps are required in a single-cell analysis pipeline to understand the organization.
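One common way to soften the backward-compatibility problem is to keep the old names as thin aliases that warn and forward to the new location. The sketch below shows the general pattern with the standard `warnings` module; the helper `deprecated_alias`, the function `cluster_leiden`, and the name strings passed to it are hypothetical and do not describe how scanpy handles deprecations.

```python
import functools
import warnings


def cluster_leiden(n_cells, resolution=1.0):
    """Hypothetical new-style function, imagined to live at tl.cluster.leiden."""
    return {"method": "leiden", "n_cells": n_cells, "resolution": resolution}


def deprecated_alias(func, old_name, new_name):
    """Return a wrapper that warns about `old_name` and then calls `func`."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        warnings.warn(
            f"{old_name} is deprecated, use {new_name} instead",
            FutureWarning,
            stacklevel=2,  # point the warning at the caller, not the wrapper
        )
        return func(*args, **kwargs)

    return wrapper


# The old flat name keeps working but emits a FutureWarning when called.
leiden = deprecated_alias(cluster_leiden, "tl.leiden", "tl.cluster.leiden")

print(leiden(5, resolution=0.8))
```

Existing code keeps running under this pattern, and downstream packages get a visible migration hint before anything is removed.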

@gokceneraslan gokceneraslan changed the title More hierarchical API A new API Mar 19, 2021
@flying-sheep flying-sheep changed the title A new API More hierarchical API May 13, 2024