Is there a way to pull out the stability index matrix from the SC3 object? #60

Open
yingzhang121 opened this issue Jan 29, 2018 · 7 comments

Comments

@yingzhang121

Hi,

Is there a way for me to run the function calculate_stability directly to get the stability index matrix?

I tried this function in SC3_1.7.6 and SC3_1.3.18; it did not work in either.

Best,
Ying

@pati-ni
Contributor

pati-ni commented Feb 6, 2018

Hi Ying,

To plot clustering stability you have to call sc3_plot_cluster_stability(sce, k = 3).
That said, you must already have run sc3 for that k.
For example, if you run sc3(sce, ks = 2:3), then sc3_plot_cluster_stability will raise an error if your k is outside the [2,3] range.

@yingzhang121
Author

Hi, Pati,

I think you misunderstood my question. I have already run the full SC3 workflow, and I do have my object.

So rather than simply plotting the stability index, I want to extract the same information and save it to a data frame. It seems the function calculate_stability should do the job, but I cannot even run it directly.

That is why I am asking whether there is a way to extract this information.

Best,
Ying

@pati-ni
Contributor

pati-ni commented Feb 6, 2018

@yingzhang121 Currently that functionality is not exposed to users, but it could be added as an enhancement in a future release.

@yingzhang121
Author

yingzhang121 commented Feb 8, 2018

@pati-ni It would be great to include this function in a future release. Thank you for considering my input.

I also have a related question that might be off topic, but I am looking for insight from the developers: how should we interpret the stability index? For example, I know that the larger the index, the more stable the cluster, but is a cluster with a stability index of 0.2 useless? In my project, I always get a k estimate of around 30, and all but 1-2 of the clusters usually have an index of around 0.1-0.2. Looking in more detail, the clusters with a high index usually contain fewer cells. So what does this mean? Should we pay less attention to the low-index clusters that include the majority of cells? I guess the basic question is what stability index we should expect for a specific k. I can imagine that with a few thousand cells, the probability of getting a specific clustering result is usually low, maybe as low as 0.0000000001. Then no matter how low the stability index is, the cluster should be significant (or stable) in some sense. But if this is true, then what is the purpose of the iterative permutation over different clustering algorithms and of the stability index itself?

So I would like to ask you to provide a baseline for the stability index, such as a line at 0.1. Then we could draw conclusions such as: if the index is below the baseline, the cluster is the result of some random effect; otherwise, it is a truly statistically significant result.

Thank you for spending your time reading my post.

@wikiselev
Member

wikiselev commented Feb 8, 2018

@yingzhang121 thanks for your question! Please note that the estimated k is not the true k; it is just an estimate, and we have also noticed that it overestimates k for UMI-based datasets (where the sparsity of the matrix is much higher than in full-length transcript protocols). Regarding your stability question: please note that stability is relative to the range of ks you've run clustering for. So if your range of ks is small you might get one value, and if you then add more ks to your calculations you will get a different value. Again, the stability index is not the ultimate truth; it's more of a guide for yourself. Its value decreases in two cases: 1. if cells are removed from your cluster when you change k; and 2. if your cluster splits into multiple clusters when you increase k. Hope this helps.
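
Editor's illustration: the behaviour described in the reply above can be sketched with a toy stability score. This is a minimal sketch under stated assumptions, not SC3's actual calculate_stability code. The function name stability_index, the input layout (a dict mapping each k to a dict of cell -> cluster label) and the exact scoring formula are all hypothetical. In this toy version a cluster's score drops when its cells are spread over several clusters at another k, or when they share a cluster with cells from outside, and the total is divided by the width of the k range, so the value depends on which ks were run.

```python
from collections import defaultdict


def stability_index(labelings, k):
    """Toy stability score for each cluster found at a given k.

    labelings: dict mapping k -> {cell: cluster label} (hypothetical layout).
    Returns {cluster label at k: score}. Illustrative only, not SC3's code.
    """
    ks = sorted(labelings)
    k_range = max(ks) - min(ks)
    if k_range == 0:
        raise ValueError("need clusterings for at least two different k values")

    # Group the cells of the reference clustering by cluster label.
    members = defaultdict(set)
    for cell, label in labelings[k].items():
        members[label].add(cell)

    scores = {}
    for label, cells in members.items():
        total = 0.0
        for other_k in ks:
            if other_k == k:
                continue
            # Group the cells of the other clustering by cluster label.
            other = defaultdict(set)
            for cell, other_label in labelings[other_k].items():
                other[other_label].add(cell)
            # Clusters at other_k that contain at least one of our cells.
            hits = [c for c in other.values() if c & cells]
            n = len(hits)
            for c in hits:
                # Fraction of the other cluster made up of our cells,
                # penalised quadratically by the number of pieces.
                total += len(cells & c) / len(c) / (n * n)
        scores[label] = total / k_range
    return scores


demo = {
    2: {"a": 1, "b": 1, "c": 1, "d": 2, "e": 2, "f": 2},
    3: {"a": 1, "b": 1, "c": 1, "d": 2, "e": 2, "f": 3},
}
print(stability_index(demo, 2))
```

In the demo, cluster 1 is identical at k = 2 and k = 3, while cluster 2 splits into two pieces at k = 3, so cluster 2 scores lower. Adding more k values to demo would change both the sum and the divisor, which is why a score such as 0.2 has no fixed meaning across runs with different k ranges.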

@yingzhang121
Author

@wikiselev Thank you for the detailed reply. In my SC3 workflow, I started with k estimation, then ran SC3 with a series of ks around the estimated value. I thought this was a better way to check for the "real" clustering result. However, I might be wrong given your explanation. And yes, I did notice that SC3 reported a higher number of clusters (k estimate) than other methods I used, such as CIDR, Seurat, SCDE, etc.
However, I do believe that generating a clustering consensus is the way to go, so I also tried another package, "clusterExperiment". I then found I could use the stability index from SC3 to guide the combineMany function. Simply put, combineMany from clusterExperiment requires a "proportion" input (a value for how frequently two samples were grouped into one cluster). For the same dataset, where SC3 was run with k=35 and the majority of stability indices are below 0.3, I used 0.3 in the combineMany function and got 14 clusters. I also tried the upper limit of the stability index (0.7 in this case) in clusterMany, and I got 84 clusters. Intuitively, if a cluster is more stable, then its samples are grouped together more frequently, and a more stringent threshold should result in more clusters. I know this looks weird, but I plan to compare all the clustering consensus results with my Seurat results, and I hope we can identify the same groups of cells again and again.
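
Editor's illustration: the intuition above, that a stricter co-clustering proportion yields more clusters, can be sketched on toy data. This is only a sketch under stated assumptions. The helper names co_clustering_proportion and group_by_threshold are hypothetical, and the grouping step is a plain connected-components merge, which is not how clusterExperiment's combineMany works internally.

```python
from itertools import combinations


def co_clustering_proportion(labelings):
    """Fraction of clusterings in which each pair of cells shares a label.

    labelings: list of {cell: cluster label} dicts over the same cells
    (hypothetical layout). Returns {(cell_a, cell_b): proportion}.
    """
    cells = sorted(labelings[0])
    return {
        (a, b): sum(lab[a] == lab[b] for lab in labelings) / len(labelings)
        for a, b in combinations(cells, 2)
    }


def group_by_threshold(labelings, proportion):
    """Merge cells whose co-clustering proportion is >= `proportion`.

    Uses a simple union-find (connected components); this is an assumption
    made for illustration, not the combineMany algorithm.
    """
    cells = sorted(labelings[0])
    parent = {c: c for c in cells}

    def find(c):
        # Follow parent links to the root, compressing the path as we go.
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    for (a, b), p in co_clustering_proportion(labelings).items():
        if p >= proportion:
            parent[find(a)] = find(b)

    groups = {}
    for c in cells:
        groups.setdefault(find(c), []).append(c)
    return sorted(groups.values())


runs = [
    {"a": 1, "b": 1, "c": 1, "d": 2},
    {"a": 1, "b": 1, "c": 2, "d": 2},
    {"a": 1, "b": 1, "c": 2, "d": 3},
]
print(len(group_by_threshold(runs, 0.3)))  # loose threshold
print(len(group_by_threshold(runs, 0.7)))  # strict threshold
```

With the loose threshold every pair that was ever grouped in at least one of the three runs is merged, so the four toy cells collapse into one group. With the strict threshold only the pair that is always together survives, leaving three groups. The direction of the effect matches the 14 versus 84 clusters reported above, though the actual counts depend on the real algorithm and data.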

@wikiselev
Member

@yingzhang121 yes, it's a pretty complicated analysis, but I hope you will get good results!

3 participants