Cluster labels when using a hard-coded prior #26

carter-allen · 2023-04-15T04:47:00Z

In the vignette section Using Priors, the pfit2 object is supposed to have $K = 2$ mixture components, but when you check table(pfit2@label) you find that all observations are assigned to one cluster. However, according to the scatterplot of pfit2, there are 2 components. Is there a reason for the discrepancy?

gfinak · 2023-04-15T13:54:42Z

You ran the example locally and got a different result? The question is: are there reasons for this?
Yes, there are.
flowClust is not optimally maintained. I don't have the time to devote to it that I had in the past. The package has seen three different authors and maintainers in its life so far, and the prior code found little use in practice. It is, in the end, research code. Lots of it should probably be rewritten in a more modern style.
The use case where I would trust the package is identifying populations in FSC/SSC space plus a few other markers. That part has been the most used and the best maintained.
Some day I'll get to rewriting it.

carter-allen · 2023-04-15T15:01:25Z

Hi, thanks for the response! It is actually not a discrepancy between the vignette and the results I get locally. I am able to reproduce the vignette results exactly. However, when I check table(pfit2@label) after the final line of the vignette, I find that all observations are assigned to a single mixture component, despite plot(pfit2, data = rituximab2) displaying two mixture components.

I've found the package to work quite well for the use cases you mentioned; however, I'd like to try to incorporate prior information. Would you recommend against using any non-default prior at this time?

Thanks in advance!

gfinak · 2023-04-15T15:49:45Z

I see. That sounds like a bug. It might be simple to resolve, but it might not. I don't have a Bioconductor dev environment available to me, and I wouldn't be able to investigate it for some time.
The flowClust fit object also has a slot that holds the probability of each cell belonging to each component. The row-wise argmax of that matrix can give you cell-level assignments, but it wouldn't account for outliers the way the label slot is supposed to, I believe.
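
For anyone trying that workaround, here is a minimal sketch of the row-wise argmax in base R. It runs on a toy probability matrix so that it is self-contained; the values are made up for illustration. With a real fit, the matrix would come from the fit object. The slot name `z` used in the comment is an assumption, so check slotNames(pfit2) before relying on it.

```r
# Toy posterior matrix: one row per cell, one column per mixture component.
# With a real fit this would be read from the fit object instead
# (assumed, not confirmed in this thread, to be the `z` slot: pfit2@z).
z <- matrix(
  c(0.90, 0.10,
    0.20, 0.80,
    0.55, 0.45,
    0.05, 0.95),
  ncol = 2, byrow = TRUE
)

# Row-wise argmax: index of the most probable component for each cell.
# ties.method = "first" makes the result deterministic when rows tie.
hard_label <- max.col(z, ties.method = "first")

# Tabulate the assignments, analogous to table(pfit2@label).
label_counts <- table(hard_label)
print(label_counts)
```

As noted above, these assignments ignore outlier flagging, so they are not a drop-in replacement for the label slot.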
