Tutorial for high dimensional cytometry data #17

luglilab · 2022-05-25T10:16:25Z

Hi,

I'd like to use pyVIA with flow cytometry data, could you add a tutorial for this kind of data?

Do you have any suggestions about which parameters to set?

Thank you in advance.

Simone

ShobiStassen · 2022-05-30T00:43:24Z

Hi Simone,

Thanks for your message.
Have you seen this tutorial for a mass cytometry dataset? I think it would help you: https://pyvia.readthedocs.io/en/latest/mESC_timeseries.html. The tutorial uses the time-series information in addition to the surface marker expression, but you can simply ignore the time-series input labels.

Can I ask what the dimensions of your data are before PCA etc. (n_cells x n_markers)?
Depending on the dimensionality, you may or may not opt for PCA (e.g. the top 30 PCs) before running Via. If you have a fairly concise set of meaningful proteins/surface markers, you might be better off avoiding PCA.
Typically a knn of around 20-30 is good for most datasets unless you have a very low cell count. If you have a look at the tutorials for other types of data, you can probably use them as a starting point for parameters and then tune depending on the outcome.
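
As a rough illustration of the reasoning above, the choice of PCA and a starting knn can be written as a small helper. This is only a sketch and not part of pyVIA: the function name `choose_preprocessing`, the 30-marker cutoff for skipping PCA, and the 1000-cell threshold for "very low cell count" are all assumptions made for the example.

```python
def choose_preprocessing(n_cells, n_markers, max_markers_without_pca=30, n_pcs=30):
    """Hypothetical helper (not a pyVIA function): pick starting settings from the data shape.

    Assumptions for illustration only:
      - a panel of up to `max_markers_without_pca` markers is "concise" enough to skip PCA
      - fewer than 1000 cells counts as a "very low cell count"
    """
    use_pca = n_markers > max_markers_without_pca

    if n_cells < 1000:
        # Scale knn down for small datasets, but keep it between 5 and 20.
        knn = max(5, min(20, n_cells // 50))
    else:
        # Lower end of the suggested 20-30 range as a starting point.
        knn = 20

    return {
        "use_pca": use_pca,
        "n_pcs": min(n_pcs, n_markers) if use_pca else None,
        "knn": knn,
    }


# A cytometry-like panel: many cells, fewer than 30 markers, so PCA is skipped.
print(choose_preprocessing(n_cells=100_000, n_markers=25))
# A transcriptomics-like matrix: thousands of features, so PCA is applied first.
print(choose_preprocessing(n_cells=5_000, n_markers=2_000))
```

The returned values are only a starting point; they would still need tuning depending on the outcome.
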
I have been meaning to make a Parameter Tuning Tutorial too, it's on my ToDo :)
The parameters with the most impact are:

- Number of K nearest neighbors, and number of PCs (if you do PCA)
- jac_std_global (somewhere between 0.15 and 2, with lower values meaning more, smaller clusters)
- cluster_graph_pruning_std (also between 0.15 and 2, with smaller numbers meaning fewer edges retained in the cluster graph)
- too_big_factor (between 0.1 and 0.3), where smaller numbers break up large clusters to offer more granularity

To make streamplots (no RNA velocity needed for this), see these tutorials:
https://pyvia.readthedocs.io/en/latest/mESC_timeseries.html
https://pyvia.readthedocs.io/en/latest/ViaJupyter_Pancreas_RNAvelocity.html
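
When tuning, it can help to keep the ranges mentioned above in one place and check candidate settings against them before a run. The sketch below does only that; it does not call pyVIA. The helper name `check_via_params` and the example values are assumptions, and the parameter names are simply the ones listed in this thread, so how they are passed to pyVIA should be checked against the tutorials.

```python
# Ranges as quoted in this thread (not taken from the pyVIA source code).
SUGGESTED_RANGES = {
    "jac_std_global": (0.15, 2.0),
    "cluster_graph_pruning_std": (0.15, 2.0),
    "too_big_factor": (0.1, 0.3),
}


def check_via_params(params):
    """Hypothetical helper: list the settings that fall outside the suggested ranges."""
    warnings = []
    for name, (low, high) in SUGGESTED_RANGES.items():
        if name not in params:
            continue  # parameter not set by the user, nothing to check
        value = params[name]
        if not low <= value <= high:
            warnings.append(
                f"{name}={value} is outside the suggested range [{low}, {high}]"
            )
    return warnings


# Example starting point (illustrative values only).
params = {
    "knn": 20,
    "jac_std_global": 0.15,
    "cluster_graph_pruning_std": 0.15,
    "too_big_factor": 0.3,
}
print(check_via_params(params))

# A too_big_factor above 0.3 is flagged.
print(check_via_params({**params, "too_big_factor": 0.5}))
```

Values outside these ranges are not necessarily wrong; the check is just a reminder of where the suggested starting region lies.
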

ShobiStassen · 2022-06-02T02:44:05Z

@luglilab Just wanted to ask if you were able to use the Readthedocs tutorial?
Cheers,
Shobi

sinnamone · 2022-06-03T09:42:15Z

Dear @ShobiStassen,

I'm taking a look at the tutorial linked in your message above, ignoring the time series.

Before PCA, the matrix dimensions are usually [10,000 to 1 million rows] x [fewer than 30 columns].

As you suggested, I switched from PARC to pyVIA. If you could take a look, the method "runvia" in https://github.com/luglilab/Cytophenograph/blob/master/PhenoFunctions_v5.py is where I put the execution and the parameters. KNN and resolution should be set by the user, while the others are fixed.

Now I'm running some tests with different high-dimensional cytometry datasets (small, medium, big) to understand whether tuning the parameters could improve the results.
