Tutorial for high dimensional cytometry data #17

Open
luglilab opened this issue May 25, 2022 · 3 comments

@luglilab

Hi,

I'd like to use pyVIA with flow cytometry data. Could you add a tutorial for this kind of data?

Do you have any suggestions about which parameters to set?

Thank you in advance.

Simone

@ShobiStassen
Owner

hi Simone,

Thanks for your message.
Have you seen this tutorial for a mass cytometry dataset? I think it would help you: https://pyvia.readthedocs.io/en/latest/mESC_timeseries.html. The tutorial uses time-series information in addition to the surface marker expression, but you can simply ignore the time-series input labels.

Can I ask what the dimensions of your data are (n_cells x n_markers) before PCA etc.?
Depending on the dimensionality, you may or may not opt for PCA (e.g. the top 30 PCs) before running Via. If you have a fairly concise set of meaningful proteins/surface markers, you might be better off avoiding PCA.
Typically a knn of around 20-30 is good for most datasets unless you have a very low cell count. The tutorials for other types of data can probably serve as a starting point for parameters, which you can then tune depending on the outcome.
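
As a rough illustration of the PCA advice above, here is a minimal sketch using only NumPy. The helper name `maybe_pca` and the `max_markers_without_pca` threshold are assumptions made for this example; they are not part of pyVIA, and the threshold should be adapted to your own marker panel. The returned matrix is what you would then hand to Via, following the call shown in the linked tutorial.

```python
import numpy as np


def maybe_pca(X, n_pcs=30, max_markers_without_pca=30):
    """Return (matrix, used_pca); PCA is applied only for wide panels.

    Both keyword defaults are illustrative assumptions, not pyVIA settings.
    """
    X = np.asarray(X, dtype=float)
    n_cells, n_markers = X.shape
    if n_markers <= max_markers_without_pca:
        # Concise marker panel: keep the marker expression as it is.
        return X, False
    # Wider panel: centre the data and project onto the top components.
    Xc = X - X.mean(axis=0)
    k = min(n_pcs, n_cells, n_markers)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:k].T, True


rng = np.random.default_rng(0)
panel = rng.normal(size=(1000, 25))  # synthetic stand-in: 1000 cells x 25 markers
embedded, used_pca = maybe_pca(panel)
knn = 20  # a starting value inside the 20-30 range mentioned above
```

With a 25-marker panel the sketch skips PCA and leaves the matrix untouched; a panel wider than the threshold would be reduced to at most `n_pcs` columns.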
I have been meaning to make a Parameter Tuning Tutorial too, it's on my ToDo :)
The parameters which have most impact are

@ShobiStassen
Owner

@luglilab Just wanted to ask if you were able to use the Readthedocs tutorial?
Cheers,
Shobi

@sinnamone

Dear @ShobiStassen ,

I'm taking a look at the tutorial linked in your message above, ignoring the time series.

Before PCA, the matrix dimensions are usually [10,000 to 1 million rows] x [fewer than 30 columns].

As you suggested, I switched from PARC to pyVIA. If you could take a look, the method "runvia" in https://github.com/luglilab/Cytophenograph/blob/master/PhenoFunctions_v5.py is where I put the execution and the parameters. KNN and resolution are set by the user, while the others are fixed.

Now I'm running tests on different high-dimensional cytometry datasets (small, medium, big) to understand whether tuning the parameters could improve the results.
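
A test like this could be organised as a small sweep over the two user-facing parameters. The sketch below is only an illustration: `sweep`, `run_fn`, and `dummy_run` are hypothetical names, `run_fn` stands in for a thin wrapper around the "runvia" method, and the knn and resolution values are placeholders rather than recommendations.

```python
from itertools import product


def sweep(run_fn, datasets, knn_values=(20, 25, 30), resolutions=(0.5, 1.0)):
    """Call run_fn once per dataset/knn/resolution combination.

    run_fn is a hypothetical user-supplied callable; it is not a pyVIA function.
    """
    results = []
    for (name, data), knn, res in product(datasets.items(), knn_values, resolutions):
        score = run_fn(data, knn=knn, resolution=res)
        results.append({"dataset": name, "knn": knn, "resolution": res, "score": score})
    return results


def dummy_run(data, knn, resolution):
    # Stand-in so the sketch runs without pyVIA; replace with a real wrapper.
    return len(data)


datasets = {"small": [0] * 10, "medium": [0] * 100}  # placeholder data
results = sweep(dummy_run, datasets)
```

Collecting one record per combination makes it easy to compare how the outcome changes with dataset size and parameter choice.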
