Can I use BuildSNNGraph on scWGBS data? #82

Open
xiaonian92 opened this issue Dec 29, 2020 · 2 comments

Comments

xiaonian92 commented Dec 29, 2020

Hello dear author,

I have used this package for my scRNA-seq data analysis, and now I wonder whether I can use the function buildSNNGraph() on my DNA methylome data (a data frame of genomic bins by cell IDs). My goal is to cluster cells by their methylation level (numeric, between 0 and 1) using the same clustering algorithm as for scRNA-seq: buildSNNGraph() followed by igraph::cluster_walktrap(). Is this right?

I notice the description says: Build a shared or k-nearest-neighbors graph for cells based on their expression profiles; x: For the ANY method, a matrix-like object containing expression values for each gene (row) in each cell (column). These dimensions can be transposed if transposed=TRUE.

Looking forward to your reply, thanks!

LTLA (Collaborator) commented Dec 29, 2020

Sounds reasonable, though I understand that most analyses of methylation data are done on M-values (i.e., the log-ratio of methylated to unmethylated counts) rather than beta-values (which lie in [0, 1], as you have described). M-values have some friendlier mean-variance properties, see Figure 3 of the paper here. From the perspective of clustering, using M-values means that a change in methylation from 0.01 to 0.02 has a similar effect to a change from 0.1 to 0.2, whereas using beta-values would give the same weight to both 0.01-->0.02 and 0.5-->0.51. The latter is probably not what you want.
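
To make the beta-value versus M-value comparison concrete, here is a minimal base R sketch. It assumes the M-value is computed as log2(beta / (1 - beta)), which equals the log2-ratio of methylated to unmethylated counts; the base-2 logarithm, the helper name `beta_to_m` and the clamping offset `eps` are illustrative choices, not something specified in this thread.

```r
# Hypothetical helper: convert beta-values (proportion methylated) to M-values.
# eps is an assumed small offset that keeps beta away from exactly 0 or 1,
# where the log-ratio would be infinite.
beta_to_m <- function(beta, eps = 1e-3) {
  beta <- pmin(pmax(beta, eps), 1 - eps)
  log2(beta / (1 - beta))
}

# Three shifts in methylation: two doublings at low methylation and one
# small absolute step near 0.5.
shifts <- data.frame(
  from = c(0.01, 0.10, 0.50),
  to   = c(0.02, 0.20, 0.51)
)
shifts$delta_beta <- shifts$to - shifts$from
shifts$delta_m    <- beta_to_m(shifts$to) - beta_to_m(shifts$from)
print(shifts)
```

On the beta scale the first and third shifts are the same size (0.01), while on the M scale the first two shifts are both roughly one unit and the third is close to zero.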

Regardless of what metric you decide to use, it's a good idea to (i) select highly variable features and (ii) use the d argument of buildSNNGraph() to perform a PCA for you. This will speed things up and get rid of some noise at the same time. You may try using modelGeneVar() for the variance estimation, though I don't really know how it'll turn out; you might have to set parametric=FALSE (see ?fitTrendVar) to get a decent fit, as the default settings are optimized for count data.
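
A rough sketch of steps (i) and (ii) followed by walktrap clustering might look like the following. Everything about the data is an assumption: the matrix is simulated, has no missing values, and the sizes (2000 bins, 100 cells, top 500 bins, d = 20) are arbitrary. For simplicity the sketch ranks bins by plain per-bin variance instead of modelGeneVar(), since it is unclear how well the trend fit behaves on M-values.

```r
library(scran)   # buildSNNGraph()
library(igraph)  # cluster_walktrap()

set.seed(100)

# Simulated stand-in for real data: genomic bins (rows) x cells (columns)
# of M-values, with two groups of cells that differ in the first 200 bins.
n_bins  <- 2000
n_cells <- 100
group   <- rep(c("A", "B"), each = n_cells / 2)
m_values <- matrix(
  rnorm(n_bins * n_cells), nrow = n_bins,
  dimnames = list(paste0("bin", seq_len(n_bins)),
                  paste0("cell", seq_len(n_cells)))
)
m_values[1:200, group == "B"] <- m_values[1:200, group == "B"] + 3

# (i) Keep the most variable bins (plain variance, used here as a simple
# substitute for modelGeneVar()).
bin_var  <- apply(m_values, 1, var)
top_bins <- names(sort(bin_var, decreasing = TRUE))[1:500]

# (ii) Build the SNN graph on those bins; d = 20 asks buildSNNGraph() to
# run a PCA first and search for neighbors in the top 20 PCs.
g <- buildSNNGraph(m_values, k = 10, d = 20, subset.row = top_bins)

# Cluster cells on the graph and compare with the simulated groups.
clusters <- igraph::cluster_walktrap(g)$membership
print(table(clusters, group))
```

With real data, `m_values` would be replaced by your own bins-by-cells matrix, and missing values would need to be handled before the PCA step.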

xiaonian92 (Author) commented

Okay, I see. Thanks for your kind reply, I'll try!
