Considerations for single cell splicing data #51

Open
dbogdano opened this issue Feb 15, 2023 · 2 comments

Comments

@dbogdano

Hello. Thank you for creating and maintaining this easy-to-use tool.

When scoring cells, have you considered how single-cell splicing data, stored as a cell-by-intron matrix of percent-spliced-in (PSI) values, might be used as input? These data are generally sparser than gene expression, with many values stored as NaN wherever the cell has no expression of the underlying gene from which to calculate PSI.

As is, score_cell returns NaN values as scores for every cell, likely due to the missing values in the input.

@martinjzhang
Owner

Hi,

Thank you for the question. Does it make sense to replace the NaN values with 0 in the input data? Or does this require new method development?

Best,
Martin

@dbogdano
Author

Hi Martin,

Thanks for the quick response. Unfortunately, it isn't that simple. A PSI of 0 is itself a measurement: it means 0% intron inclusion, given the RNA-seq reads that either span a given splice junction (suggesting the intron is spliced out) or bypass the junction (suggesting it is retained). PSI values range from 0 to 1, with 1 meaning 100% intron inclusion given the evidence. The NaN values reflect a lack of either type of RNA-seq read in the single cell, which provides no evidence for inclusion or excision.
Using PSI rather than read pileups overlapping splice junctions lets splicing be represented without the expression level of the underlying gene confounding the measurement.
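
To make the difference between a PSI of 0 and NaN concrete, here is a minimal sketch of the definition described above. It assumes per-cell, per-intron counts of reads supporting retention and reads supporting excision; the function name `psi_from_counts` and the toy counts are hypothetical, and this is not how any particular splicing tool computes PSI.

```python
import numpy as np

def psi_from_counts(retained_reads, spliced_reads):
    """PSI = retained / (retained + spliced); NaN where a cell has no reads of either type."""
    retained = np.asarray(retained_reads, dtype=float)
    spliced = np.asarray(spliced_reads, dtype=float)
    total = retained + spliced
    psi = np.full(total.shape, np.nan)  # default: no evidence
    # Only divide where at least one informative read exists.
    np.divide(retained, total, out=psi, where=total > 0)
    return psi

# Toy cell-by-intron counts (2 cells, 3 introns); values are illustrative only.
retained = np.array([[0, 3, 0],
                     [5, 0, 0]])
spliced = np.array([[4, 1, 0],
                    [0, 0, 2]])
psi = psi_from_counts(retained, spliced)
print(psi)
```

In this toy matrix, cell 0 has a PSI of 0 for intron 0 because four reads support excision and none support retention, while intron 2 is NaN because there are no reads at all. Replacing that NaN with 0 would claim evidence of excision that does not exist.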

For now, I'm thinking of just using pseudo-bulked cells, each representing the mean PSI values of between 10 and 100 single cells grouped by similar gene expression, before trying anything more sophisticated.
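
The pseudo-bulking idea could be sketched roughly as follows, assuming the cells have already been assigned to groups by similar gene expression (for example, by clustering). The function name `pseudobulk_mean_psi`, the group labels, and the toy matrix are all hypothetical. The mean skips NaN entries, and an intron stays NaN only if no cell in the group has evidence for it.

```python
import numpy as np

def pseudobulk_mean_psi(psi, groups):
    """Average PSI per group of cells, ignoring NaN entries.

    psi    : (n_cells, n_introns) array with NaN where PSI is undefined
    groups : length n_cells array of group labels (assumed precomputed)
    """
    psi = np.asarray(psi, dtype=float)
    groups = np.asarray(groups)
    labels = np.unique(groups)
    bulk = np.full((labels.size, psi.shape[1]), np.nan)
    for i, label in enumerate(labels):
        block = psi[groups == label]
        observed = ~np.isnan(block)
        n_obs = observed.sum(axis=0)  # cells with evidence, per intron
        total = np.where(observed, block, 0.0).sum(axis=0)
        # Leave NaN where no cell in the group has evidence for the intron.
        np.divide(total, n_obs, out=bulk[i], where=n_obs > 0)
    return labels, bulk

# Toy example: 4 cells, 3 introns, 2 groups; values are illustrative only.
psi = np.array([[0.0, np.nan, np.nan],
                [1.0, 0.5,    np.nan],
                [np.nan, 0.2, 0.4],
                [np.nan, 0.4, 0.8]])
groups = np.array(["a", "a", "b", "b"])
labels, bulk = pseudobulk_mean_psi(psi, groups)
print(labels)
print(bulk)
```

Note that averaging per-cell PSI weights every cell with evidence equally, regardless of how many reads support its value. Summing the underlying read counts across the group before computing PSI is an alternative if those counts are available.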
