Question about PairedSCGLUEModel #87

Open
alitinet opened this issue Jul 27, 2023 · 6 comments

@alitinet

Hi,

does PairedSCGLUEModel work by finding the .obs_names common to all modalities present? Or is it also possible to use the paired model when, e.g., integrating 3 modalities where only two of the three share cells? The setup would be the following: we are trying to integrate a CITE-seq dataset (same cells for the RNA and ADT modalities) and a CyTOF dataset (whose cells are different from those in the CITE-seq dataset). Would the model be able to pair RNA and ADT cells? Thanks!

@Jeff1995
Collaborator

Jeff1995 commented Jul 28, 2023

Hi @alitinet! Thanks for your interest in GLUE! The short answer is yes. The PairedSCGLUEModel works even when only two of the three modalities share cells.

It doesn't just extract common .obs_names. Instead, it takes the union of unique .obs_names across all modalities and pairs cells with the same .obs_names, no matter how many modalities a cell covers, through a matrix that we call the "pairing mask" (pmsk in the code). E.g., say we have an RNA modality with cells [A, B, C], an ADT modality with cells [B, C, D] and an ATAC modality with cells [C, D, E]. The pmsk looks like this:

Cell  RNA  ADT  ATAC
A     1    0    0
B     1    1    0
C     1    1    1
D     0    1    1
E     0    0    1

The pairing loss is computed based on this pmsk so it can accommodate any pairing pattern, including the setting you mentioned.
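
To make the pairing mask concrete, here is a minimal sketch that rebuilds the example matrix from per-modality cell names. It is only an illustration of the idea described above, not the code GLUE uses internally; the `obs_names` dictionary and the row ordering (sorted cell names) are assumptions made for this example.

```python
import numpy as np

# Hypothetical per-modality cell names, mirroring the example above.
obs_names = {
    "RNA": ["A", "B", "C"],
    "ADT": ["B", "C", "D"],
    "ATAC": ["C", "D", "E"],
}

modalities = list(obs_names)
members = {m: set(names) for m, names in obs_names.items()}

# Union of all cell names, sorted for a stable row order (an assumption here).
cells = sorted(set().union(*members.values()))

# pmsk[i, j] is 1 when cell i is observed in modality j, and 0 otherwise.
pmsk = np.array(
    [[int(cell in members[m]) for m in modalities] for cell in cells]
)

print(cells)
print(pmsk)
```

Rows with more than one nonzero entry (B, C and D here) are the cells for which a pairing term can be computed; rows with a single 1 (A and E) are unpaired.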

Let me know if there are any further problems!

@alitinet
Author

Hi @Jeff1995,

Thanks so much for the quick reply! This is great. A follow-up question: when using PairedSCGLUEModel, the model still outputs an embedding per cell per modality, right? So the pairing is only used to calculate the pairing loss? Or is there a way to obtain only one embedding per cell, i.e. in your example above to obtain 5 embeddings (1 per cell), and not 3+3+3=9 embeddings?

@Jeff1995
Collaborator

I'm afraid that's not currently supported. The model always returns all 9 embeddings. In this case I'd suggest taking the mean of paired cell embeddings.

I'll see if I can add an additional function to compute this, but for now you would have to compute this mean manually.
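
Until such a function exists, the manual mean could be computed along these lines. This is a sketch under stated assumptions rather than a GLUE API: the embeddings are random stand-ins for whatever the trained model returns per modality, and `latent_dim`, `obs_names` and `embeddings` are hypothetical names chosen for the example.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
latent_dim = 4  # hypothetical latent dimensionality

# Hypothetical per-modality cell names, mirroring the pmsk example.
obs_names = {
    "RNA": ["A", "B", "C"],
    "ADT": ["B", "C", "D"],
    "ATAC": ["C", "D", "E"],
}

# Stand-in embeddings: one (n_cells x latent_dim) frame per modality,
# indexed by cell name. In practice these come from the trained model.
embeddings = {
    m: pd.DataFrame(rng.normal(size=(len(names), latent_dim)), index=names)
    for m, names in obs_names.items()
}

# Stack all 9 per-modality embeddings, then average rows sharing a cell name.
stacked = pd.concat(list(embeddings.values()))
cell_embeddings = stacked.groupby(level=0).mean()

print(cell_embeddings.shape)  # one row per unique cell
```

Cells seen in only one modality (A and E) keep their single embedding unchanged, while cells seen in several modalities get the average of their copies.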

@alitinet
Author

Got it, thanks for your prompt replies!

@Jeff1995
Collaborator

Great! I'll let you know when that function becomes available :)

@HelloWorldLTY

Hi, I am also interested in integrating CITE-seq data (paired, like a 10X Multiome dataset). My idea is to modify the guidance graph based on gene-protein encoding relations and self-loops. Moreover, I model the protein data with a normal distribution rather than a negative binomial (NB). I have implemented one version and I wonder if I can open a pull request to upload my code. Thanks.
