Liana on pseudo bulk-sc data #96

Open
Marwansha opened this issue Mar 28, 2024 · 6 comments

Marwansha commented Mar 28, 2024

Hey,
I want to know whether it is best practice to always run liana on single-cell data to compute the L-R CCC results. In the differential expression tutorial, after differential expression was computed on pseudobulk data, liana was run on the adata object. Is that the best practice, and is it OK to run it on the pseudobulk data instead?

Thanks
Marwan

@Marwansha Marwansha added bug Something isn't working help wanted Extra attention is needed labels Mar 28, 2024
Collaborator

dbdimitrov commented Mar 29, 2024

Hi @Marwansha,

I assume you are referring to the DE analysis vignette. To my knowledge, the current best practice is to perform differential testing between single-cell samples at the pseudobulk level.

A couple of references on the topic:

https://www.nature.com/articles/s41576-023-00586-w
https://www.nature.com/articles/s41467-021-21038-1

I hope this helps.

Collaborator

dbdimitrov commented Mar 29, 2024

Hi @Marwansha,

Sorry, but I'm not sure I follow exactly. Can you elaborate on the sense in which I run liana on pseudobulks?

One can also say that average expression per cluster is a "pseudobulk" (which is how the vast majority of CCC methods approach it).

In the DE tutorial, you can think of li.mt.df_to_lr as a join of the DE stats with ligand-receptor prior knowledge, so it is not necessarily running LIANA+ in the standard sense.
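To illustrate the "join" idea above, here is a minimal pure-Python sketch. It is not liana's implementation: the gene names, cell types, DE values, and the averaged `interaction_stat` column are all assumptions made up for illustration.

```python
# Toy DE statistics per (cell type, gene); values are made up for illustration.
de_stats = {
    ("Tcell", "LIG1"): 2.5,
    ("Bcell", "REC1"): 1.5,
    ("Bcell", "LIG1"): -0.5,
    ("Tcell", "REC1"): 0.25,
}

# Toy prior-knowledge resource of (ligand, receptor) pairs (hypothetical names).
resource = [("LIG1", "REC1")]


def join_de_with_resource(de_stats, resource):
    """Attach ligand and receptor DE stats to every source -> target pair."""
    cell_types = sorted({cell_type for cell_type, _ in de_stats})
    rows = []
    for ligand, receptor in resource:
        for source in cell_types:
            for target in cell_types:
                lig = de_stats.get((source, ligand))
                rec = de_stats.get((target, receptor))
                if lig is None or rec is None:
                    continue  # gene was not tested in this cell type
                rows.append({
                    "source": source,
                    "target": target,
                    "ligand": ligand,
                    "receptor": receptor,
                    "ligand_stat": lig,
                    "receptor_stat": rec,
                    # Illustrative summary: mean of the two DE statistics.
                    "interaction_stat": (lig + rec) / 2,
                })
    return rows


rows = join_de_with_resource(de_stats, resource)
for row in sorted(rows, key=lambda r: r["interaction_stat"], reverse=True):
    print(row)
```

No expression values are touched here. The DE statistics are simply looked up for the ligand in the source cell type and the receptor in the target cell type, which is why this differs from running a CCC method on the data itself.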

Author

Marwansha commented Mar 29, 2024

Sorry if I wasn't clear again.

My question is about generating the ligand-receptor interactions df.
For example, for a single-cell data object:
li.mt.rank_aggregate(adata, groupby='celltype', expr_prop=0.1, verbose=True)

What if I run rank_aggregate on the pseudobulk AnnData object rather than the single-cell object?
li.mt.rank_aggregate(pdata, groupby='celltype', expr_prop=0.1, verbose=True)
Here pdata is the object generated by decoupler, so liana will treat each individual_celltype as one observation. If we have 10 individuals, then for one cell type we will have 10 observations per condition, whereas in the single-cell data the observations are the cells of each cell type.

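import decoupler as dc

# Sum the raw counts of all cells per individual and cell type.
# Note: dc.get_pseudobulk also takes a sample_col argument (the column
# identifying each individual); it is not shown in this snippet.
pdata = dc.get_pseudobulk(
    adata,
    groups_col="celltype",
    layer='counts',   # use the raw counts layer
    mode='sum',       # aggregate by summing counts
    min_cells=10,     # drop profiles built from fewer than 10 cells
    min_counts=10000  # drop profiles with fewer than 10000 total counts
)
pdata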

I ran liana on the pseudobulk-aggregated AnnData object, and compared with the results from the single-cell object, the results make more sense for my data because they are much less noisy. However, I was not sure whether this has been tested before, or which approach is best practice.
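To make the change in what counts as an observation concrete, here is a minimal pure-Python sketch of sum-mode pseudobulking. It is not decoupler's implementation: the donors, cell types, and counts are invented, and the `min_cells` / `min_counts` filters are left out.

```python
from collections import defaultdict

# Toy single-cell table: (individual, cell type, raw count of one gene).
cells = [
    ("donor1", "Tcell", 3), ("donor1", "Tcell", 0), ("donor1", "Tcell", 5),
    ("donor2", "Tcell", 1), ("donor2", "Tcell", 2),
    ("donor1", "Bcell", 4), ("donor2", "Bcell", 0), ("donor2", "Bcell", 6),
]


def sum_pseudobulk(cells):
    """Sum counts over all cells sharing an (individual, cell type) pair."""
    totals = defaultdict(int)
    for donor, cell_type, count in cells:
        totals[(donor, cell_type)] += count
    return dict(totals)


pseudobulk = sum_pseudobulk(cells)

# Observations per cell type: cells before aggregation, individuals after.
n_cells_t = sum(1 for _, cell_type, _ in cells if cell_type == "Tcell")
n_pseudobulk_t = sum(1 for _, cell_type in pseudobulk if cell_type == "Tcell")
print(n_cells_t, n_pseudobulk_t)
```

After aggregation, each cell type has one observation per individual, so any statistic computed "per group" is computed across individuals rather than across cells.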

Thanks

Collaborator

dbdimitrov commented Apr 4, 2024

Hi @Marwansha,

Sorry for the delay, I was away.

Hmm. This is a really interesting approach, though not standard.
If you have normalized (total + log1p) the summed counts, I see nothing wrong with it.
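As a sketch of what "total + log1p" means for a summed profile, here is a pure-Python version. In a scanpy workflow this step is commonly done with `sc.pp.normalize_total` followed by `sc.pp.log1p`. The function name, the toy counts, and the `target_sum=1e4` default below are assumptions for illustration.

```python
import math


def normalize_log1p(counts, target_sum=1e4):
    """Scale one profile to target_sum total counts, then apply log(1 + x)."""
    total = sum(counts)
    if total == 0:
        return [0.0 for _ in counts]
    return [math.log1p(c * target_sum / total) for c in counts]


# Two toy pseudobulk profiles with the same composition but different depth.
shallow = [10, 30, 60]
deep = [1000, 3000, 6000]

norm_shallow = normalize_log1p(shallow)
norm_deep = normalize_log1p(deep)
print(norm_shallow)
print(norm_deep)
```

The two profiles end up on the same scale, which matters for pseudobulks because the summed counts depend heavily on how many cells each individual contributed.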

It only changes the interpretation a bit, since instead of comparing means across cells, you are comparing means across sample pseudobulks.

Just to share my intuition with this, think of CellPhoneDB. You get a mean between the averaged ligand and receptor expression per cluster (lr_mean), and you get a p-value where the averaging is done on permuted cell labels (cpdb_pvals). In your case, I believe the lr_mean ranking shouldn't change too much whether you do it on pseudobulks or at the single-cell level. However, the p-values should be quite different (since you are shuffling cell type pseudobulks per sample) and would likely be a bit more conservative.
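The intuition above can be sketched as a simplified, CellPhoneDB-style permutation test in pure Python. This is not the CellPhoneDB or liana implementation: the function names, the toy values, and the choice of `n_perms` and `seed` are assumptions.

```python
import random
from statistics import mean


def lr_mean(ligand, receptor, labels, source, target):
    """Mean of the ligand average in source and receptor average in target."""
    lig = mean(v for v, lab in zip(ligand, labels) if lab == source)
    rec = mean(v for v, lab in zip(receptor, labels) if lab == target)
    return (lig + rec) / 2


def permutation_pvalue(ligand, receptor, labels, source, target,
                       n_perms=1000, seed=0):
    """Share of label shuffles whose lr_mean is at least the observed one."""
    rng = random.Random(seed)
    observed = lr_mean(ligand, receptor, labels, source, target)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perms):
        rng.shuffle(shuffled)  # group sizes are preserved by shuffling
        if lr_mean(ligand, receptor, shuffled, source, target) >= observed:
            hits += 1
    return observed, hits / n_perms


# Six observations: three pseudobulks (or cells) per group.
labels = ["A", "A", "A", "B", "B", "B"]
ligand = [5, 6, 7, 0, 1, 0]    # ligand expression per observation
receptor = [0, 1, 0, 4, 5, 6]  # receptor expression per observation

observed, pvalue = permutation_pvalue(ligand, receptor, labels, "A", "B")
print(observed, pvalue)
```

With three observations per group there are only 20 distinct ways to assign the labels, so the p-value cannot go much below about 0.05 no matter how strong the signal is. The same test run on thousands of cells has far more possible shuffles, which is one way to see why pseudobulk p-values tend to be more conservative.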

In short, at a glance, I like it as an idea, and it can make sense depending on your data. You are also avoiding over-inflated permuted p-values due to pseudoreplication. :)

@dbdimitrov
Collaborator

PS. A major motivation of mine when writing liana-py was to make it flexible, so I'm glad to see it used in ways beyond the tutorials.

@dbdimitrov dbdimitrov removed the bug Something isn't working label Apr 4, 2024
Author

Marwansha commented Apr 4, 2024

Thank you very much for your response. In fact, I am trying to benchmark and compare the different results that come from computing the CCC (cell-cell communication) on single-cell or pseudobulk objects. From my perspective, and from a ground truth point of view (considering some ligand-receptor interactions that exist in one group and not in the other, which I know from literature and previous work), it seems that using the pseudobulk data makes it cleaner and easier to discern.

Would you be interested in having a short meeting? Maybe I can show you my data (I can share it too), so I can get some insights from your point of view on which approach makes more sense.

Thanks
Marwan
My email, in case: marwan.sharawy@pasteur.fr
