Liana on pseudobulk sc data #96

Marwansha · 2024-03-28T19:11:56Z

Hey,
I want to know whether it is best practice to always run LIANA on single-cell data to compute the ligand-receptor CCC results. In the differential expression tutorial, after differential expression was computed on pseudobulk data, LIANA was run on the adata object. Is that the best practice, and is it OK to run it on the pseudobulk data instead?

Thanks
Marwan

dbdimitrov · 2024-03-29T10:14:35Z

Hi @Marwansha,

I assume you are referring to the DE analysis vignette. To my knowledge, the current best practice is to perform differential testing between single-cell samples at the pseudobulk level.

A couple of references on the topic:

https://www.nature.com/articles/s41576-023-00586-w
https://www.nature.com/articles/s41467-021-21038-1

I hope this helps.

dbdimitrov · 2024-03-29T14:04:24Z

Hi @Marwansha,

Sorry, but I'm not sure I follow exactly. Can you elaborate on the sense in which I run liana on pseudobulks?

One can also say that average expression per cluster is a "pseudobulk" (which is how the vast majority of CCC methods approach it).

In the DE tutorial, you can think of li.mt.df_to_lr as a join of the DE stats with ligand-receptor prior knowledge. So it is not necessarily running LIANA+ in the standard sense.
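
To make the "join" idea concrete, here is a minimal pandas sketch. It is not the li.mt.df_to_lr implementation; the column names, the gene-level statistics, and the averaging of ligand and receptor stats are all assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical DE statistics per cell type and gene (values made up)
de = pd.DataFrame({
    "celltype": ["T", "T", "B", "B"],
    "gene": ["CCL5", "CD74", "CCR5", "MIF"],
    "stat": [2.1, -0.4, 1.3, 0.8],
})

# Hypothetical ligand-receptor prior knowledge table
lr_pairs = pd.DataFrame({
    "ligand": ["CCL5", "MIF"],
    "receptor": ["CCR5", "CD74"],
})

# View the DE table once from the sender side and once from the receiver side
lig = de.rename(columns={"celltype": "source", "gene": "ligand", "stat": "ligand_stat"})
rec = de.rename(columns={"celltype": "target", "gene": "receptor", "stat": "receptor_stat"})

# Join: every ligand-receptor pair gets the stats of its ligand and its receptor
lr_res = lr_pairs.merge(lig, on="ligand").merge(rec, on="receptor")

# One simple way to summarise an interaction: average the two statistics
lr_res["interaction_stat"] = lr_res[["ligand_stat", "receptor_stat"]].mean(axis=1)
print(lr_res)
```

Each row of the result is a source/target/ligand/receptor combination annotated with DE statistics, which is why no per-cell scoring is involved at this step.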

Marwansha · 2024-03-29T14:38:03Z

Sorry if I wasn't clear again.

My question is about generating the ligand-receptor interactions df.
For example, here it is for a single-cell data object:
li.mt.rank_aggregate(adata, groupby='celltype', expr_prop=0.1, verbose=True)

What if I run rank_aggregate on the pseudobulk AnnData object rather than on the single-cell object?
li.mt.rank_aggregate(pdata, groupby='celltype', expr_prop=0.1, verbose=True)
Here pdata is the object generated by decoupler, so LIANA will treat each individual-celltype combination as one observation. So if we have 10 individuals, each cell type will have 10 observations per condition, whereas in the single-cell data the observations are the cells of each cell type.

pdata = dc.get_pseudobulk(
    adata,
    groups_col="celltype",
    layer='counts',
    mode='sum',
    min_cells=10,
    min_counts=10000
)
pdata

I ran LIANA on the pseudobulk-aggregated AnnData object, and compared with the results from the single-cell object, the results make more sense for my data because they are much less noisy. However, I was not sure whether this had been tested before, or which one is the best practice.

Thanks

dbdimitrov · 2024-04-04T05:51:44Z

Hi @Marwansha,

Sorry for the delay, I was away.

Hmm. This is a really interesting approach, though not standard.
If you have normalized (total + log1p) the summed counts, I see nothing wrong with it.
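
As a rough illustration of what "total + log1p" on summed counts means, here is a small NumPy sketch on made-up data. The sample and cell type labels and the target sum of 1e4 are assumptions; on a real pseudobulk AnnData one would more likely use something like scanpy's sc.pp.normalize_total followed by sc.pp.log1p.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy single-cell counts: 60 cells x 5 genes (made-up data)
counts = rng.poisson(lam=2.0, size=(60, 5))
sample = np.repeat(["s1", "s2", "s3"], 20)   # hypothetical sample labels
celltype = np.tile(["A", "B"], 30)           # hypothetical cell type labels

# Sum counts per (sample, celltype): one pseudobulk profile per combination
groups = sorted(set(zip(sample, celltype)))
pseudobulk = np.array([
    counts[(sample == s) & (celltype == c)].sum(axis=0) for s, c in groups
])

# Total-count normalisation to a common target sum, followed by log1p
target_sum = 1e4  # arbitrary choice for this sketch
totals = pseudobulk.sum(axis=1, keepdims=True)
normalized = np.log1p(pseudobulk / totals * target_sum)

print(pseudobulk.shape)  # one row per sample-celltype combination
```

After this step every pseudobulk profile is on the same scale, so differences in the number of cells or sequencing depth per sample do not dominate the averages.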

It only changes the interpretation a bit, since instead of comparing means across cells, you are comparing means across sample pseudobulks.

Just to share my intuition with this, think of CellPhoneDB. You get a mean between the averaged ligand and receptor expression per cluster (lr_mean), and you get a p-value where the averaging is done on permuted cell labels (cpdb_pvals). In your case, I believe the lr_mean ranking shouldn't change too much whether you do it on pseudobulks or at the single-cell level. However, the p-values should be quite different (since you are shuffling cell type pseudobulks per sample) and would likely be a bit more conservative.
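
The intuition above can be sketched with a toy example. This is not the CellPhoneDB or LIANA implementation: the function name lr_mean_pvalue, the simulated data, the use of per-sample means as stand-ins for normalized pseudobulks, and the +1 correction in the p-value are all assumptions of this sketch.

```python
import numpy as np

def lr_mean_pvalue(ligand, receptor, labels, source, target, n_perms=1000, seed=0):
    """Mean of the source-averaged ligand and target-averaged receptor,
    with a p-value obtained by shuffling the group labels."""
    rng = np.random.default_rng(seed)

    def score(lab):
        return (ligand[lab == source].mean() + receptor[lab == target].mean()) / 2

    observed = score(labels)
    null = np.array([score(rng.permutation(labels)) for _ in range(n_perms)])
    # Fraction of permutations scoring at least as high as the observed value
    pvalue = (np.sum(null >= observed) + 1) / (n_perms + 1)
    return observed, pvalue

# Toy data: 2 cell types, 5 samples, 100 cells per sample and cell type
rng = np.random.default_rng(1)
n_samples, n_cells = 5, 100
cell_labels = np.repeat(["A", "B"], n_samples * n_cells)
cell_sample = np.tile(np.repeat(np.arange(n_samples), n_cells), 2)
ligand = rng.normal(loc=np.where(cell_labels == "A", 1.0, 0.0), scale=1.0)
receptor = rng.normal(loc=np.where(cell_labels == "B", 1.0, 0.0), scale=1.0)

# Collapse to one observation per (cell type, sample)
pb_labels, pb_ligand, pb_receptor = [], [], []
for ct in ["A", "B"]:
    for s in range(n_samples):
        mask = (cell_labels == ct) & (cell_sample == s)
        pb_labels.append(ct)
        pb_ligand.append(ligand[mask].mean())
        pb_receptor.append(receptor[mask].mean())
pb_labels = np.array(pb_labels)
pb_ligand = np.array(pb_ligand)
pb_receptor = np.array(pb_receptor)

sc_mean, sc_p = lr_mean_pvalue(ligand, receptor, cell_labels, "A", "B")
pb_mean, pb_p = lr_mean_pvalue(pb_ligand, pb_receptor, pb_labels, "A", "B")
print(sc_mean, sc_p)
print(pb_mean, pb_p)
```

With equal numbers of cells per sample the two lr_mean values coincide, while the pseudobulk p-value cannot go below what the small number of distinct label shuffles allows, which is one way to see why it tends to be more conservative.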

In short, at a glance, I like it as an idea, and it can make sense depending on your data. You are also avoiding over-inflated permuted p-values due to pseudoreplication. :)

dbdimitrov · 2024-04-04T05:54:35Z

PS. A major motivation of mine when writing liana-py was to make it flexible, so I'm glad to see when it's used in ways beyond the tutorials.

Marwansha · 2024-04-04T09:23:37Z

Thank you very much for your response. In fact, I am trying to benchmark and compare the different results that come from computing the CCC (cell-cell communication) on single-cell or pseudobulk objects. From my perspective, and from a ground truth point of view (considering some ligand-receptor interactions that exist in one group and not in the other, which I know from literature and previous work), it seems that using the pseudobulk data makes it cleaner and easier to discern.

Would you be interested in having a short meeting? Maybe I can show you my data (I can share it too), so I can get some insights from your point of view on which approach makes more sense.

Thanks
Marwan
My email, in case: marwan.sharawy@pasteur.fr

Marwansha added the bug and help wanted labels Mar 28, 2024

Marwansha assigned dbdimitrov Mar 28, 2024

dbdimitrov removed the bug label Apr 4, 2024
