BASiCS_DenoisedCounts requires no further data normalisation? #264

Open
binyaminZ opened this issue Sep 5, 2022 · 7 comments
Comments

@binyaminZ

Hello!

I tried implementing BASiCS_DenoisedCounts as described in the vignette. I used the chain and the original, raw data (CellRanger output with BASiCS_Filter applied to it) as input. However, when comparing my two samples, I see a transcriptome-wide drop in mean values of the denoised counts. Here are some plots, comparing the means of all genes between the two samples:

Difference in mean values (per gene) before denoising (normalized to sequencing depth):
image

Difference in mu values after testDE (MeanLog2FC, this seems to be OK):
image

Difference in mean values after denoising using SCTransform:
image

Difference in mean values after denoising using BASiCS_DenoisedCounts (seems to be unnormalized):
image

Did I do something wrong? Or possibly a bug in BASiCS_DenoisedCounts?
Thanks!

@alanocallaghan
Collaborator

Just to ask some clarifying questions first:

First, when you say you are comparing two samples, are you

  1. comparing the pre-normalised counts to normalised, or
  2. comparing two samples post-normalisation with different methods?

Secondly, how are you quantifying mean values?

Finally, how are you quantifying the log2FC values?

@binyaminZ
Author

I ran a single-cell experiment with two cell populations: one DMSO-treated and one IdU-treated. Both samples were processed separately with CellRanger, and the count matrices were used as input to BASiCS for noise quantification. I'm using different normalization methods to compare mean expression (and noise) between DMSO and IdU. Means are computed from the normalized counts across all cells (rowMeans). The log2FC is the log2 of the ratio of the IdU means to the DMSO means. In more detail:

  1. The first plot above is the raw CellRanger output, divided by total reads per cell and log-transformed. This corrected matrix was used to compute means (rowMeans), and the log2FC between those means is shown.
  2. The second plot is just the MeanLog2FC column of the testDE output, comparing the DMSO and IdU chains.
  3. In the third plot I used the corrected matrix from SCTransform ([["SCT"]]@counts); means are the rowMeans of that matrix, and the log2FC compares these means.
  4. Same as 3, but with BASiCS_DenoisedCounts.
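
The per-gene comparison described in the list above can be sketched in base R roughly as follows. This is a minimal illustration on made-up toy matrices, not the code used for the plots: the helper names `depth_normalise` and `mean_log2fc`, the scaling factor, and the pseudocount are all assumptions, and the means are taken on the depth-normalised scale before forming the log-ratio.

```r
# Toy count matrices (genes in rows, cells in columns); the values are made up.
dmso <- matrix(c(10, 20, 30,
                  5, 10, 15,
                  0,  4,  8),
               nrow = 3, byrow = TRUE,
               dimnames = list(c("geneA", "geneB", "geneC"), NULL))
# Hypothetical second population: every count doubled (a pure global shift).
idu <- 2 * dmso

# Hypothetical helper: divide each cell (column) by its total count,
# then rescale to counts per `scale`.
depth_normalise <- function(counts, scale = 1e4) {
  sweep(counts, 2, colSums(counts), "/") * scale
}

# Hypothetical helper: log2 ratio of per-gene means (test over reference).
# The pseudocount is an assumption to avoid division by zero.
mean_log2fc <- function(test, ref, pseudocount = 1) {
  log2((rowMeans(test) + pseudocount) / (rowMeans(ref) + pseudocount))
}

lfc_raw   <- mean_log2fc(idu, dmso, pseudocount = 0.5)
lfc_depth <- mean_log2fc(depth_normalise(idu), depth_normalise(dmso))

print(lfc_raw)    # shifted away from 0 for every gene
print(lfc_depth)  # centred on 0 once each cell is scaled to its own depth
```

In this toy case the doubling is removed entirely by per-cell depth scaling, which is why a transcriptome-wide shift in a normalised matrix suggests that no common scale was imposed across the two populations.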

@binyaminZ
Author

binyaminZ commented Nov 14, 2022

Hi,
Following up on this issue: we have performed a few more checks and compared BASiCS with several other normalization pipelines. It is odd that BASiCS is the only algorithm producing this global shift of mean values.
I have generated reproducible data and code, available here. Evidently the problem is not in BASiCS_DenoisedCounts itself, as the shift in mean values is also visible in the mu values from the chains (first plot). Interestingly, in the BASiCS_TestDE results (testDE@Results$Mean@Table) this effect disappears (second plot); however, the denoised counts still exhibit a significant global shift of the means (last plot).
Do you have any explanation for this?
Thanks!

image
image

@alanocallaghan
Collaborator

Thanks for the reproducible example and the code, and sorry for the delay. I will have a look now. I think I understand what's going on, but I wanted to block out some time to write a proper response.

@alanocallaghan
Collaborator

The short story is that BASiCS_DenoisedCounts removes technical differences between cells within a population, but it does not normalise cells to a "gold standard" overall expression level. It therefore aims to remove the effect of cell size/total mRNA content within a population, but cell size/total mRNA content may differ widely between two populations, and in this case there is an overall shift in mRNA levels between them. As you noted, BASiCS_TestDE accounts for this kind of overall shift, but at the moment BASiCS_DenoisedCounts doesn't. I'll see if I can make a workaround.
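
To make the idea of an overall shift concrete, here is a minimal base R sketch of one way to put two denoised matrices on a common scale. It is not the BASiCS implementation and not the planned workaround: `remove_global_offset` is a hypothetical helper, the matrices are made-up toy values, and estimating the offset as the ratio of summed per-gene means is an assumption chosen for simplicity.

```r
# Hypothetical helper, not part of BASiCS: estimate a single global offset
# between two populations and rescale `test` to the overall level of `ref`.
# Assumption: the offset is the ratio of the summed per-gene means.
remove_global_offset <- function(test, ref) {
  offset <- sum(rowMeans(test)) / sum(rowMeans(ref))
  list(offset = offset, rescaled = test / offset)
}

# Toy "denoised" matrices (genes in rows, cells in columns); values are made up.
ref  <- matrix(c( 8, 12,
                  4,  6,
                 20, 20),
               nrow = 3, byrow = TRUE)
# Second population with half the overall mRNA content and no other change.
test <- 0.5 * ref

res <- remove_global_offset(test, ref)

# Per-gene log2 fold changes before and after removing the offset.
lfc_before <- log2(rowMeans(test) / rowMeans(ref))
lfc_after  <- log2(rowMeans(res$rescaled) / rowMeans(ref))

print(res$offset)
print(lfc_before)
print(lfc_after)
```

A single scalar offset only addresses a population-wide shift. It assumes that most genes are unchanged between populations, so it should be treated as a diagnostic rather than a replacement for a model-based correction.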

@binyaminZ
Author

binyaminZ commented Nov 21, 2022

This is very strange behavior... I would either add a line to the code to normalize the DenoisedCounts matrix, or remove the bold note in the vignette saying that "the output of BASiCS_DenoisedCounts requires no further data normalisation".
Thanks for clarifying this!

@alanocallaghan
Collaborator

I think it arises from an oversight in the no-spikes case, which will be fixed in #267.

Thanks for raising the issue and documenting it thoroughly, and sorry it took so long to get to.
