ComBat-seq + DESeq2 / WGCNA Or DESeq2 (batch as covariate) / ComBat + WGCNA #18

Open
sharvarinarendra opened this issue Jan 31, 2021 · 9 comments

Comments

@sharvarinarendra

Hi,

I am using ComBat-seq to remove batch effects from my dataset and then running DESeq2 on the corrected counts. Could I use the same data, after rlog transformation, for WGCNA?

Which pipeline would be better for getting both differentially expressed genes and WGCNA results?

  1. ComBat-seq -> DESeq2 -> rlog -> WGCNA
  2. DESeq2 (batch as covariate) -> rlog -> ComBat -> WGCNA

Thank you!

@Bithorax

Bithorax commented Feb 3, 2021

Hi, I'm currently doing something similar. To answer your question, I would say batch correction should be the first step, because ComBat-seq requires raw count data as input. So my suggestion is to follow workflow 1.

I have a question as well. If I run "ComBat_seq(Dataset, batch=my_batch)", is the output the dataset corrected for batch effects?

@zhangyuqing
Owner

@Bithorax Thanks for answering the question! Yes, the output is the dataset corrected for batch effects.

@sharvarinarendra
Author

Thank you for your answer, @Bithorax and @zhangyuqing !

@Bithorax

Bithorax commented Feb 5, 2021

One last question, if you can help. I'm not quite sure when I should specify the "group" parameter, and with it "full_mod=TRUE". Do you have an explanation?

@zhangyuqing
Owner

@Bithorax Both "group" and "covar_mod" refer to covariates whose signal you would like to keep in your data. In differential expression analysis, for example, "group" would be the condition you are comparing. If you would also like to retain information from any other variables, you can specify them in "covar_mod". In contrast, "batch" is the variable whose signal you would like to remove from the data.

@Bithorax

Bithorax commented Feb 5, 2021

Thanks for the explanation. Just one doubt: if specifying "batch" only removes the batch effect from the dataset, then the signal of my variables of interest is kept automatically. Am I wrong?

@zhangyuqing
Owner

@Bithorax Unfortunately, in real data we can never be 100% sure that only the batch effect is removed, because we do not truly know how batch has affected the data. We can only estimate it, and we estimate these effects using linear models. In a linear model, whether or not you include other signals affects the estimate of the batch effect.

If you are familiar with linear regression, perhaps you can think of it simply as the difference between estimating parameters of the 2 models below:
data ~ batch
data ~ batch + other signals
The parameters for batch are what we are estimating, and they have different interpretations and values in the two models.
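
As a rough illustration of the difference between the two models, here is a toy sketch in plain Python. It is ordinary least squares on a single made-up gene, not what ComBat-seq does internally (ComBat-seq models counts, not a Gaussian response). The design, the numeric values, and the helper names `solve` and `ols` are all assumptions invented for the example. Batch and group are deliberately partly confounded: most controls sit in batch A and most treated samples in batch B.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(design, y):
    """Least-squares coefficients from the normal equations (X'X) beta = X'y."""
    p = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(p)]
    return solve(xtx, xty)


# Hypothetical design as (batch, group) indicators: batch B = 1, treated = 1.
# Batch A: 3 control + 1 treated. Batch B: 1 control + 3 treated.
samples = [(0, 0)] * 3 + [(0, 1)] + [(1, 0)] + [(1, 1)] * 3

# Made-up, noise-free values: baseline 10, batch effect +2, group effect +3.
y = [10 + 2 * b + 3 * g for b, g in samples]

# Model 1: data ~ batch (columns: intercept, batch)
batch_only = ols([[1, b] for b, _ in samples], y)

# Model 2: data ~ batch + group (columns: intercept, batch, group)
batch_plus_group = ols([[1, b, g] for b, g in samples], y)

print("batch coefficient, data ~ batch:        ", round(batch_only[1], 6))
print("batch coefficient, data ~ batch + group:", round(batch_plus_group[1], 6))
```

In this toy setup the first model assigns 3.5 to batch, because it absorbs part of the group difference, while the second recovers the simulated batch effect of 2. Subtracting the first estimate would therefore also remove some of the group signal, which is the reason for telling the model about the variables you want to keep.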

@Bithorax

Bithorax commented Feb 5, 2021

Yes, I see your point and I agree. It would be interesting to compare the two models to see the difference in the signal, though I guess this also depends on the input dataset.

Thanks for the feedback!

@ahdee

ahdee commented Aug 13, 2021

@zhangyuqing
I'm a bit confused, since it looks like option 1 is recommended. My understanding is that the linear model should be run on the uncorrected data with batch as a covariate. The statistical results can then be merged back with the ComBat-corrected and normalized counts. Can someone please confirm? Maybe I'm misunderstanding the question somewhere.
