Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Include corrected PCA inside rnadiff module #845

Open
cokelaer opened this issue Feb 20, 2024 · 1 comment
Open

Include corrected PCA inside rnadiff module #845

cokelaer opened this issue Feb 20, 2024 · 1 comment
Assignees

Comments

@cokelaer
Copy link
Collaborator

No description provided.

@rania-o
Copy link
Collaborator

rania-o commented Feb 20, 2024

This is the code I used to remove the batch effect :
First I installed the package limma using mamba. For DESeq2 I load the module sequana.

library(limma)
library(DESeq2)

# load the raw counts
raw_counts <- read.csv("path_to_raw_counts", sep=",",header=T)
design <- read.csv("design.csv", sep=",", header=T)

# transform the counts to a matrix
cts <- as.matrix(raw_counts)

# transformation into factors
coldata <- DataFrame(label=factor(design$label), condition=factor(design$condition), batch=factor(design$batch))

# check that they. have the same order
all(rownames(coldata) == colnames(cts))

# load data with deseq2
dds <- DESeqDataSetFromMatrix(countData = cts,colData = coldata,design = ~ batch + condition)

# create the dds object
dds <- DESeq(dds)

# vst transformation for PCA
vsd <- vst(dds)

# remove the batch effect with limma
assay(vsd) <- limma::removeBatchEffect(assay(vsd), vsd$batch)

# plot the PCA
pdf("pca_batch.pdf")
plotPCA(vsd, intgroup = c("condition"))
dev.off()

In the case of two batch effects :

# the first lines of code are the same

# transformation of the design file into factors
coldata <- DataFrame(label=factor(design$label), condition=factor(design$condition), group=factor(group$batch))

# here we load the dataset with a simple design
dds <- DESeqDataSetFromMatrix(countData = cts, colData = coldata, design = ~ condition)

# we add the column ind.n in colData for individuals
coldata$ind.n <- factor(rep(rep(1:4,each=1),4))

# we modify the data already loaded with DESeq2 and add the col ind.n
colData(dds) = coldata

#the modified design will result in this matrix
model.matrix(~ group + group:ind.n + group:condition, coldata)

# PCA part

vsd <- vst(dds)
# we remove the two  batch effects
assay(vsd) <- limma::removeBatchEffect(assay(vsd), vsd$group, vsd$ind.n)
pdf("pca_test2.pdf")
plotPCA(vsd, intgroup = c("cnd"))
dev.off()

I didn't manage to apply the two batch effects on the differential analysis.
You can see my post on the bioconductor forum :
https://support.bioconductor.org/p/9156646/

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants