I'm trying to recreate this table and I think there is an error when intersecting the input matrix with the reference database.
Here is the section in question (lines 87-109):
# Read in matrix and Sample annotation
mat <- read.delim("data/CountData.BMS038.txt")
SampleTableCorrected <- read.csv("data/SampleTableCorrected.9.19.16.csv", row.names=1)
# Only samples with response
SampleTableCorrected <- SampleTableCorrected[!(is.na(SampleTableCorrected$Response)), ]
# Find the overlapping samples
inter <- intersect(colnames(mat),rownames(SampleTableCorrected))
SampleTableCorrected <- SampleTableCorrected[inter,]
mat <- mat[,match(rownames(SampleTableCorrected),colnames(mat))]
# Create the DESeq2 object
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
ebg <- exonsBy(txdb, by="gene")
intersection <- intersect(rownames(mat),ebg@partitioning@NAMES)
ebg2 <- ebg[ebg@partitioning@NAMES %in% intersection]
# sort by ID
ebg2 <- ebg2[order(names(ebg2))]
# Sort by gene model order
mat <- mat[match(names(ebg2), rownames(mat)),]
mat is initially read in as a data.frame with 22,333 rows (genes) and 120 columns (the sample columns plus the "ID" and "HUGO" annotation columns).
mat is then subset to the samples present in SampleTableCorrected, which removes 15 samples and leaves 22,333 rows and 103 columns (both the "ID" column and the "HUGO" column are also dropped at this step).
ebg, created from txdb, contains 23,459 elements.
The Entrez IDs of ebg (ebg@partitioning@NAMES) are intersected with the rownames of mat, but because read.delim() was called without row.names, the rownames of mat are just 1:nrow(mat), not gene IDs.
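A toy illustration of this mismatch may help. The sketch below uses invented IDs and counts rather than the real data, and `ref_ids` is a hypothetical stand-in for names(ebg). It shows how default row names can still "match" Entrez IDs by coincidence, because low integers are valid Entrez IDs too.

```r
# Toy count table shaped like the input file: an ID column, a HUGO column,
# and two sample columns. All values are invented for illustration.
toy <- data.frame(
  ID   = c("7157", "2", "672", "100"),
  HUGO = c("TP53", "A2M", "BRCA1", "ADA"),
  S1   = c(10, 20, 30, 40),
  S2   = c(5, 15, 25, 35),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for names(ebg): Entrez IDs known to the reference
ref_ids <- c("1", "2", "3", "672", "7157")

# Selecting only sample columns drops ID/HUGO and keeps the default
# row names "1", "2", "3", "4"
counts <- toy[, c("S1", "S2")]

# Intersecting on the default row names matches row numbers, not genes
by_rownum <- intersect(rownames(counts), ref_ids)

# Restoring the Entrez IDs as row names gives the intended intersection
rownames(counts) <- toy$ID
by_id <- intersect(rownames(counts), ref_ids)

print(by_rownum)
print(by_id)
```

In the first case the rows that survive are simply those whose row number happens to be an Entrez ID in the reference, so the retained counts are paired with the wrong genes.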
With these default row names, the intersection results in a mat with 7,866 rows and 103 columns. If the rownames of mat are instead set to the original Entrez IDs, we get a larger intersection:
# Read in matrix and Sample annotation
mat <- read.delim("data/CountData.BMS038.txt")
ids <- mat$ID
SampleTableCorrected <- read.csv("data/SampleTableCorrected.9.19.16.csv", row.names=1)
# Only samples with response
SampleTableCorrected <- SampleTableCorrected[!(is.na(SampleTableCorrected$Response)), ]
# Find the overlapping samples
inter <- intersect(colnames(mat),rownames(SampleTableCorrected))
SampleTableCorrected <- SampleTableCorrected[inter,]
mat <- mat[,match(rownames(SampleTableCorrected),colnames(mat))]
rownames(mat) <- ids
# Create the DESeq2 object
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
ebg <- exonsBy(txdb, by="gene")
intersection <- intersect(rownames(mat),ebg@partitioning@NAMES)
ebg2 <- ebg[ebg@partitioning@NAMES %in% intersection]
# sort by ID
ebg2 <- ebg2[order(names(ebg2))]
# Sort by gene model order
mat <- mat[match(names(ebg2), rownames(mat)),]
This results in a matrix with 22,333 genes and 103 samples. However, neither method above reproduced the results found in TableS6.A downloaded from the manuscript page.
To get an output table, I also had to change
Response.ihw.PRCR <- ResFiltered(results(dds.pre,name="ResponsePRCR",filterFun = ihw))
to
Response.ihw.PRCR <- ResFiltered(results(dds.pre,name="Response_PRCR_vs_PD",filterFun = ihw))
which may be due to a different version of DESeq2 (I'm using 1.20.0).
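One way to check which coefficient names a given DESeq2 installation produces is resultsNames(). The sketch below assumes DESeq2 is installed and uses the simulated dataset from makeExampleDESeqDataSet() instead of dds.pre, so the printed names refer to its `condition` factor and not to Response.

```r
library(DESeq2)

# Simulated dataset from DESeq2; its design is ~ condition.
# This is a stand-in for dds.pre, which is not available here.
dds <- makeExampleDESeqDataSet()
dds <- DESeq(dds, quiet = TRUE)

# List the coefficient names this DESeq2 version generated
coef_names <- resultsNames(dds)
print(coef_names)

# Pick the name from the list instead of hard-coding it
cond_coef <- grep("^condition", coef_names, value = TRUE)[1]
res <- results(dds, name = cond_coef)
```

Running resultsNames(dds.pre) in the same way should show whether "ResponsePRCR" or "Response_PRCR_vs_PD" is the name available in a particular setup.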